<?xml version="1.0" encoding="UTF-8" standalone="no"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"><head><title>Validating Documents</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body><div class="sect1" title="Validating Documents"><div class="titlepage"><div><div><h1 class="title"><a id="learnjava3-CHP-24-SECT-7"/>Validating Documents</h1></div></div></div><div class="epigraph"><p>Words, words, mere words, no matter from the heart.</p><div class="attribution"><span>—<span class="attribution">William Shakespeare, <span class="emphasis"><em>Troilus and Cressida</em></span></span></span></div></div><p>In this section, we talk about DTDs and XML Schema, two ways to enforce rules in an XML document. A DTD is a simple grammar guide for an XML document, defining which tags may appear where, in what order, with what attributes, etc. XML Schema is the next generation of DTD. With XML Schema, you can describe the data content of the document as well as the structure. XML Schemas are written in terms of primitives, such as numbers, dates, and simple regular expressions, and also allow the user to define complex types in a grammar-like fashion. The word <span class="emphasis"><em>schema</em></span> means a blueprint or plan for structure, so we’ll refer to DTDs and XML Schema collectively as schema where either applies.</p><p>DTDs, although much more limited in capability, are still widely used. This may be partly due to the complexity involved in writing XML Schemas by hand. The W3C XML Schema standard is verbose and cumbersome, which may explain why several alternative syntaxes have sprung up. The <a id="I_indexterm24_id833055" class="indexterm"/><code class="literal">javax.xml.validation</code> API performs XML validation in a pluggable way. 
Out of the box, it supports only W3C XML Schema, but other schema languages can be plugged in. Validating with a DTD is supported as an older feature built directly into the SAX and DOM parsers. We’ll use both approaches in this section.</p><div class="sect2" title="Using Document Validation"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-7.1"/>Using Document Validation</h2></div></div></div><p>The ability to validate documents is a key part of what makes XML useful as a data format. Using a schema is somewhat analogous to the way Java classes enforce type checking in the language. A schema defines document types, and documents conforming to a given schema are often referred to as <span class="emphasis"><em>instance documents</em></span> of the schema.</p><p>This type safety provides a layer of protection that can eliminate much of the complex error-checking code you would otherwise have to write. However, validation is not necessary in every environment. For example, when the same tool generates XML and reads it back a short time later, the extra checking may buy little. Validation is invaluable during development, though, and it is sometimes enabled during development and turned off in production.</p></div><div class="sect2" title="DTDs"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-7.2"/>DTDs</h2></div></div></div><p><a id="idx11223" class="indexterm"/>The DTD language is fairly simple. A DTD is primarily a set of special tags that define each element in the document and, for elements with child content, list the elements each one may contain. 
The DTD <code class="literal">&lt;!ELEMENT&gt;</code> tag consists of the name of the element and either a special keyword for the data type or a parenthesized list of child elements.</p><a id="I_24_tt1330"/><pre class="programlisting"><code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">Name</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">Document</code> <code class="o">(</code> <code class="n">Head</code><code class="o">,</code> <code class="n">Body</code> <code class="o">)&gt;</code></pre><p>The special identifier <a id="I_indexterm24_id833137" class="indexterm"/><a id="I_indexterm24_id833142" class="indexterm"/><a id="I_indexterm24_id833150" class="indexterm"/><code class="literal">#PCDATA</code> (parsed character data) indicates text. When a comma-separated list is provided, the elements are expected to appear in that order. The list may contain parenthesized sublists, and alternatives can be specified using a vertical bar (<code class="literal">|</code>) as an OR operator, meaning that exactly one of the alternatives must appear. Special notation can also be used to indicate how many of each item may appear; the three occurrence indicators are shown in <a class="xref" href="ch24s08.html#learnjava3-CHP-24-TABLE-4" title="Table 24-4. DTD notation defining occurrences">Table 24-4</a>.</p><div class="table"><a id="learnjava3-CHP-24-TABLE-4"/><p class="title">Table 24-4. DTD notation defining occurrences</p><div class="table-contents"><table summary="DTD notation defining occurrences" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col/><col/></colgroup><thead><tr><th style="text-align: left"><p>Character</p></th><th style="text-align: left"><p>Meaning</p></th></tr></thead><tbody><tr><td style="text-align: left"><p> <a id="I_indexterm24_id833219" class="indexterm"/> <a id="I_indexterm24_id833227" class="indexterm"/>*</p></td><td style="text-align: left"><p>Zero or more occurrences</p></td></tr><tr><td style="text-align: left"><p>+</p></td><td style="text-align: left"><p>One or more occurrences</p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id833250" class="indexterm"/> <a id="I_indexterm24_id833259" class="indexterm"/>?</p></td><td style="text-align: left"><p>Zero or one occurrence</p></td></tr></tbody></table></div></div><p>Attributes of an element are defined with the <a id="I_indexterm24_id833278" class="indexterm"/><code class="literal">&lt;!ATTLIST&gt;</code> tag, which lets the DTD enforce rules about attributes. In this example, it names the element and the attribute, lists the attribute’s legal values, and gives a default value:</p><a id="I_24_tt1331"/><pre class="programlisting"><code class="o">&lt;!</code><code class="n">ATTLIST</code> <code class="n">animal</code> <code class="n">animalClass</code> <code class="o">(</code><code class="n">unknown</code> <code class="o">|</code> <code class="n">mammal</code> <code class="o">|</code> <code class="n">reptile</code><code class="o">)</code> <code class="s">"unknown"</code><code class="o">&gt;</code></pre><p>This <code class="literal">ATTLIST</code> says that the <code class="literal">animal</code> element has an <code class="literal">animalClass</code> attribute whose value must be one of <code class="literal">unknown</code>, <code class="literal">mammal</code>, or <code class="literal">reptile</code>. If the attribute is omitted, its value defaults to <code class="literal">unknown</code>.</p><p>We won’t cover everything you can do with DTDs here. 
The following DTD, however, is enough to ensure that <span class="emphasis"><em>zooinventory.xml</em></span> follows the format we’ve described. Place it in a file called <span class="emphasis"><em>zooinventory.dtd</em></span> (or grab this file from <a class="ulink" href="http://oreil.ly/Java_4E">http://oreil.ly/Java_4E</a>):</p><a id="I_24_tt1332"/><pre class="programlisting"><code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">inventory</code> <code class="o">(</code> <code class="n">animal</code><code class="o">*</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">animal</code> <code class="o">(</code> <code class="n">name</code><code class="o">,</code> <code class="n">species</code><code class="o">,</code> <code class="n">habitat</code><code class="o">,</code> <code class="o">(</code><code class="n">food</code> <code class="o">|</code> <code class="n">foodRecipe</code><code class="o">),</code> <code class="n">temperament</code><code class="o">,</code> <code class="n">weight</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ATTLIST</code> <code class="n">animal</code> <code class="n">animalClass</code> <code class="o">(</code> <code class="n">unknown</code> <code class="o">|</code> <code class="n">mammal</code> <code class="o">|</code> <code class="n">reptile</code> <code class="o">|</code> <code class="n">bird</code> <code class="o">|</code> <code class="n">fish</code> <code class="o">)</code> <code class="s">"unknown"</code><code class="o">&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">name</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">species</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">habitat</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">food</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">weight</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">foodRecipe</code> <code class="o">(</code> <code class="n">name</code><code class="o">,</code> <code class="n">ingredient</code><code class="o">+</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">ingredient</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code>
<code class="o">&lt;!</code><code class="n">ELEMENT</code> <code class="n">temperament</code> <code class="o">(</code> <code class="err">#</code><code class="n">PCDATA</code> <code class="o">)&gt;</code></pre><p>The DTD says that an <code class="literal">inventory</code> consists of any number of <code class="literal">animal</code> elements. An <code class="literal">animal</code> has <code class="literal">name</code>, <code class="literal">species</code>, and <code class="literal">habitat</code> elements, followed by either a <code class="literal">food</code> or a <code class="literal">foodRecipe</code> element, and then <code class="literal">temperament</code> and <code class="literal">weight</code> elements, in that order. It may also carry an <code class="literal">animalClass</code> attribute, which defaults to <code class="literal">unknown</code>. A <code class="literal">foodRecipe</code>, declared further down, consists of a <code class="literal">name</code> followed by one or more <code class="literal">ingredient</code> elements.</p><p>To use a DTD, we associate it with the XML document. 
We can do this by placing a <a id="I_indexterm24_id833428" class="indexterm"/><code class="literal">DOCTYPE</code> declaration in the XML document itself and letting the XML parser recognize and enforce it. The Java validation API that we’ll talk about in the next section separates the roles of parsing and validation and can, in principle, validate arbitrary XML against any kind of schema, including DTDs. The catch is that, out of the box, the validation API implements only the newer W3C XML Schema language. So here we’ll rely on the parser to validate against the DTD for us.</p><p>When a validating parser encounters the <code class="literal">DOCTYPE</code>, it attempts to load the DTD and validate the document against it. The <code class="literal">DOCTYPE</code> can take several forms, but the one we’ll use is:</p><a id="I_24_tt1333"/><pre class="programlisting"><code class="o">&lt;!</code><code class="n">DOCTYPE</code> <code class="n">inventory</code> <code class="n">SYSTEM</code> <code class="s">"zooinventory.dtd"</code><code class="o">&gt;</code></pre><p>The name following <code class="literal">DOCTYPE</code> must match the document’s root element, <code class="literal">inventory</code>, and the <code class="literal">SYSTEM</code> identifier gives the location of the DTD, here a path relative to the XML file. Both SAX and DOM parsers can automatically validate documents as they read them, provided that the documents contain a <code class="literal">DOCTYPE</code> declaration. However, you have to explicitly ask the parser factory to provide a parser that is capable of validation. To do this, set the validating property of the parser factory to <code class="literal">true</code> before you ask it for an instance of the parser. 
For example:</p><a id="I_24_tt1334"/><pre class="programlisting"><code class="o">...</code>
<code class="n">SAXParserFactory</code> <code class="n">factory</code> <code class="o">=</code> <code class="n">SAXParserFactory</code><code class="o">.</code><code class="na">newInstance</code><code class="o">();</code>
<code class="n">factory</code><code class="o">.</code><code class="na">setValidating</code><code class="o">(</code> <code class="kc">true</code> <code class="o">);</code></pre><p>Again, this <a id="I_indexterm24_id833498" class="indexterm"/><code class="literal">setValidating()</code> method is an older, simpler way to enable validation of documents that contain DTD references, and it is tied to the parser. The newer validation package that we’ll discuss later is independent of the parser and more flexible. You should not use the parser-validating method in combination with the new validation API unless you really want your documents to be validated twice.</p><p>Try inserting the <code class="literal">setValidating()</code> line in our model builder example right after the factory is created. Then abuse the <span class="emphasis"><em>zooinventory.xml</em></span> file by adding or removing an element or attribute, and see what happens when you run the example. Validity problems are reported to the <code class="literal">error()</code> method of the parser’s <code class="literal">org.xml.sax.ErrorHandler</code>. They are treated as recoverable, so parsing continues unless that method throws an exception; and because the <code class="literal">error()</code> method of <code class="literal">DefaultHandler</code> does nothing, a handler that extends <code class="literal">DefaultHandler</code> ignores them entirely until you override that method. 
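</p><p>As a minimal sketch of making validity errors stop the parse, the stand-alone program below turns on DTD validation and rethrows every error passed to <code class="literal">error()</code>, whose <code class="literal">DefaultHandler</code> version does nothing. It relies only on the JDK’s standard JAXP and SAX classes; the class names <code class="literal">ValidateWithDtd</code> and <code class="literal">StrictHandler</code> are illustrative assumptions, and the input file is assumed to contain a <code class="literal">DOCTYPE</code> declaration like the one shown above.</p>
<pre class="programlisting">
```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidateWithDtd {

    // Rethrows recoverable validity errors so the parse stops at the first one.
    // (DefaultHandler's own error() method silently ignores them.)
    static class StrictHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    // Parses xmlFile with DTD validation enabled and reports whether it is
    // well formed and valid against the DTD named in its DOCTYPE.
    static boolean isValid(File xmlFile) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(xmlFile, new StrictHandler());
            return true;
        } catch (SAXException e) {
            // Validity errors (rethrown from error()) and well-formedness
            // errors (thrown by DefaultHandler.fatalError()) both land here.
            System.err.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the chapter's zooinventory.xml; pass another path to override.
        File file = new File(args.length == 1 ? args[0] : "zooinventory.xml");
        System.out.println(isValid(file) ? "valid" : "invalid");
    }
}
```
</pre>
<p>Run against a copy of <span class="emphasis"><em>zooinventory.xml</em></span> with an element removed, a sketch like this should print the parser’s message and report the document as invalid. A document with no <code class="literal">DOCTYPE</code> at all should also be reported as invalid, because the validating parser has no grammar to check it against. In a real model builder, the same <code class="literal">error()</code> override could be added to its handler, at the cost of abandoning the model for any invalid document.</p><p>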
To control how validation errors are reported, register your own <code class="literal">org.xml.sax.ErrorHandler</code> with the parser or override <code class="literal">error()</code> in your handler; when no error handler is registered at all, the JDK’s parser typically falls back to a default one that simply prints the errors.<a id="I_indexterm24_id833537" class="indexterm"/></p></div><div class="sect2" title="XML Schema"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-7.3"/>XML Schema</h2></div></div></div><p><a id="idx11225" class="indexterm"/>Although DTDs can define the basic structure of an XML document, they don’t provide a very rich vocabulary for describing the relationships between elements, and they say very little about element content. For example, there is no reasonable way with DTDs to specify that an element must contain a number, or even to limit the length of string data. The XML Schema standard addresses both the structure and the data content of an XML document. It is the next logical step, and it (or one of the competing schema languages with similar capabilities) may eventually replace DTDs.</p><p>XML Schema brings the equivalent of strong typing to XML by drawing on many predefined primitive types and allowing users to define new complex types of their own. These schemas even allow types to be extended and used polymorphically, like types in the Java language. 
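</p><p>For W3C XML Schema, the usual route in Java is the parser-independent <code class="literal">javax.xml.validation</code> API mentioned at the start of this section. As a minimal sketch, assuming the schema presented below has been saved as <span class="emphasis"><em>zooinventory.xsd</em></span> next to <span class="emphasis"><em>zooinventory.xml</em></span> (the class name <code class="literal">ValidateWithSchema</code> and the default file names are illustrative assumptions), a document can be checked like this:</p>
<pre class="programlisting">
```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ValidateWithSchema {

    // Checks xmlFile against the W3C XML Schema in schemaFile. Parsing and
    // validation are separate steps: the schema is compiled once into a
    // Schema object, which then hands out Validator instances.
    static boolean isValid(File schemaFile, File xmlFile)
            throws SAXException, IOException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Throws SAXException here if the schema itself is malformed.
        Schema schema = factory.newSchema(schemaFile);
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validity error is thrown.
            validator.validate(new StreamSource(xmlFile));
            return true;
        } catch (SAXException e) {
            System.err.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults are assumptions; pass schema and document paths to override.
        File xsd = new File(args.length == 2 ? args[0] : "zooinventory.xsd");
        File xml = new File(args.length == 2 ? args[1] : "zooinventory.xml");
        System.out.println(isValid(xsd, xml) ? "valid" : "invalid");
    }
}
```
</pre>
<p>Because no <code class="literal">ErrorHandler</code> is set on the <code class="literal">Validator</code>, it throws a <code class="literal">SAXException</code> at the first validity error, which the sketch reports as invalid. A schema that is itself malformed fails earlier, in <code class="literal">newSchema()</code>, and that exception propagates out of <code class="literal">isValid()</code>.</p><p>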
Although we can’t cover XML Schema in any detail, we’ll present an equivalent W3C XML Schema for our <span class="emphasis"><em>zooinventory.xml</em></span> file here:</p><a id="I_24_tt1335"/><pre class="programlisting"><code class="o">&lt;?</code><code class="n">xml</code> <code class="n">version</code><code class="o">=</code><code class="s">"1.0"</code> <code class="n">encoding</code><code class="o">=</code><code class="s">"UTF-8"</code><code class="o">?&gt;</code>
<code class="o">&lt;</code><code class="nl">xs:</code><code class="n">schema</code> <code class="nl">xmlns:</code><code class="n">xs</code><code class="o">=</code><code class="s">"http://www.w3.org/2001/XMLSchema"</code><code class="o">&gt;</code>
  <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"inventory"</code><code class="o">&gt;</code>
    <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code>
      <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">minOccurs</code><code class="o">=</code><code class="s">"0"</code> <code class="n">maxOccurs</code><code class="o">=</code><code class="s">"unbounded"</code> <code class="n">ref</code><code class="o">=</code><code class="s">"animal"</code><code class="o">/&gt;</code>
      <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code>
    <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code>
  <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">element</code><code class="o">&gt;</code>
  <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"name"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code>
  <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"animal"</code><code class="o">&gt;</code>
    <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code>
      <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">ref</code><code class="o">=</code><code class="s">"name"</code><code class="o">/&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"species"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"habitat"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">choice</code><code class="o">&gt;</code>
          <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"food"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code>
          <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">ref</code><code class="o">=</code><code class="s">"foodRecipe"</code><code class="o">/&gt;</code>
        <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">choice</code><code class="o">&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"temperament"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"weight"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:double"</code><code class="o">/&gt;</code>
      <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code>
      <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">attribute</code> <code class="n">name</code><code class="o">=</code><code class="s">"animalClass"</code> <code class="k">default</code><code class="o">=</code><code class="s">"unknown"</code><code class="o">&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">simpleType</code><code class="o">&gt;</code>
          <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">restriction</code> <code class="n">base</code><code class="o">=</code><code class="s">"xs:token"</code><code class="o">&gt;</code>
            <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">enumeration</code> <code class="n">value</code><code class="o">=</code><code class="s">"unknown"</code><code class="o">/&gt;</code>
            <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">enumeration</code> <code class="n">value</code><code class="o">=</code><code class="s">"mammal"</code><code class="o">/&gt;</code>
            <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">enumeration</code> <code class="n">value</code><code class="o">=</code><code class="s">"reptile"</code><code class="o">/&gt;</code>
            <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">enumeration</code> <code class="n">value</code><code class="o">=</code><code class="s">"bird"</code><code class="o">/&gt;</code>
            <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">enumeration</code> <code class="n">value</code><code class="o">=</code><code class="s">"fish"</code><code class="o">/&gt;</code>
          <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">restriction</code><code class="o">&gt;</code>
        <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">simpleType</code><code class="o">&gt;</code>
      <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">attribute</code><code class="o">&gt;</code>
    <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code>
  <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">element</code><code class="o">&gt;</code>
  <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"foodRecipe"</code><code class="o">&gt;</code>
    <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code>
      <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">ref</code><code class="o">=</code><code class="s">"name"</code><code class="o">/&gt;</code>
        <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">maxOccurs</code><code class="o">=</code><code class="s">"unbounded"</code> <code class="n">name</code><code class="o">=</code><code class="s">"ingredient"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code>
      <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code>
    <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code>
  <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">element</code><code class="o">&gt;</code>
<code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">schema</code><code class="o">&gt;</code></pre><p>This schema would normally be saved in an XML Schema Definition file, which has an <a id="I_indexterm24_id833614" class="indexterm"/><span class="emphasis"><em>.xsd</em></span> extension. The first thing to note is that the schema is itself a normal, well-formed XML document that uses elements from the W3C XML Schema namespace. In it, we use nested <code class="literal">element</code> declarations to define the elements that will appear in our document. As with most languages, there is more than one way to accomplish this task. Here, we have broken out the “complex” <code class="literal">animal</code> and <code class="literal">foodRecipe</code> elements into their own separate element declarations and referred to them in their parent elements using the <a id="I_indexterm24_id833647" class="indexterm"/><code class="literal">ref</code> attribute. In this case, we did it mainly for readability; it would have been legal to have one big, deeply nested element declaration starting at <code class="literal">inventory</code>. However, declaring an element at the top level and referring to it with <code class="literal">ref</code> also allows us to reuse the same element declaration in multiple places in the schema, if needed. Our <code class="literal">name</code> element is a small example of this. Although it didn’t do much for us here, we have broken out the <code class="literal">name</code> element and referred to it for both <code class="literal">animal</code>/<code class="literal">name</code> and <code class="literal">foodRecipe</code>/<code class="literal">name</code>. 
Breaking out <code class="literal">name</code> like this would allow us to use more advanced features of XML Schema to write rules for what a <code class="literal">name</code> can be (e.g., how long it may be, or which characters are allowed) in one place and reuse that “type” wherever it is needed.</p><p>Control directives like <code class="literal">sequence</code> and <code class="literal">choice</code> define the structure of the allowed child elements, and attributes like <code class="literal">minOccurs</code> and <code class="literal">maxOccurs</code> specify cardinality (how many instances may appear; both default to 1). The <code class="literal">sequence</code> directive says that the enclosed elements must appear in the specified order (each one is required unless its <code class="literal">minOccurs</code> is 0). The <code class="literal">choice</code> directive lets us specify alternative child elements, such as <code class="literal">food</code> or <code class="literal">foodRecipe</code>, of which exactly one must appear. We declared the legal values for our <code class="literal">animalClass</code> attribute using a <code class="literal">restriction</code> declaration and <code class="literal">enumeration</code> tags.</p><div class="sect3" title="Simple types"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-7.3.1"/>Simple types</h3></div></div></div><p>Although we’ve barely exercised it here, the <a id="I_indexterm24_id833786" class="indexterm"/><code class="literal">type</code> attribute of our elements touches on the standardization of types in XML Schema. All of our “text” elements specify the type <code class="literal">xs:string</code>, which is the standard XML Schema string type (roughly equivalent to <code class="literal">#PCDATA</code> in our DTD). There are many other standard types covering things such as dates, times, durations, numbers, and even URIs. 
These are called <span class="emphasis"><em>simple types</em></span> (though some of them are not so simple) because their values are plain text, with no child elements or attributes. The built-in simple types are standardized by the W3C, and new ones can be derived from them, as we did with the <code class="literal">restriction</code> for our <code class="literal">animalClass</code> attribute. <a class="xref" href="ch24s08.html#learnjava3-CHP-24-TABLE-5" title="Table 24-5. W3C Schema simple types">Table 24-5</a> lists common built-in W3C Schema simple types and their corresponding Java types. The table uses the conventional <code class="literal">xsd:</code> prefix; like the <code class="literal">xs:</code> prefix in our schema, it stands for the <code class="literal">http://www.w3.org/2001/XMLSchema</code> namespace. The correspondence will become useful later when we talk about JAXB and automated binding of XML to Java classes.</p><div class="table"><a id="learnjava3-CHP-24-TABLE-5"/><p class="title">Table 24-5. W3C Schema simple types</p><div class="table-contents"><table summary="W3C Schema simple types" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col/><col/><col/></colgroup><thead><tr><th style="text-align: left"><p>Schema type</p></th><th style="text-align: left"><p>Java type</p></th><th style="text-align: left"><p>Example</p></th></tr></thead><tbody><tr><td style="text-align: left"><p> <a id="I_indexterm24_id833873" class="indexterm"/> <code class="literal">xsd:string</code> </p></td><td style="text-align: left"><p> <code class="literal">java.lang.String</code> </p></td><td style="text-align: left"><p> <code class="literal">"This is text"</code> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id833912" class="indexterm"/> <code class="literal">xsd:boolean</code> </p></td><td style="text-align: left"><p> <code class="literal">boolean</code> </p></td><td style="text-align: left"><p> <code class="literal">true</code>, <code class="literal">false</code>, <code class="literal">1</code>, <code class="literal">0</code></p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id833966" class="indexterm"/> <code class="literal">xsd:byte</code> </p></td><td style="text-align: left"><p> <code class="literal">byte</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id833999" 
class="indexterm"/> <code class="literal">xsd:unsignedByte</code> </p></td><td style="text-align: left"><p> <code class="literal">short</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834033" class="indexterm"/> <code class="literal">xsd:integer</code> </p></td><td style="text-align: left"><p> <code class="literal">java.math.BigInteger</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834067" class="indexterm"/> <code class="literal">xsd:int</code> </p></td><td style="text-align: left"><p> <code class="literal">int</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834100" class="indexterm"/> <code class="literal">xsd:unsignedInt</code> </p></td><td style="text-align: left"><p> <code class="literal">long</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834134" class="indexterm"/> <code class="literal">xsd:long</code> </p></td><td style="text-align: left"><p> <code class="literal">long</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834167" class="indexterm"/> <code class="literal">xsd:short</code> </p></td><td style="text-align: left"><p> <code class="literal">short</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834200" class="indexterm"/> <code class="literal">xsd:unsignedShort</code> </p></td><td style="text-align: left"><p> <code class="literal">int</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834234" class="indexterm"/> <code class="literal">xsd:decimal</code> </p></td><td style="text-align: left"><p> <code 
class="literal">java.math.BigDecimal</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834268" class="indexterm"/> <code class="literal">xsd:float</code> </p></td><td style="text-align: left"><p> <code class="literal">float</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834301" class="indexterm"/> <code class="literal">xsd:double</code> </p></td><td style="text-align: left"><p> <code class="literal">double</code> </p></td><td style="text-align: left"><p> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834335" class="indexterm"/> <code class="literal">xsd:Qname</code> </p></td><td style="text-align: left"><p> <code class="literal">javax.xml.namespace.QName</code> </p></td><td style="text-align: left"><p> <code class="literal">funeral:corpse</code> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834374" class="indexterm"/> <code class="literal">xsd:dateTime</code> </p></td><td style="text-align: left"><p> <code class="literal">java.util.Calendar</code> </p></td><td style="text-align: left"><p> <code class="literal">2004-12-27T15:39:05.000-06:00</code> </p></td></tr><tr><td style="text-align: left"><p> <code class="literal">xsd:base64Binary</code> </p></td><td style="text-align: left"><p> <a id="I_indexterm24_id834424" class="indexterm"/> <code class="literal">byte[]</code> </p></td><td style="text-align: left"><p> <code class="literal">PGZv</code> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834452" class="indexterm"/> <code class="literal">xsd:hexBinary</code> </p></td><td style="text-align: left"><p> <code class="literal">byte[]</code> </p></td><td style="text-align: left"><p> <code class="literal">FFFF</code> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834491" class="indexterm"/> <code 
class="literal">xsd:time</code> </p></td><td style="text-align: left"><p> <code class="literal">java.util.Calendar</code> </p></td><td style="text-align: left"><p> <code class="literal">15:39:05.000-06:00</code> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834530" class="indexterm"/> <code class="literal">xsd:date</code> </p></td><td style="text-align: left"><p> <code class="literal">java.util.Calendar</code> </p></td><td style="text-align: left"><p> <code class="literal">2004-12-27</code> </p></td></tr><tr><td style="text-align: left"><p> <a id="I_indexterm24_id834569" class="indexterm"/> <code class="literal">xsd:anySimpleType</code> </p></td><td style="text-align: left"><p> <code class="literal">java.lang.String</code> </p></td><td style="text-align: left"><p> </p></td></tr></tbody></table></div></div><p>For example, our <code class="literal">animal</code> element contains a floating-point <code class="literal">weight</code> element like this:</p><a id="I_24_tt1336"/><pre class="programlisting"><code class="o">&lt;</code><code class="n">weight</code><code class="o">&gt;</code><code class="mf">400.5</code><code class="o">&lt;/</code><code class="n">weight</code><code class="o">&gt;</code></pre><p>We can validate its content by inserting the following declaration at the appropriate place in our schema:</p><a id="I_24_tt1337"/><pre class="programlisting"><code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"weight"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:double"</code><code class="o">/&gt;</code></pre><p>Beyond checking that element content matches these simple types, XML Schema gives us much finer control over the text and values of elements in our document, using simple rules and patterns analogous to regular expressions.</p></div><div class="sect3" title="Complex types"><div 
class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-7.3.2"/>Complex types</h3></div></div></div><p>In addition to the predefined simple types listed in <a class="xref" href="ch24s08.html#learnjava3-CHP-24-TABLE-5" title="Table 24-5. W3C Schema simple types">Table 24-5</a>, we can define our own <span class="emphasis"><em>complex types</em></span> in our schema. Complex types are element types that have internal structure and possibly child elements. Our <code class="literal">inventory</code>, <code class="literal">animal</code>, and <code class="literal">foodRecipe</code> elements are all complex types, and their content must be declared with the <a id="I_indexterm24_id834669" class="indexterm"/><code class="literal">complexType</code> tag in our schema. Complex type definitions can be reused, much as element definitions can be reused in our schema; that is, we can break out a complex type definition and give it a name. We can then refer to that type by name in the <a id="I_indexterm24_id834682" class="indexterm"/><code class="literal">type</code> attributes of other elements. Because each of our complex types was used only once, in its corresponding element, we didn’t give them names. They are called <span class="emphasis"><em>anonymous type definitions</em></span>, declared and used in the same spot. 
For example, we could have separated our <code class="literal">animal</code>’s type from its element declaration, like so:</p><a id="I_24_tt1338"/><pre class="programlisting"><code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"inventory"</code><code class="o">&gt;</code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"animal"</code> <code class="n">maxOccurs</code><code class="o">=</code><code class="s">"unbounded"</code> <code class="n">type</code><code class="o">=</code><code class="s">"AnimalType"</code><code class="o">/&gt;</code> <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code> <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">complexType</code><code class="o">&gt;</code> <code class="o">&lt;/</code><code class="nl">xs:</code><code class="n">element</code><code class="o">&gt;</code> <code class="err"> </code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">complexType</code> <code class="n">name</code><code class="o">=</code><code class="s">"AnimalType"</code><code class="o">&gt;</code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">sequence</code><code class="o">&gt;</code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">ref</code><code class="o">=</code><code class="s">"name"</code><code class="o">/&gt;</code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code 
class="s">"species"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code> <code class="o">&lt;</code><code class="nl">xs:</code><code class="n">element</code> <code class="n">name</code><code class="o">=</code><code class="s">"habitat"</code> <code class="n">type</code><code class="o">=</code><code class="s">"xs:string"</code><code class="o">/&gt;</code> <code class="o">...</code></pre><p>Declaring <code class="literal">AnimalType</code> separately from the <code class="literal">animal</code> element declaration allows us to have other, differently named elements with the same structure. For example, our <code class="literal">inventory</code> element might hold another element, <code class="literal">mainAttraction</code>, which also has type <code class="literal">AnimalType</code> but uses a different tag name.</p><p>There’s a lot more to say about W3C XML Schema, and schemas can get quite a bit more complex than our simple example. However, you can do a lot with just the few pieces we’ve shown here. Some tools are available to help you get started; we’ll talk about one called Trang in a moment. For more information about XML Schema, see the <a class="ulink" href="http://www.w3.org/XML/Schema">W3C’s site</a> or <a class="ulink" href="http://shop.oreilly.com/product/9780596002527.do"><span class="emphasis"><em>XML Schema</em></span></a> by Eric van der Vlist (O’Reilly). In the next section, we’ll show how to validate a file or DOM model against the XML Schema we’ve just created, using the new validation API.</p></div><div class="sect3" title="Generating Schema from XML samples"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-7.3.3"/>Generating Schema from XML samples</h3></div></div></div><p>Many tools can help you write XML Schema. One helpful tool is called <a class="ulink" href="http://bit.ly/10PPbTO">Trang</a>. 
It is part of an alternative schema language project called RELAX NG (which we mention later in this chapter), but Trang is very useful in and of itself. It is an open source tool that can not only convert between DTDs and XML Schema, but also create a rough DTD or XML Schema by reading an “example” XML document. This is a great way to sketch out a basic starting schema for your documents.<a id="I_indexterm24_id834797" class="indexterm"/></p></div></div><div class="sect2" title="The Validation API"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-7.4"/>The Validation API</h2></div></div></div><p><a id="idx11224" class="indexterm"/>To use our example’s XML schema, we need to exercise the new <a id="I_indexterm24_id834828" class="indexterm"/><code class="literal">javax.xml.validation</code> API. As we said earlier, the validation API is an alternative to the simple, parser-based validation supported through the <a id="I_indexterm24_id834840" class="indexterm"/><code class="literal">setValidating()</code> method of the parser factories. To use the validation package, we create a <code class="literal">SchemaFactory</code> for a specific schema language, use it to load a <code class="literal">Schema</code>, and obtain a <code class="literal">Validator</code> from that schema. We can then validate a DOM or stream source against the schema.</p><p>The following example, <code class="literal">Validate</code>, is a simple command-line utility that you can use to test your XML documents and schemas. 
Just give it the XML filename and an XML Schema file (<span class="emphasis"><em>.xsd</em></span> file) as arguments:</p><a id="I_24_tt1339"/><pre class="programlisting"> <code class="kn">import</code> <code class="nn">javax.xml.XMLConstants</code><code class="o">;</code> <code class="kn">import</code> <code class="nn">javax.xml.validation.*</code><code class="o">;</code> <code class="kn">import</code> <code class="nn">org.xml.sax.*</code><code class="o">;</code> <code class="kn">import</code> <code class="nn">javax.xml.transform.sax.SAXSource</code><code class="o">;</code> <code class="kn">import</code> <code class="nn">javax.xml.transform.Source</code><code class="o">;</code> <code class="kn">import</code> <code class="nn">javax.xml.transform.stream.StreamSource</code><code class="o">;</code> <code class="err"> </code> <code class="kd">public</code> <code class="kd">class</code> <code class="nc">Validate</code> <code class="o">{</code> <code class="kd">public</code> <code class="kd">static</code> <code class="kt">void</code> <code class="nf">main</code><code class="o">(</code> <code class="n">String</code> <code class="o">[]</code> <code class="n">args</code> <code class="o">)</code> <code class="kd">throws</code> <code class="n">Exception</code> <code class="o">{</code> <code class="k">if</code> <code class="o">(</code> <code class="n">args</code><code class="o">.</code><code class="na">length</code> <code class="o">!=</code> <code class="mi">2</code> <code class="o">)</code> <code class="o">{</code> <code class="n">System</code><code class="o">.</code><code class="na">err</code><code class="o">.</code><code class="na">println</code><code class="o">(</code><code class="s">"usage: Validate xmlfile.xml xsdfile.xsd"</code><code class="o">);</code> <code class="n">System</code><code class="o">.</code><code class="na">exit</code><code class="o">(</code><code class="mi">1</code><code class="o">);</code> <code class="o">}</code> <code class="n">String</code> <code 
class="n">xmlfile</code> <code class="o">=</code> <code class="n">args</code><code class="o">[</code><code class="mi">0</code><code class="o">],</code> <code class="n">xsdfile</code> <code class="o">=</code> <code class="n">args</code><code class="o">[</code><code class="mi">1</code><code class="o">];</code> <code class="err"> </code> <code class="n">SchemaFactory</code> <code class="n">factory</code> <code class="o">=</code> <code class="n">SchemaFactory</code><code class="o">.</code><code class="na">newInstance</code><code class="o">(</code> <code class="n">XMLConstants</code><code class="o">.</code><code class="na">W3C_XML_SCHEMA_NS_URI</code><code class="o">);</code> <code class="n">Schema</code> <code class="n">schema</code> <code class="o">=</code> <code class="n">factory</code><code class="o">.</code><code class="na">newSchema</code><code class="o">(</code> <code class="k">new</code> <code class="n">StreamSource</code><code class="o">(</code> <code class="n">xsdfile</code> <code class="o">)</code> <code class="o">);</code> <code class="n">Validator</code> <code class="n">validator</code> <code class="o">=</code> <code class="n">schema</code><code class="o">.</code><code class="na">newValidator</code><code class="o">();</code> <code class="err"> </code> <code class="n">ErrorHandler</code> <code class="n">errHandler</code> <code class="o">=</code> <code class="k">new</code> <code class="n">ErrorHandler</code><code class="o">()</code> <code class="o">{</code> <code class="kd">public</code> <code class="kt">void</code> <code class="nf">error</code><code class="o">(</code> <code class="n">SAXParseException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="n">System</code><code class="o">.</code><code class="na">out</code><code class="o">.</code><code class="na">println</code><code class="o">(</code><code class="n">e</code><code class="o">);</code> <code class="o">}</code> <code class="kd">public</code> <code 
class="kt">void</code> <code class="nf">fatalError</code><code class="o">(</code> <code class="n">SAXParseException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="n">System</code><code class="o">.</code><code class="na">out</code><code class="o">.</code><code class="na">println</code><code class="o">(</code><code class="n">e</code><code class="o">);</code> <code class="o">}</code> <code class="kd">public</code> <code class="kt">void</code> <code class="nf">warning</code><code class="o">(</code> <code class="n">SAXParseException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="n">System</code><code class="o">.</code><code class="na">out</code><code class="o">.</code><code class="na">println</code><code class="o">(</code><code class="n">e</code><code class="o">);</code> <code class="o">}</code> <code class="o">};</code> <code class="n">validator</code><code class="o">.</code><code class="na">setErrorHandler</code><code class="o">(</code> <code class="n">errHandler</code> <code class="o">);</code> <code class="err"> </code> <code class="k">try</code> <code class="o">{</code> <code class="n">validator</code><code class="o">.</code><code class="na">validate</code><code class="o">(</code> <code class="k">new</code> <code class="n">SAXSource</code><code class="o">(</code> <code class="k">new</code> <code class="nf">InputSource</code><code class="o">(</code> <code class="n">xmlfile</code> <code class="o">)</code> <code class="o">)</code> <code class="o">);</code> <code class="o">}</code> <code class="k">catch</code> <code class="o">(</code> <code class="n">SAXException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="c1">// Fatal error (e.g., malformed XML), already reported by errHandler</code> <code class="o">}</code> <code class="o">}</code> <code class="o">}</code> <code class="err"> </code></pre><p>The schema types supported initially are listed as 
constants in the <code class="literal">XMLConstants</code> class. Right now, only W3C XML Schema is implemented, although the class also defines another intriguing type that we’ll mention later. Our validation example follows the pattern we’ve seen before: it creates a factory and then a <a id="I_indexterm24_id834914" class="indexterm"/><code class="literal">Schema</code> instance. The <code class="literal">Schema</code> represents the grammar and can create <a id="I_indexterm24_id834931" class="indexterm"/><code class="literal">Validator</code> instances that do the work of checking the document structure. Here, we’ve called the <a id="I_indexterm24_id834943" class="indexterm"/><code class="literal">validate()</co