<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>SAX</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body><div class="sect1" title="SAX"><div class="titlepage"><div><div><h1 class="title"><a id="learnjava3-CHP-24-SECT-3"/>SAX</h1></div></div></div><p>SAX is a low-level, event-style API for parsing XML documents. SAX
originated in Java, but has been implemented in many languages. We’ll
begin our discussion of the Java XML APIs here at this lower level, and
work our way up to higher-level (and often more convenient) APIs as we
go.</p><div class="sect2" title="The SAX API"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-3.1"/>The SAX API</h2></div></div></div><p><a id="idx11203" class="indexterm"/> <a id="idx11218" class="indexterm"/>To use SAX, we’ll draw on classes from the <a id="I_indexterm24_id829064" class="indexterm"/><code class="literal">org.xml.sax</code> package,
a de facto standard developed by the XML-DEV community rather than by the W3C. This package holds interfaces common to all
implementations of SAX. To perform the actual parsing, we’ll need the
<a id="I_indexterm24_id829078" class="indexterm"/><code class="literal">javax.xml.parsers</code>
package, which is the standard Java package for accessing XML parsers.
The <code class="literal">javax.xml.parsers</code> package is part
of the Java API for XML Processing (JAXP), which allows different parser
implementations to be used with Java in a portable way.</p><p>To read an XML document with SAX, we first register an <a id="I_indexterm24_id829099" class="indexterm"/><code class="literal">org.xml.sax.ContentHandler</code> implementation with the
parser. The <code class="literal">ContentHandler</code> has
methods that are called in response to parts of the document. For
example, the <code class="literal">ContentHandler</code>’s
<a id="I_indexterm24_id829122" class="indexterm"/><code class="literal">startElement()</code> method
is called when an opening tag is encountered, and the <a id="I_indexterm24_id829133" class="indexterm"/><code class="literal">endElement()</code> method is
called when the tag is closed. Attributes are provided with the <code class="literal">startElement()</code> call. Text content of elements
is passed through a separate method called <a id="I_indexterm24_id829151" class="indexterm"/><code class="literal">characters()</code>. The
<code class="literal">characters()</code> method may be invoked
repeatedly to supply more text as it is read, but it often gets the
whole string in one bite. The following are the signatures of
these three <code class="literal">ContentHandler</code>
methods:</p><a id="I_24_tt1303"/><pre class="programlisting"><code class="kd">public</code> <code class="kt">void</code> <code class="nf">startElement</code><code class="o">(</code>
<code class="n">String</code> <code class="n">namespace</code><code class="o">,</code> <code class="n">String</code> <code class="n">localname</code><code class="o">,</code> <code class="n">String</code> <code class="n">qname</code><code class="o">,</code> <code class="n">Attributes</code> <code class="n">atts</code> <code class="o">);</code>
<code class="kd">public</code> <code class="kt">void</code> <code class="nf">characters</code><code class="o">(</code>
<code class="kt">char</code><code class="o">[]</code> <code class="n">ch</code><code class="o">,</code> <code class="kt">int</code> <code class="n">start</code><code class="o">,</code> <code class="kt">int</code> <code class="n">len</code> <code class="o">);</code>
<code class="kd">public</code> <code class="kt">void</code> <code class="nf">endElement</code><code class="o">(</code>
<code class="n">String</code> <code class="n">namespace</code><code class="o">,</code> <code class="n">String</code> <code class="n">localname</code><code class="o">,</code> <code class="n">String</code> <code class="n">qname</code> <code class="o">);</code></pre><p>The <code class="literal">qname</code> parameter is the
<span class="emphasis"><em>qualified name</em></span> of the element: this is the element
name, including any namespace prefix that applies. When you’re
working with namespaces, the <code class="literal">namespace</code> and <code class="literal">localname</code> parameters are also supplied,
providing the namespace and unqualified element name separately.</p><p>The <code class="literal">ContentHandler</code> interface
also contains methods called in response to the start and end of the
document, <code class="literal">startDocument()</code> and
<code class="literal">endDocument()</code>, as well as those for
handling namespace mapping, special XML instructions, and whitespace
that is not part of the text content and may optionally be ignored.
We’ll confine ourselves to the three previous methods for our examples.
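</p><p>The following is a minimal sketch, not part of the original example code, of how these three callbacks fire in practice. The class name <code class="literal">SaxEventLogger</code>, its <code class="literal">log()</code> helper, and the tiny in-memory document are assumptions made for illustration. The sketch extends the <code class="literal">DefaultHandler</code> convenience class and obtains its parser through the JAXP factory, both of which are described in the text that follows.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not from the original text): records each SAX event as a string.
class SaxEventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String namespace, String localname, String qname, Attributes atts) {
        // Attributes arrive together with the opening tag
        StringBuilder sb = new StringBuilder("start " + qname);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        // Skip whitespace-only text such as indentation between tags
        String text = new String(ch, start, len).trim();
        if (!text.isEmpty()) {
            events.add("text " + text);
        }
    }

    @Override
    public void endElement(String namespace, String localname, String qname) {
        events.add("end " + qname);
    }

    // Parses an XML string and returns the recorded events in document order.
    static List<String> log(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        XMLReader reader = parser.getXMLReader();
        SaxEventLogger handler = new SaxEventLogger();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<animal animalClass=\"mammal\"><name>Cocoa</name></animal>";
        for (String event : log(xml)) {
            System.out.println(event);
        }
    }
}
```

<p>Running <code class="literal">main()</code> prints one line per event in document order. Because a parser is free to deliver an element’s text across several <code class="literal">characters()</code> calls, a more careful handler would accumulate text until <code class="literal">endElement()</code> arrives rather than treat each call as a complete string.</p><p>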
As with many other Java interfaces, a simple implementation, <code class="literal">org.xml.sax.helpers.DefaultHandler</code>, is
provided for us that allows us to override only the methods in which
we’re interested.</p><div class="sect3" title="JAXP"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-3.1.1"/>JAXP</h3></div></div></div><p><a id="idx11194" class="indexterm"/> <a id="idx11197" class="indexterm"/>To perform the parsing, we’ll need to get a parser from
the <a id="I_indexterm24_id829270" class="indexterm"/><code class="literal">javax.xml.parsers</code>
package. JAXP abstracts the process of getting a parser through a
<span class="emphasis"><em>factory pattern</em></span>, allowing different parser
implementations to be plugged into the Java platform. The following
snippet constructs a <a id="I_indexterm24_id829286" class="indexterm"/><code class="literal">SAXParser</code> object and
then gets an <a id="I_indexterm24_id829297" class="indexterm"/><code class="literal">XMLReader</code> used to
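parse a file:</p><a id="I_24_tt1304"/><pre class="programlisting"> <code class="kn">import</code> <code class="nn">javax.xml.parsers.*</code><code class="o">;</code>
<code class="kn">import</code> <code class="nn">org.xml.sax.XMLReader</code><code class="o">;</code>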
<code class="n">SAXParserFactory</code> <code class="n">factory</code> <code class="o">=</code> <code class="n">SAXParserFactory</code><code class="o">.</code><code class="na">newInstance</code><code class="o">();</code>
<code class="n">SAXParser</code> <code class="n">saxParser</code> <code class="o">=</code> <code class="n">factory</code><code class="o">.</code><code class="na">newSAXParser</code><code class="o">();</code>
<code class="n">XMLReader</code> <code class="n">reader</code> <code class="o">=</code> <code class="n">saxParser</code><code class="o">.</code><code class="na">getXMLReader</code><code class="o">();</code>
<code class="n">reader</code><code class="o">.</code><code class="na">setContentHandler</code><code class="o">(</code> <code class="n">myContentHandler</code> <code class="o">);</code>
<code class="n">reader</code><code class="o">.</code><code class="na">parse</code><code class="o">(</code> <code class="s">"myfile.xml"</code> <code class="o">);</code></pre><p>You might expect the <code class="literal">SAXParser</code> to have the <code class="literal">parse</code> method. The <code class="literal">XMLReader</code> intermediary was added to support
changes in the SAX API between 1.0 and 2.0. Later, we’ll discuss some
options that can be set to govern how XML parsers operate. These
options are normally set through methods on the parser factory (e.g.,
<code class="literal">SAXParserFactory</code>) and not the
parser itself. This is because the factory may wish to use different
implementations to support different required features.<a id="I_indexterm24_id829348" class="indexterm"/><a id="I_indexterm24_id829355" class="indexterm"/></p></div><div class="sect3" title="SAX’s strengths and weaknesses"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-3.1.2"/>SAX’s strengths and weaknesses</h3></div></div></div><p><a id="idx11204" class="indexterm"/> <a id="I_indexterm24_id829382" class="indexterm"/>The primary motivation for using SAX instead of the
higher-level APIs that we’ll discuss later is that it is lightweight
and event-driven. SAX doesn’t require maintaining the entire document
in memory. So if, for example, you need to grab the text of just a few
elements from a document, or if you need to extract elements from a
large stream of XML, you can do so efficiently with SAX. The
event-driven nature of SAX also allows you to take actions as the
beginning and end tags are parsed. This can be useful for directly
manipulating your own models without first going through another
representation. The primary weakness of SAX is that you are operating
on a tag-by-tag level with no help from the parser to maintain
context. We’ll talk about how to overcome this limitation next. Later,
we’ll also talk about the new XPath API, which combines many of the
benefits of both SAX and DOM in a form that is easier to
use.<a id="I_indexterm24_id829412" class="indexterm"/><a id="I_indexterm24_id829419" class="indexterm"/><a id="I_indexterm24_id829426" class="indexterm"/></p></div></div><div class="sect2" title="Building a Model Using SAX"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-3.2"/>Building a Model Using SAX</h2></div></div></div><p><a id="idx11217" class="indexterm"/>The <code class="literal">ContentHandler</code>
mechanism for receiving SAX events is very simple. It should be easy to
see how one could use it to capture the value or attributes of a single
element in a document. What may be harder to see is how one could use
SAX to populate a real Java object model. Creating or pushing data into
Java objects from XML is such a common activity that it’s worth
considering how the SAX API applies to this problem. The following
example, <a id="I_indexterm24_id829468" class="indexterm"/><code class="literal">SAXModelBuilder</code>, does
just this, reading an XML description and creating Java objects on
command. This example is a bit unusual in that we resort to using some
reflection to do the job, but this is a case where we’re trying to
interact with Java objects dynamically.</p><p>In this section, we’ll start by creating some XML along with
corresponding Java classes that serve as the model for this XML. The
focus of the example code here is to create the generic model builder
that uses SAX to read the XML and populate the model classes with their
data. The idea is that the developer is creating only XML and model
classes—no custom code—to do the
parsing. You might use code like this to read configuration files for an
application or to implement a custom XML “language” for describing
workflows. The advantage is that there is no real parsing code in the
application at all, only in the generic builder tool. Finally, later in
this chapter when we discuss the more powerful JAXB APIs, we’ll reuse
the Java object model from this example simply by adding a few
annotations.</p><div class="sect3" title="Creating the XML file"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-3.2.1"/>Creating the XML file</h3></div></div></div><p><a id="I_indexterm24_id829509" class="indexterm"/>The first thing we’ll need is a nice XML document to
parse. Luckily, it’s inventory time at the zoo! The following
document, <span class="emphasis"><em>zooinventory.xml</em></span>, describes two of the
zoo’s residents, including some vital information about their
diets:</p><a id="I_24_tt1305"/><pre class="programlisting"><code class="o"><?</code><code class="n">xml</code> <code class="n">version</code><code class="o">=</code><code class="s">"1.0"</code> <code class="n">encoding</code><code class="o">=</code><code class="s">"UTF-8"</code><code class="o">?></code>
<code class="o"><</code><code class="n">inventory</code><code class="o">></code>
<code class="o"><</code><code class="n">animal</code> <code class="n">animalClass</code><code class="o">=</code><code class="s">"mammal"</code><code class="o">></code>
<code class="o"><</code><code class="n">name</code><code class="o">></code><code class="n">Song</code> <code class="n">Fang</code><code class="o"></</code><code class="n">name</code><code class="o">></code>
<code class="o"><</code><code class="n">species</code><code class="o">></code><code class="n">Giant</code> <code class="n">Panda</code><code class="o"></</code><code class="n">species</code><code class="o">></code>
<code class="o"><</code><code class="n">habitat</code><code class="o">></code><code class="n">China</code><code class="o"></</code><code class="n">habitat</code><code class="o">></code>
<code class="o"><</code><code class="n">food</code><code class="o">></code><code class="n">Bamboo</code><code class="o"></</code><code class="n">food</code><code class="o">></code>
<code class="o"><</code><code class="n">temperament</code><code class="o">></code><code class="n">Friendly</code><code class="o"></</code><code class="n">temperament</code><code class="o">></code>
<code class="o"><</code><code class="n">weight</code><code class="o">></code><code class="mf">45.0</code><code class="o"></</code><code class="n">weight</code><code class="o">></code>
<code class="o"></</code><code class="n">animal</code><code class="o">></code>
<code class="o"><</code><code class="n">animal</code> <code class="n">animalClass</code><code class="o">=</code><code class="s">"mammal"</code><code class="o">></code>
<code class="o"><</code><code class="n">name</code><code class="o">></code><code class="n">Cocoa</code><code class="o"></</code><code class="n">name</code><code class="o">></code>
<code class="o"><</code><code class="n">species</code><code class="o">></code><code class="n">Gorilla</code><code class="o"></</code><code class="n">species</code><code class="o">></code>
<code class="o"><</code><code class="n">habitat</code><code class="o">></code><code class="n">Central</code> <code class="n">Africa</code><code class="o"></</code><code class="n">habitat</code><code class="o">></code>
<code class="o"><</code><code class="n">foodRecipe</code><code class="o">></code>
<code class="o"><</code><code class="n">name</code><code class="o">></code><code class="n">Gorilla</code> <code class="n">Chow</code><code class="o"></</code><code class="n">name</code><code class="o">></code>
<code class="o"><</code><code class="n">ingredient</code><code class="o">></code><code class="n">fruit</code><code class="o"></</code><code class="n">ingredient</code><code class="o">></code>
<code class="o"><</code><code class="n">ingredient</code><code class="o">></code><code class="n">shoots</code><code class="o"></</code><code class="n">ingredient</code><code class="o">></code>
<code class="o"><</code><code class="n">ingredient</code><code class="o">></code><code class="n">leaves</code><code class="o"></</code><code class="n">ingredient</code><code class="o">></code>
<code class="o"></</code><code class="n">foodRecipe</code><code class="o">></code>
<code class="o"><</code><code class="n">temperament</code><code class="o">></code><code class="n">Know</code><code class="o">-</code><code class="n">it</code><code class="o">-</code><code class="n">all</code><code class="o"></</code><code class="n">temperament</code><code class="o">></code>
<code class="o"><</code><code class="n">weight</code><code class="o">></code><code class="mf">45.0</code><code class="o"></</code><code class="n">weight</code><code class="o">></code>
<code class="o"></</code><code class="n">animal</code><code class="o">></code>
<code class="o"></</code><code class="n">inventory</code><code class="o">></code></pre><p>The document is fairly simple. The root element, <code class="literal"><inventory></code>, contains two <code class="literal"><animal></code> elements as children.
<code class="literal"><animal></code> contains several
simple text elements for things like name, species, and habitat. It
also contains either a simple <code class="literal"><food></code> element or a complex <code class="literal"><foodRecipe></code> element. Finally, note
that the <code class="literal"><animal></code> element has
one attribute, <code class="literal">animalClass</code>, that
describes the zoological classification of the creature (e.g., mammal,
bird, or fish). This gives us a representative set of XML features
to play with in our examples.</p></div><div class="sect3" title="The model"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-3.2.2"/>The model</h3></div></div></div><p>Now let’s make a Java object model for our zoo inventory. This
part is very mechanical—we
simply create a class for each of the complex element types in our
XML, that is, anything other than a simple string or number. Best practice
would probably be to use the standard JavaBeans property design
pattern here—that is, to use a private field (instance variable) plus
a pair of get and set methods for each property. However, because
these classes are just simple data holders and we want to keep our
example small, we’re going to opt to use public fields. Everything
we’re going to do in this example and, much more importantly,
everything we’re going to do when we reuse this model in the later
JAXB binding example, can be made to work with either field or
JavaBeans-style method-based properties equivalently. In this example,
it would just be a matter of how we set the values; later, in the
JAXB case, it would be a matter of where we put the annotations. So
here are the classes:</p><a id="I_24_tt1306"/><pre class="programlisting"> <code class="kn">import</code> <code class="nn">java.util.*</code><code class="o">;</code>
<code class="kd">public</code> <code class="kd">class</code> <code class="nc">Inventory</code> <code class="o">{</code>
<code class="kd">public</code> <code class="n">List</code><code class="o"><</code><code class="n">Animal</code><code class="o">></code> <code class="n">animal</code> <code class="o">=</code> <code class="k">new</code> <code class="n">ArrayList</code><code class="o"><>();</code>
<code class="o">}</code>
<code class="kd">public</code> <code class="kd">class</code> <code class="nc">Animal</code>
<code class="o">{</code>
<code class="kd">public</code> <code class="kd">static</code> <code class="kd">enum</code> <code class="n">AnimalClass</code> <code class="o">{</code> <code class="n">mammal</code><code class="o">,</code> <code class="n">reptile</code><code class="o">,</code> <code class="n">bird</code><code class="o">,</code> <code class="n">fish</code><code class="o">,</code> <code class="n">amphibian</code><code class="o">,</code>
<code class="n">invertebrate</code> <code class="o">}</code>
<code class="kd">public</code> <code class="n">AnimalClass</code> <code class="n">animalClass</code><code class="o">;</code>
<code class="kd">public</code> <code class="n">String</code> <code class="n">name</code><code class="o">,</code> <code class="n">species</code><code class="o">,</code> <code class="n">habitat</code><code class="o">,</code> <code class="n">food</code><code class="o">,</code> <code class="n">temperament</code><code class="o">;</code>
<code class="kd">public</code> <code class="n">Double</code> <code class="n">weight</code><code class="o">;</code>
<code class="kd">public</code> <code class="n">FoodRecipe</code> <code class="n">foodRecipe</code><code class="o">;</code>
<code class="kd">public</code> <code class="n">String</code> <code class="nf">toString</code><code class="o">()</code> <code class="o">{</code> <code class="k">return</code> <code class="n">name</code> <code class="o">+</code><code class="s">"("</code><code class="o">+</code><code class="n">animalClass</code><code class="o">+</code><code class="s">",</code>
<code class="s"> "</code><code class="o">+</code><code class="n">species</code><code class="o">+</code><code class="s">")"</code><code class="o">;</code> <code class="o">}</code>
<code class="o">}</code>
<code class="kd">public</code> <code class="kd">class</code> <code class="nc">FoodRecipe</code>
<code class="o">{</code>
<code class="kd">public</code> <code class="n">String</code> <code class="n">name</code><code class="o">;</code>
<code class="kd">public</code> <code class="n">List</code><code class="o"><</code><code class="n">String</code><code class="o">></code> <code class="n">ingredient</code> <code class="o">=</code> <code class="k">new</code> <code class="n">ArrayList</code><code class="o"><</code><code class="n">String</code><code class="o">>();</code>
<code class="kd">public</code> <code class="n">String</code> <code class="nf">toString</code><code class="o">()</code> <code class="o">{</code> <code class="k">return</code> <code class="n">name</code> <code class="o">+</code> <code class="s">": "</code><code class="o">+</code> <code class="n">ingredient</code><code class="o">.</code><code class="na">toString</code><code class="o">();</code> <code class="o">}</code>
<code class="o">}</code></pre><p>As you can see, for the cases where we need to represent a
sequence of elements (e.g., animal in inventory), we have used a
<code class="literal">List</code> collection. Also note that the
property that will serve to hold our <code class="literal">animalClass</code> attribute (e.g., mammal) is
represented as an enum type. We’ve also thrown in simple <code class="literal">toString()</code> methods for later use. One more
thing—we’ve chosen to name our collections in the singular form here
(e.g., “animal,” as opposed to “animals”) just because it is
convenient. We’ll talk about mapping names more in the JAXB
example.</p></div><div class="sect3" title="The SAXModelBuilder"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-3.2.3"/>The SAXModelBuilder</h3></div></div></div><p><a id="idx11201" class="indexterm"/>Let’s get down to business and write our builder tool.
Now we could do this by using the SAX API in combination with some
“hardcoded” knowledge about the incoming tags and the classes we want
to output (imagine a whole bunch of switches or if/then statements);
however, we’re going to do better than that and make a more generic model
builder that maps our XML to classes by name. The <code class="literal">SAXModelBuilder</code> that we create in this
section receives SAX events from parsing an XML file and dynamically
constructs objects or sets properties corresponding to the names of
the element tags. Our model builder is small, but it handles the most
common structures: nested elements and elements with simple text or
numeric content. We treat attributes as equivalent to element data as
far as our model classes go and we support three basic types: <code class="literal">String</code>, <code class="literal">Double</code>, and <code class="literal">Enum</code>.</p><p>Here is the code:</p><a id="I_24_tt1307"/><pre class="programlisting"><code class="kn">import</code> <code class="nn">org.xml.sax.*</code><code class="o">;</code>
<code class="kn">import</code> <code class="nn">org.xml.sax.helpers.*</code><code class="o">;</code>
<code class="kn">import</code> <code class="nn">java.util.*</code><code class="o">;</code>
<code class="kn">import</code> <code class="nn">java.lang.reflect.*</code><code class="o">;</code>
<code class="kd">public</code> <code class="kd">class</code> <code class="nc">SAXModelBuilder</code> <code class="kd">extends</code> <code class="n">DefaultHandler</code>
<code class="o">{</code>
<code class="n">Stack</code><code class="o"><</code><code class="n">Object</code><code class="o">></code> <code class="n">stack</code> <code class="o">=</code> <code class="k">new</code> <code class="n">Stack</code><code class="o"><>();</code>
<code class="kd">public</code> <code class="kt">void</code> <code class="nf">startElement</code><code class="o">(</code> <code class="n">String</code> <code class="n">namespace</code><code class="o">,</code> <code class="n">String</code> <code class="n">localname</code><code class="o">,</code> <code class="n">String</code> <code class="n">qname</code><code class="o">,</code>
<code class="n">Attributes</code> <code class="n">atts</code> <code class="o">)</code> <code class="kd">throws</code> <code class="n">SAXException</code>
<code class="o">{</code>
<code class="c1">// Construct the new element and set any attributes on it</code>
<code class="n">Object</code> <code class="n">element</code><code class="o">;</code>
<code class="k">try</code> <code class="o">{</code>
<code class="n">String</code> <code class="n">className</code> <code class="o">=</code> <code class="n">Character</code><code class="o">.</code><code class="na">toUpperCase</code><code class="o">(</code> <code class="n">qname</code><code class="o">.</code><code class="na">charAt</code><code class="o">(</code> <code class="mi">0</code> <code class="o">)</code> <code class="o">)</code> <code class="o">+</code>
<code class="n">qname</code><code class="o">.</code><code class="na">substring</code><code class="o">(</code> <code class="mi">1</code> <code class="o">);</code>
<code class="n">element</code> <code class="o">=</code> <code class="n">Class</code><code class="o">.</code><code class="na">forName</code><code class="o">(</code> <code class="n">className</code> <code class="o">).</code><code class="na">getDeclaredConstructor</code><code class="o">().</code><code class="na">newInstance</code><code class="o">();</code>
<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code> <code class="n">Exception</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code>
<code class="n">element</code> <code class="o">=</code> <code class="k">new</code> <code class="n">StringBuffer</code><code class="o">();</code>
<code class="o">}</code>
<code class="k">for</code><code class="o">(</code> <code class="kt">int</code> <code class="n">i</code><code class="o">=</code><code class="mi">0</code><code class="o">;</code> <code class="n">i</code><code class="o"><</code><code class="n">atts</code><code class="o">.</code><code class="na">getLength</code><code class="o">();</code> <code class="n">i</code><code class="o">++)</code> <code class="o">{</code>
<code class="k">try</code> <code class="o">{</code>
<code class="n">setProperty</code><code class="o">(</code> <code class="n">atts</code><code class="o">.</code><code class="na">getQName</code><code class="o">(</code> <code class="n">i</code> <code class="o">),</code> <code class="n">element</code><code class="o">,</code> <code class="n">atts</code><code class="o">.</code><code class="na">getValue</code><code class="o">(</code> <code class="n">i</code> <code class="o">)</code> <code class="o">);</code>
<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code> <code class="n">Exception</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="k">throw</code> <code class="k">new</code> <code class="n">SAXException</code><code class="o">(</code> <code class="s">"Error: "</code><code class="o">,</code> <code class="n">e</code> <code class="o">);</code> <code class="o">}</code>
<code class="o">}</code>
<code class="n">stack</code><code class="o">.</code><code class="na">push</code><code class="o">(</code> <code class="n">element</code> <code class="o">);</code>
<code class="o">}</code>
<code class="kd">public</code> <code class="kt">void</code> <code class="nf">endElement</code><code class="o">(</code> <code class="n">String</code> <code class="n">namespace</code><code class="o">,</code> <code class="n">String</code> <code class="n">localname</code><code class="o">,</code> <code class="n">String</code> <code class="n">qname</code> <code class="o">)</code>
<code class="kd">throws</code> <code class="n">SAXException</code>
<code class="o">{</code>
<code class="c1">// Add the element to its parent</code>
<code class="k">if</code> <code class="o">(</code> <code class="n">stack</code><code class="o">.</code><code class="na">size</code><code class="o">()</code> <code class="o">></code> <code class="mi">1</code><code class="o">)</code> <code class="o">{</code>
<code class="n">Object</code> <code class="n">element</code> <code class="o">=</code> <code class="n">stack</code><code class="o">.</code><code class="na">pop</code><code class="o">();</code>
<code class="k">try</code> <code class="o">{</code>
<code class="n">setProperty</code><code class="o">(</code> <code class="n">qname</code><code class="o">,</code> <code class="n">stack</code><code class="o">.</code><code class="na">peek</code><code class="o">(),</code> <code class="n">element</code> <code class="o">);</code>
<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code> <code class="n">Exception</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="k">throw</code> <code class="k">new</code> <code class="n">SAXException</code><code class="o">(</code> <code class="s">"Error: "</code><code class="o">,</code> <code class="n">e</code> <code class="o">);</code> <code class="o">}</code>
<code class="o">}</code>
<code class="o">}</code>
<code class="kd">public</code> <code class="kt">void</code> <code class="nf">characters</code><code class="o">(</code><code class="kt">char</code><code class="o">[]</code> <code class="n">ch</code><code class="o">,</code> <code class="kt">int</code> <code class="n">start</code><code class="o">,</code> <code class="kt">int</code> <code class="n">len</code> <code class="o">)</code>
<code class="o">{</code>
<code class="c1">// Receive element content text</code>
<code class="n">String</code> <code class="n">text</code> <code class="o">=</code> <code class="k">new</code> <code class="n">String</code><code class="o">(</code> <code class="n">ch</code><code class="o">,</code> <code class="n">start</code><code class="o">,</code> <code class="n">len</code> <code class="o">);</code>
<code class="k">if</code> <code class="o">(</code> <code class="n">text</code><code class="o">.</code><code class="na">trim</code><code class="o">().</code><code class="na">length</code><code class="o">()</code> <code class="o">==</code> <code class="mi">0</code> <code class="o">)</code> <code class="o">{</code> <code class="k">return</code><code class="o">;</code> <code class="o">}</code>
<code class="o">((</code><code class="n">StringBuffer</code><code class="o">)</code><code class="n">stack</code><code class="o">.</code><code class="na">peek</code><code class="o">()).</code><code class="na">append</code><code class="o">(</code> <code class="n">text</code> <code class="o">);</code>
<code class="o">}</code>
<code class="kt">void</code> <code class="nf">setProperty</code><code class="o">(</code> <code class="n">String</code> <code class="n">name</code><code class="o">,</code> <code class="n">Object</code> <code class="n">target</code><code class="o">,</code> <code class="n">Object</code> <code class="n">value</code> <code class="o">)</code>
<code class="kd">throws</code> <code class="n">SAXException</code><code class="o">,</code> <code class="n">IllegalAccessException</code><code class="o">,</code> <code class="n">NoSuchFieldException</code>
<code class="o">{</code>
<code class="n">Field</code> <code class="n">field</code> <code class="o">=</code> <code class="n">target</code><code class="o">.</code><code class="na">getClass</code><code class="o">().</code><code class="na">getField</code><code class="o">(</code> <code class="n">name</code> <code class="o">);</code>
<code class="c1">// Convert values to field type</code>
<code class="k">if</code> <code class="o">(</code> <code class="n">value</code> <code class="k">instanceof</code> <code class="n">StringBuffer</code> <code class="o">)</code> <code class="o">{</code>
<code class="n">value</code> <code class="o">=</code> <code class="n">value</code><code class="o">.</code><code class="na">toString</code><code class="o">();</code>
<code class="o">}</code>
<code class="k">if</code> <code class="o">(</code> <code class="n">field</code><code class="o">.</code><code class="na">getType</code><code class="o">()</code> <code class="o">==</code> <code class="n">Double</code><code class="o">.</code><code class="na">class</code> <code class="o">)</code> <code class="o">{</code>
<code class="n">value</code> <code class="o">=</code> <code class="n">Double</code><code class="o">.</code><code class="na">parseDouble</code><code class="o">(</code> <code class="n">value</code><code class="o">.</code><code class="na">toString</code><code class="o">()</code> <code class="o">);</code>
<code class="o">}</code>
<code class="k">if</code> <code class="o">(</code> <code class="n">Enum</code><code class="o">.</code><code class="na">class</code><code class="o">.</code><code class="na">isAssignableFrom</code><code class="o">(</code> <code class="n">field</code><code class="o">.</code><code class="na">getType</code><code class="o">()</code> <code class="o">)</code> <code class="o">)</code> <code class="o">{</code>
<code class="n">value</code> <code class="o">=</code> <code class="n">Enum</code><code class="o">.</code><code class="na">valueOf</code><code class="o">(</code> <code class="o">(</code><code class="n">Class</code><code class="o"><</code><code class="n">Enum</code><code class="o">>)</code><code class="n">field</code><code class="o">.</code><code class="na">getType</code><code class="o">(),</code>
<code class="n">value</code><code class="o">.</code><code class="na">toString</code><code class="o">()</code> <code class="o">);</code>
<code class="o">}</code>
<code class="c1">// Apply to field</code>
<code class="k">if</code> <code class="o">(</code> <code class="n">field</code><code class="o">.</code><code class="na">getType</code><code class="o">()</code> <code class="o">==</code> <code class="n">value</code><code class="o">.</code><code class="na">getClass</code><code class="o">()</code> <code class="o">)</code> <code class="o">{</code>
<code class="n">field</code><code class="o">.</code><code class="na">set</code><code class="o">(</code> <code class="n">target</code><code class="o">,</code> <code class="n">value</code> <code class="o">);</code>
<code class="o">}</code> <code class="k">else</code>
<code class="k">if</code> <code class="o">(</code> <code class="n">Collection</code><code class="o">.</code><code class="na">class</code><code class="o">.</code><code class="na">isAssignableFrom</code><code class="o">(</code> <code class="n">field</code><code class="o">.</code><code class="na">getType</code><code class="o">()</code> <code class="o">)</code> <code class="o">)</code> <code class="o">{</code>
<code class="n">Collection</code> <code class="n">collection</code> <code class="o">=</code> <code class="o">(</code><code class="n">Collection</code><code class="o">)</code><code class="n">field</code><code class="o">.</code><code class="na">get</code><code class="o">(</code> <code class="n">target</code> <code class="o">);</code>
<code class="n">collection</code><code class="o">.</code><code class="na">add</code><code class="o">(</code> <code class="n">value</code> <code class="o">);</code>
<code class="o">}</code> <code class="k">else</code> <code class="o">{</code>
<code class="k">throw</code> <code class="k">new</code> <code class="nf">RuntimeException</code><code class="o">(</code> <code class="s">"Unable to set property..."</code> <code class="o">);</code>
<code class="o">}</code>
<code class="o">}</code>
<code class="kd">public</code> <code class="n">Object</code> <code class="nf">getModel</code><code class="o">()</code> <code class="o">{</code> <code class="k">return</code> <code class="n">stack</code><code class="o">.</code><code class="na">pop</code><code class="o">();</code> <code class="o">}</code>
<code class="o">}</code></pre><p>The code may be a little hard to digest at first: we are using
reflection to construct the objects and set the properties on the
fields. But the gist of it is really just that the three methods,
<a id="I_indexterm24_id829772" class="indexterm"/><code class="literal">startElement()</code>,
<a id="I_indexterm24_id829783" class="indexterm"/><code class="literal">characters()</code>, and
<code class="literal">endElement()</code>, are called in
response to the tags of the input and we store the data as we receive
it. Let’s take a look.</p><p>The <code class="literal">SAXModelBuilder</code> extends
<code class="literal">DefaultHandler</code> to help us implement
the <code class="literal">ContentHandler</code> interface. Because SAX events
follow the hierarchical structure of the XML document, we use a simple stack to keep
track of which object we are currently parsing. At the start of each
element, the model builder attempts to create an instance of a class
with the same name as the element (with its first letter capitalized) and push it onto the top
of the stack. Each nested opening tag creates a new object on the
stack until we encounter a closing tag. Upon reaching the end of an
element, we pop the current object off the stack and attempt to apply
its value to its parent (the enclosing XML element), which is the new
top of the stack. For elements with simple content that do not have a
corresponding class, we place a <code class="literal">StringBuffer</code> on the stack as a stand-in to
hold the character content until the tag is closed. In this case, the
name of the tag indicates the property on the parent that should get
the text and upon seeing the closing tag, we apply it in the same way.
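</p><p>The push-on-open, pop-on-close pattern described so far can be sketched without reflection. The following is only an illustrative sketch, not the chapter's <code class="literal">SAXModelBuilder</code>: the names <code class="literal">StackModelSketch</code>, <code class="literal">Node</code>, and <code class="literal">parse()</code> are hypothetical, a generic <code class="literal">Node</code> stands in for the model classes, and the finished root is kept in a separate field rather than left on the stack.</p>

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: builds a tree of generic nodes instead of model objects
class StackModelSketch extends DefaultHandler {

    // Stand-in for a model class: an element name, its text, and its children
    static class Node {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        @Override public String toString() {
            return children.isEmpty() ? name + "=" + text : name + children;
        }
    }

    private final Deque<Node> stack = new ArrayDeque<>();
    private Node root;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Each opening tag pushes a new object onto the stack
        stack.push(new Node(qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        if (text.trim().isEmpty()) { return; }  // ignore whitespace between tags
        stack.peek().text.append(text);         // text belongs to the current element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The closing tag completes the element on top of the stack
        Node done = stack.pop();
        if (stack.isEmpty()) {
            root = done;                        // the top-level element
        } else {
            stack.peek().children.add(done);    // apply it to its parent
        }
    }

    // Parses an XML string and returns the root node
    static Node parse(String xml) throws Exception {
        StackModelSketch handler = new StackModelSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.root;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<inventory><animal><name>Cocoa</name>"
            + "<species>Gorilla</species></animal></inventory>";
        System.out.println(parse(xml));
    }
}
```

<p>Run as written, the sketch prints <code class="literal">inventory[animal[name=Cocoa, species=Gorilla]]</code>. The model builder in this chapter follows the same shape, but replaces the generic node with real objects created and populated through reflection.</p><p>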
Attributes are applied to the current object on the stack within the
<a id="I_indexterm24_id829845" class="indexterm"/><code class="literal">startElement()</code> method
using the same technique. The final closing tag leaves the top-level
element (inventory in this case) on the stack for us to
retrieve.</p><p>To set values on our objects, we use our <a id="I_indexterm24_id829859" class="indexterm"/><code class="literal">setProperty()</code> method.
It uses reflection to look for a field matching the name of the tag
within the specified object. It also handles some simple type
conversions based on the type of the field found. If the field is of
type <code class="literal">Double</code>, we parse the text to a
number; if it is an <code class="literal">Enum</code> type, we
find the matching enum value represented by the text. Finally, if the
field is not a simple field but is a <code class="literal">Collection</code> representing an XML sequence,
then we invoke its <code class="literal">add()</code> method to
add the child to the collection instead of trying to assign to the
field itself.<a id="I_indexterm24_id829896" class="indexterm"/></p></div><div class="sect3" title="Test drive"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-3.2.4"/>Test drive</h3></div></div></div><p><a id="idx11202" class="indexterm"/>Finally, we can test drive the model builder with the
following class, <code class="literal">TestSAXModelBuilder</code>, which calls the SAX
parser, setting an instance of our <code class="literal">SAXModelBuilder</code> as the content handler. The
test class then prints some of the information parsed from the
<span class="emphasis"><em>zooinventory.xml</em></span> file:</p><a id="I_24_tt1308"/><pre class="programlisting"> <code class="kn">import</code> <code class="nn">org.xml.sax.*</code><code class="o">;</code>
<code class="kn">import</code> <code class="nn">javax.xml.parsers.*</code><code class="o">;</code>
<code class="kd">public</code> <code class="kd">class</code> <code class="nc">TestSAXModelBuilder</code>
<code class="o">{</code>
<code class="kd">public</code> <code class="kd">static</code> <code class="kt">void</code> <code class="nf">main</code><code class="o">(</code> <code class="n">String</code> <code class="o">[]</code> <code class="n">args</code> <code class="o">)</code> <code class="kd">throws</code> <code class="n">Exception</code>
<code class="o">{</code>
<code class="n">SAXParserFactory</code> <code class="n">factory</code> <code class="o">=</code> <code class="n">SAXParserFactory</code>
<code class="o">.</code><code class="na">newInstance</code><code class="o">();</code>
<code class="n">SAXParser</code> <code class="n">saxParser</code> <code class="o">=</code> <code class="n">factory</code><code class="o">.</code><code class="na">newSAXParser</code><code class="o">();</code>
<code class="n">XMLReader</code> <code class="n">parser</code> <code class="o">=</code> <code class="n">saxParser</code><code class="o">.</code><code class="na">getXMLReader</code><code class="o">();</code>
<code class="n">SAXModelBuilder</code> <code class="n">mb</code> <code class="o">=</code> <code class="k">new</code> <code class="n">SAXModelBuilder</code><code class="o">();</code>
<code class="n">parser</code><code class="o">.</code><code class="na">setContentHandler</code><code class="o">(</code> <code class="n">mb</code> <code class="o">);</code>
<code class="n">parser</code><code class="o">.</code><code class="na">parse</code><code class="o">(</code> <code class="k">new</code> <code class="n">InputSource</code><code class="o">(</code><code class="s">"zooinventory.xml"</code><code class="o">)</code> <code class="o">);</code>
<code class="n">Inventory</code> <code class="n">inventory</code> <code class="o">=</code> <code class="o">(</code><code class="n">Inventory</code><code class="o">)</code><code class="n">mb</code><code class="o">.</code><code class="na">getModel</code><code class="o">();</code>
<code class="n">System</code><code class="o">.</code><code class="na">out</code><code class="o">.</code><code class="na">println</code><code class="o">(</code><code class="s">"Animals = "</code><code class="o">+</code><code class="n">inventory</code><code class="o">.</code><code class="na">animal</code><code class="o">);</code>
<code class="n">Animal</code> <code class="n">cocoa</code> <code class="o">=</code> <code class="o">(</code><code class="n">Animal</code><code class="o">)(</code><code class="n">inventory</code><code class="o">.</code><code class="na">animal</code><code class="o">.</code><code class="na">get</code><code class="o">(</code><code class="mi">1</code><code class="o">));</code>
<code class="n">FoodRecipe</code> <code class="n">recipe</code> <code class="o">=</code> <code class="n">cocoa</code><code class="o">.</code><code class="na">foodRecipe</code><code class="o">;</code>
<code class="n">System</code><code class="o">.</code><code class="na">out</code><code class="o">.</code><code class="na">println</code><code class="o">(</code> <code class="s">"Recipe = "</code><code class="o">+</code><code class="n">recipe</code> <code class="o">);</code>
<code class="o">}</code>
<code class="o">}</code></pre><p>The output should look like this:</p><a id="I_24_tt1309"/><pre class="programlisting"><code class="n">Animals</code> <code class="o">=</code> <code class="o">[</code><code class="n">Song</code> <code class="n">Fang</code><code class="o">(</code><code class="n">mammal</code><code class="o">,</code> <code class="n">Giant</code> <code class="n">Panda</code><code class="o">),</code> <code class="n">Cocoa</code><code class="o">(</code><code class="n">mammal</code><code class="o">,</code> <code class="n">Gorilla</code><code class="o">)]</code>
<code class="n">Recipe</code> <code class="o">=</code> <code class="n">Gorilla</code> <code class="nl">Chow:</code> <code class="o">[</code><code class="n">fruit</code><code class="o">,</code> <code class="n">shoots</code><code class="o">,</code> <code class="n">leaves</code><code class="o">]</code></pre><p>In the following sections, we’ll generate the equivalent output
using different tools.<a id="I_indexterm24_id829973" class="indexterm"/></p></div><div class="sect3" title="Limitations and possibilities"><div class="titlepage"><div><div><h3 class="title"><a id="learnjava3-CHP-24-SECT-3.2.5"/>Limitations and possibilities</h3></div></div></div><p><a id="I_indexterm24_id829987" class="indexterm"/>To make our model builder more complete, we could use
more robust naming conventions
for our tags and model classes (taking into account packages and mixed
capitalization, etc.). More generally, we might want to introduce
arbitrary mappings (bindings) between names and classes or properties.
And of course, there is the problem of taking our model and going the
other way, using it to generate an XML document. You can see where
this is going: JAXB, which we cover later in this chapter, will do all of
that for us.<a id="I_indexterm24_id830009" class="indexterm"/></p></div></div><div class="sect2" title="XMLEncoder/Decoder"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-3.3"/>XMLEncoder/Decoder</h2></div></div></div><p><a id="idx11205" class="indexterm"/> <a id="idx11219" class="indexterm"/> <a id="idx11232" class="indexterm"/>Java includes a standard tool for serializing JavaBeans
classes to XML. The <a id="I_indexterm24_id830063" class="indexterm"/><code class="literal">java.beans</code> package
<code class="literal">XMLEncoder</code> and <code class="literal">XMLDecoder</code> classes are analogous to <code class="literal">java.io.ObjectOutputStream</code> and <code class="literal">java.io.ObjectInputStream</code>. Instead of using the
native Java serialization format, they store the object state in a
high-level XML format. We say that they are analogous, but the XML
encoder is not a general replacement for Java object serialization.
Instead, it is specialized to work with objects that follow the
JavaBeans design patterns (setter and getter methods for properties),
and it can only store and recover the state of the object that is
expressed through a bean’s public properties in this way.</p><p>When you call it, the <code class="literal">XMLEncoder</code> attempts to construct an in-memory
copy of the graph of beans that you are serializing using only public
constructors and JavaBean properties. As it works, it writes out the
steps required as “instructions” in an XML format. Later, the <code class="literal">XMLDecoder</code> executes these instructions and
reproduces the result. The primary advantage of this process is that it
is highly resilient to changes in the class implementation. While
standard Java object serialization can accommodate many kinds of
“compatible changes” in classes, it requires some help from the
developer to get it right. Because the <code class="literal">XMLEncoder</code> uses only public APIs and writes
instructions in simple XML, it is expected that this form of
serialization will be the most robust way to store the state of
JavaBeans. The process is referred to as <span class="emphasis"><em>long-term
persistence</em></span> for JavaBeans.</p><p>It might seem at first like this would obviate the need for our
<code class="literal">SAXModelBuilder</code> example. Why not
simply write our XML in the format that <code class="literal">XMLDecoder</code> understands and use it