<html xmlns="http://www.w3.org/1999/xhtml"><head><title>A Bit of Background</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body><div class="sect1" title="A Bit of Background"><div class="titlepage"><div><div><h1 class="title"><a id="learnjava3-CHP-24-SECT-1"/>A Bit of Background</h1></div></div></div><p><a id="I_indexterm24_id827758" class="indexterm"/> <a id="idx11213" class="indexterm"/>XML and HTML are called <span class="emphasis"><em>markup
languages</em></span> because of the way they add structure to plain-text
documents—by surrounding parts of the text with tags that indicate
structure or meaning, much as someone with a pen might highlight a
sentence and add a note. While HTML predefines a set of tags and their
structure, XML is a blank slate in which the author gets to define the
tags, the rules, and their meanings.</p><p><a id="I_indexterm24_id827789" class="indexterm"/> <a id="I_indexterm24_id827795" class="indexterm"/>Both XML and HTML owe their lineage to Standard Generalized
Markup Language (SGML)—the mother of all markup languages. SGML has been
used in the publishing industry for decades (including at O’Reilly). But
it wasn’t until the Web captured the world that it came into the
mainstream through HTML. HTML started as a very small application of SGML,
and if HTML has done anything at all, it has proven that simplicity
reigns.</p><div class="sect2" title="Text Versus Binary"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-1.1"/>Text Versus Binary</h2></div></div></div><p><a id="I_indexterm24_id827816" class="indexterm"/> <a id="I_indexterm24_id827822" class="indexterm"/> <a id="I_indexterm24_id827828" class="indexterm"/>When Tim Berners-Lee began postulating the Web back at
CERN in the late 1980s, he wanted to organize project information using
hypertext with links embedded in plain text.<sup>[<a id="learnjava3-CHP-24-FN-1" href="#ftn.learnjava3-CHP-24-FN-1" class="footnote">49</a>]</sup> When the Web needed a protocol, HTTP—a simple, text-based
client-server protocol—was
invented. So, what exactly is so enchanting about the idea of plain
text? Why, for example, didn’t Tim turn to the Microsoft Word format as
the basis for web documents? Surely a binary, non-human-readable format
and a similarly machine-oriented protocol would be more efficient? Since
the Web’s inception, there have now been literally trillions of HTTP
transactions. Was it really a good idea for them to use (English) words
like “GET” and “POST” as part of the protocol?</p><p>The answer, as we’ve all seen, is yes! Whatever humans can read
and understand, human developers can work with more easily. There is a
time and place for a high level of optimization (and obscurity), but
when the goal is universal acceptance and cross-platform portability,
simplicity and transparency are paramount. This is the first fundamental
proposition of XML: simple and nominally human-readable data.</p></div><div class="sect2" title="A Universal Parser"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-1.2"/>A Universal Parser</h2></div></div></div><p><a id="I_indexterm24_id827880" class="indexterm"/>Using text to exchange data is not exactly a new idea,
either, but historically, for every new document format that came along,
a new <a id="I_indexterm24_id827890" class="indexterm"/><span class="emphasis"><em>parser</em></span> had to be written. A
parser is an application that reads a document and understands its
formatting conventions, usually enforcing some rules about the content.
For example, the Java <code class="literal">Properties</code>
class has a parser for the standard properties file format (<a class="xref" href="ch11.html" title="Chapter 11. Core Utilities">Chapter 11</a>). In our simple spreadsheet in <a class="xref" href="ch18.html" title="Chapter 18. More Swing Components">Chapter 18</a>, we wrote a parser capable of
understanding basic mathematical expressions. As we’ve seen, depending
on complexity, parsing can be quite tricky.</p><p>With XML, we can represent data without having to write this kind
of custom parser. This isn’t to say that it’s reasonable to use XML for
everything (e.g., typing math expressions into our spreadsheet), but for
the common types of information that we exchange on the Net, we
shouldn’t have to write parsers that deal with basic syntax and string
manipulation. In conjunction with document-verifying components
(Document Type Definitions [DTDs] or XML Schema), much of the complex
error checking is also done automatically. This is the second
fundamental proposition of XML: standardized parsing and
validation.</p></div><div class="sect2" title="The State of XML"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-1.3"/>The State of XML</h2></div></div></div><p><a id="idx11220" class="indexterm"/>The APIs we’ll discuss in this chapter are powerful and
popular. They are being used around the world to build enterprise-scale
systems every day. In recent years, JAXB (Java-to-XML binding) has been
vastly streamlined and simplified (primarily through the use of Java
annotations to replace configuration files and support a “code first”
methodology). However, as with any popular technology, its limitations
have become better recognized, and some complexity has crept into what
began as simple concepts. In the area of browser-based applications,
some have turned to <a id="I_indexterm24_id827952" class="indexterm"/><a id="I_indexterm24_id827958" class="indexterm"/>JavaScript Object Notation (JSON) as an even
lighter-weight approach that maps natively to JavaScript, especially for
transient communications between client and server. However, XML tools
are still widely used in this area as well. Google’s Protocol
Buffers encoding scheme is another system-to-system
communication format that has been used in place of XML, in this case
where very high performance trumps flexibility. But XML remains the most
powerful general format for document and data exchange with the widest
array of tools support.<a id="I_indexterm24_id827974" class="indexterm"/></p></div><div class="sect2" title="The XML APIs"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-1.4"/>The XML APIs</h2></div></div></div><p><a id="I_indexterm24_id827988" class="indexterm"/>All the basic APIs for working with XML are now bundled
with the standard release of Java. This includes the <a id="I_indexterm24_id827998" class="indexterm"/><code class="literal">javax.xml</code> standard
extension packages for working with the Simple API for XML (SAX), the Document
Object Model (DOM), the Java Architecture for XML Binding (JAXB), and Extensible Stylesheet Language (XSL)
transforms, as well as APIs such as XPath and XInclude. If you are using an older
version of Java, you can still use many of these tools but you will have
to download these packages separately.<a id="I_indexterm24_id828022" class="indexterm"/><a id="I_indexterm24_id828028" class="indexterm"/></p></div><div class="sect2" title="XML and Web Browsers"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-24-SECT-1.5"/>XML and Web Browsers</h2></div></div></div><p><a id="I_indexterm24_id828040" class="indexterm"/>All modern web browsers support XML explicitly, both in
terms of simple rendering of XML content and also client-side
transformation of XML into HTML for display. If you load an XML document
in your browser, it will generally be displayed as a tree with controls to
allow you to collapse and expand nodes (like an outline). This kind of
display is used mainly for debugging, but JavaScript can also
perform client-side XSL transformation directly in the browser. XSL is a
language for transforming XML into other documents; we’ll talk about it
later in this chapter.</p><p>When XML is viewed in older browsers or in contexts that do not
explicitly format it for viewing, the browser will generally display only
the text of the document, with all the tags (structural
information) stripped off. This is the prescribed behavior for working
with unknown XML markup in a viewing environment. Remember that you can
always use the “view source” option to display the text of a file in
your browser if you want to see the original source.<a id="I_indexterm24_id828070" class="indexterm"/></p></div><div class="footnotes"><br/><hr/><div class="footnote"><p><sup>[<a id="ftn.learnjava3-CHP-24-FN-1" href="#learnjava3-CHP-24-FN-1" class="para">49</a>] </sup>To read Berners-Lee’s original proposal to CERN, go to <a class="ulink" href="http://www.w3.org/History/1989/proposal.html">http://www.w3.org/History/1989/proposal.html</a>.</p></div></div></div></body></html>