<html xmlns="http://www.w3.org/1999/xhtml"><head><title>The URL Class</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body><div class="sect1" title="The URL Class"><div class="titlepage"><div><div><h1 class="title"><a id="learnjava3-CHP-14-SECT-2"/>The URL Class</h1></div></div></div><p>The Java URL class brings this down to a more concrete level.
The URL class represents a URL address and provides a simple API for
accessing web resources, such as documents and applications on servers. It
can use an extensible set of protocol and content handlers to perform the
necessary communication and, in theory, even data conversion. With the URL
class, an application can open a connection to a server on the network and
retrieve content with just a few lines of code. As new types of servers
and new formats for content evolve, additional URL handlers can be
supplied to retrieve and interpret the data without modifying your
applications.</p><p>A URL is represented by an instance of the <code class="literal">java.net.URL</code> class. A <code class="literal">URL</code> object manages all the component information
within a URL string and provides methods for retrieving the object it
identifies. We can construct a <code class="literal">URL</code>
object from a URL string or from its component parts:</p><a id="I_14_tt899"/><pre class="programlisting"><code class="k">try</code> <code class="o">{</code>
<code class="n">URL</code> <code class="n">aDoc</code> <code class="o">=</code>
<code class="k">new</code> <code class="nf">URL</code><code class="o">(</code> <code class="s">"http://foo.bar.com/documents/homepage.html"</code> <code class="o">);</code>
<code class="n">URL</code> <code class="n">sameDoc</code> <code class="o">=</code>
<code class="k">new</code> <code class="nf">URL</code><code class="o">(</code><code class="s">"http"</code><code class="o">,</code><code class="s">"foo.bar.com"</code><code class="o">,</code><code class="s">"/documents/homepage.html"</code><code class="o">);</code>
<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code> <code class="n">MalformedURLException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="o">...</code> <code class="o">}</code></pre><p>These two <code class="literal">URL</code> objects point to
the same network resource, the <span class="emphasis"><em>homepage.html</em></span> document
on the server <span class="emphasis"><em>foo.bar.com</em></span>. Whether the resource
actually exists and is available isn’t known until we try to access it.
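</p><p>As a minimal sketch of the two points above (not part of the original example), the following class builds the URL both ways and then shows that successfully constructing a <code class="literal">URL</code> says nothing about whether the resource exists. The class name <code class="literal">UrlConstructionSketch</code>, its helper methods, and the temporary local file standing in for a remote document are assumptions made for illustration; <code class="literal">openStream()</code> is covered in the Stream Data section. Note that the three-argument constructor is given the file part with its leading slash.</p>
<pre class="programlisting">
```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlConstructionSketch {

    // Build the same URL from one string and from its component parts.
    // The three-argument constructor uses the file part exactly as given,
    // so the leading slash is supplied here.
    static boolean sameExternalForm() throws MalformedURLException {
        URL aDoc = new URL("http://foo.bar.com/documents/homepage.html");
        URL sameDoc = new URL("http", "foo.bar.com", "/documents/homepage.html");
        // Compare the string forms only; no network access is involved.
        return aDoc.toExternalForm().equals(sameDoc.toExternalForm());
    }

    // Returns true only if the resource behind the URL can actually be opened.
    static boolean isReachable(URL url) {
        try (InputStream in = url.openStream()) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("same external form: " + sameExternalForm());

        // A hypothetical local file stands in for a remote document.
        Path dir = Files.createTempDirectory("url-sketch");
        Path page = dir.resolve("homepage.html");
        URL local = page.toUri().toURL();  // succeeds whether or not the file exists

        System.out.println("before the file exists: " + isReachable(local));
        Files.write(page, "hello".getBytes());
        System.out.println("after the file exists: " + isReachable(local));
    }
}
```
</pre><p>The sketch compares string forms rather than calling <code class="literal">equals()</code> because <code class="literal">URL.equals()</code> may try to resolve host names, which would require network access.</p><p>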
When initially constructed, the <code class="literal">URL</code>
object contains only data about the object’s location and how to access
it. No connection to the server has been made. We can examine the various
parts of the <code class="literal">URL</code> with the <a id="I_indexterm14_id773355" class="indexterm"/><code class="literal">getProtocol()</code>, <a id="I_indexterm14_id773366" class="indexterm"/><code class="literal">getHost()</code>, and <a id="I_indexterm14_id773376" class="indexterm"/><code class="literal">getFile()</code> methods. We can
also compare it to another <code class="literal">URL</code> with the
<a id="I_indexterm14_id773393" class="indexterm"/><code class="literal">sameFile()</code> method (which
has an unfortunate name for something that may not point to a file).
<code class="literal">sameFile()</code> determines whether two URLs
point to the same resource. It can be fooled, but <code class="literal">sameFile()</code> does more than compare the URL
strings for equality; it takes into account the possibility that one
server may have several names, treats an omitted port as the protocol’s default, and ignores any fragment (the part after <code class="literal">#</code>). (It doesn’t go so
far as to fetch the resources and compare them, however.)</p><p>When a <code class="literal">URL</code> is created, its
specification is parsed to identify just the protocol component. If the
protocol doesn’t make sense, or if Java can’t find a protocol handler for
it, the URL constructor throws a <a id="I_indexterm14_id773431" class="indexterm"/><a id="I_indexterm14_id773436" class="indexterm"/><code class="literal">MalformedURLException</code>. A
<span class="emphasis"><em>protocol handler</em></span> is a Java class that implements the
communications protocol for accessing the URL resource. For example, given
an <code class="literal">http</code> URL, Java prepares to use the
HTTP protocol handler to retrieve documents from the specified web
server.</p><p>As of Java 7, URL protocol handlers are guaranteed to be provided
for <a id="I_indexterm14_id773461" class="indexterm"/><code class="literal">http</code>, <a id="I_indexterm14_id773472" class="indexterm"/><code class="literal">https</code> (secure HTTP), and
<a id="I_indexterm14_id773482" class="indexterm"/><code class="literal">ftp</code>, as well as local
<a id="I_indexterm14_id773493" class="indexterm"/><code class="literal">file</code> URLs and <a id="I_indexterm14_id773504" class="indexterm"/><code class="literal">jar</code> URLs that refer to
files inside JAR archives. Outside of that, it gets a little dicey. We’ll
talk more about the issues surrounding content and protocol handlers a bit
later in this chapter.</p><div class="sect2" title="Stream Data"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.1"/>Stream Data</h2></div></div></div><p><a id="idx10815" class="indexterm"/> <a id="idx10823" class="indexterm"/> <a id="idx10834" class="indexterm"/>The lowest-level and most general way to get data back
from a <code class="literal">URL</code> is to ask for an <code class="literal">InputStream</code> from the <code class="literal">URL</code> by calling <a id="I_indexterm14_id773578" class="indexterm"/><code class="literal">openStream()</code>. Getting
the data as a stream may also be useful if you want to receive
continuous updates from a dynamic information source. The drawback is
that you have to parse the contents of the byte stream yourself. Working
in this mode is basically the same as working with a byte stream from
socket communications, but the URL protocol handler has already dealt
with all of the server communications and is providing you with just the
content portion of the transaction. Not all types of URLs support the
<code class="literal">openStream()</code> method because not all
types of URLs refer to concrete data; you’ll get an <a id="I_indexterm14_id773600" class="indexterm"/><code class="literal">UnknownServiceException</code>
if the URL doesn’t.</p><p>The following code prints the contents of an HTML file on a web
server:</p><a id="I_14_tt900"/><pre class="programlisting"><code class="k">try</code> <code class="o">{</code>
<code class="n">URL</code> <code class="n">url</code> <code class="o">=</code> <code class="k">new</code> <code class="n">URL</code><code class="o">(</code><code class="s">"http://server/index.html"</code><code class="o">);</code>
<code class="err"> </code>
<code class="n">BufferedReader</code> <code class="n">bin</code> <code class="o">=</code> <code class="k">new</code> <code class="n">BufferedReader</code> <code class="o">(</code>
<code class="k">new</code> <code class="nf">InputStreamReader</code><code class="o">(</code> <code class="n">url</code><code class="o">.</code><code class="na">openStream</code><code class="o">()</code> <code class="o">));</code>
<code class="err"> </code>
<code class="n">String</code> <code class="n">line</code><code class="o">;</code>
<code class="k">while</code> <code class="o">(</code> <code class="o">(</code><code class="n">line</code> <code class="o">=</code> <code class="n">bin</code><code class="o">.</code><code class="na">readLine</code><code class="o">())</code> <code class="o">!=</code> <code class="kc">null</code> <code class="o">)</code> <code class="o">{</code>
<code class="n">System</code><code class="o">.</code><code class="na">out</code><code class="o">.</code><code class="na">println</code><code class="o">(</code> <code class="n">line</code> <code class="o">);</code>
<code class="o">}</code>
<code class="n">bin</code><code class="o">.</code><code class="na">close</code><code class="o">();</code>
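<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code><code class="n">IOException</code> <code class="n">e</code><code class="o">)</code> <code class="o">{</code> <code class="n">System</code><code class="o">.</code><code class="na">err</code><code class="o">.</code><code class="na">println</code><code class="o">(</code> <code class="n">e</code> <code class="o">);</code> <code class="o">}</code></pre><p>We ask for an <code class="literal">InputStream</code> with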
<code class="literal">openStream()</code> and wrap it in a
<code class="literal">BufferedReader</code> to read the lines of
text. Because we specify the <code class="literal">http</code>
protocol in the URL, we enlist the services of an HTTP protocol handler.
Note that we haven’t talked about content handlers yet. In this case,
because we’re reading directly from the input stream, no content handler
(no transformation of the content data) is involved.<a id="I_indexterm14_id773657" class="indexterm"/><a id="I_indexterm14_id773664" class="indexterm"/><a id="I_indexterm14_id773672" class="indexterm"/></p></div><div class="sect2" title="Getting the Content as an Object"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.2"/>Getting the Content as an Object</h2></div></div></div><p><a id="idx10821" class="indexterm"/> <a id="idx10831" class="indexterm"/>As we said previously, reading raw content from a stream
is the most general mechanism for accessing data over the Web. <code class="literal">openStream()</code> leaves the parsing of data up to
you. The URL class, however, was intended to support a more
sophisticated, pluggable, content-handling mechanism. We’ll discuss this
now, but be aware that it is not widely used because of lack of
standardization and limitations in how you can deploy new handlers.
Although the Java community made some progress in recent years in
standardizing a small set of protocol handlers, no such effort was made
to standardize content handlers. This means that although this part of
the discussion is interesting, its usefulness is limited.</p><p>The way it’s supposed to work is that when Java knows the type of
content being retrieved from a URL and a proper content handler is
available, you can retrieve the <code class="literal">URL</code>
content as an appropriate Java object by calling the <code class="literal">URL</code>’s <a id="I_indexterm14_id773749" class="indexterm"/><code class="literal">getContent()</code> method. In
this mode of operation, <code class="literal">getContent()</code>
initiates a connection to the host, fetches the data for you, determines
the type of data, and then invokes a content handler to turn the bytes
into a Java object. It acts somewhat as if you had read a serialized Java
object, as in <a class="xref" href="ch13.html" title="Chapter 13. Network Programming">Chapter 13</a>. Java will try to
determine the type of the content by looking at its <a id="I_indexterm14_id773773" class="indexterm"/>MIME type or its file extension, or even by examining the
bytes directly.</p><p>For example, given the URL
<span class="emphasis"><em>http://foo.bar.com/index.html</em></span>, a call to
<code class="literal">getContent()</code> uses the HTTP protocol
handler to retrieve data and might use an HTML content handler to turn
the data into an appropriate document object. Similarly, a GIF file
might be turned into an AWT <a id="I_indexterm14_id773800" class="indexterm"/><code class="literal">ImageProducer</code> object
using a GIF content handler. If we access the GIF file using an FTP URL,
Java would use the same content handler but a different protocol handler
to receive the data.</p><p>Since the content handler must be able to return any type of
object, the return type of <code class="literal">getContent()</code> is <code class="literal">Object</code>. This might leave us wondering what
kind of object we got. In a moment, we’ll describe how we could ask the
protocol handler about the object’s MIME type. Based on this, and
whatever other knowledge we have about the kind of object we are
expecting, we can cast the <code class="literal">Object</code> to
its appropriate, more specific type. For example, if we expect an image,
we might cast the result of <code class="literal">getContent()</code> to <code class="literal">ImageProducer</code>:</p><a id="I_14_tt902"/><pre class="programlisting"><code class="k">try</code> <code class="o">{</code>
<code class="n">ImageProducer</code> <code class="n">ip</code> <code class="o">=</code> <code class="o">(</code><code class="n">ImageProducer</code><code class="o">)</code><code class="n">myURL</code><code class="o">.</code><code class="na">getContent</code><code class="o">();</code>
<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code> <code class="n">ClassCastException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="o">...</code> <code class="o">}</code></pre><p>Various kinds of errors can occur when trying to retrieve the
data. For example, <a id="I_indexterm14_id773860" class="indexterm"/><code class="literal">getContent()</code> can throw
an <a id="I_indexterm14_id773871" class="indexterm"/><code class="literal">IOException</code> if there is
a communications error. Other kinds of errors can occur at the
application level: some knowledge of how the application-specific
content and protocol handlers deal with errors is necessary. One problem
that could arise is that a content handler for the data’s MIME type
wouldn’t be available. In this case, <code class="literal">getContent()</code> invokes a special “unknown type”
handler that returns the data as a raw <code class="literal">InputStream</code> (back to square one).</p><p>In some situations, we may also need knowledge of the protocol
handler. For example, consider a <code class="literal">URL</code>
that refers to a nonexistent file on an HTTP server. When requested, the
server returns the familiar “404 Not Found” message. To deal with
protocol-specific operations like this, we may need to talk to the
protocol handler, which we’ll discuss next.<a id="I_indexterm14_id773912" class="indexterm"/><a id="I_indexterm14_id773919" class="indexterm"/></p></div><div class="sect2" title="Managing Connections"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.3"/>Managing Connections</h2></div></div></div><p><a id="idx10822" class="indexterm"/> <a id="idx10833" class="indexterm"/>Upon calling <a id="I_indexterm14_id773961" class="indexterm"/><code class="literal">openStream()</code> or
<code class="literal">getContent()</code> on a <code class="literal">URL</code>, the protocol handler is consulted and a
connection is made to the remote server or location. Connections are
represented by a <a id="I_indexterm14_id773983" class="indexterm"/><code class="literal">URLConnection</code> object,
subtypes of which manage different protocol-specific communications and
offer additional metadata about the source. The <code class="literal">HttpURLConnection</code> class, for example, handles
basic web requests and also adds some HTTP-specific capabilities such as
interpreting “404 Not Found” messages and other web server errors. We’ll
talk more about <code class="literal">HttpURLConnection</code>
later in this chapter.</p><p>We can get a <code class="literal">URLConnection</code> from
our <code class="literal">URL</code> directly with the <code class="literal">openConnection()</code> method. One of the things we
can do with the <code class="literal">URLConnection</code> is ask
for the object’s content type before reading data. For example:</p><a id="I_14_tt903"/><pre class="programlisting"><code class="n">URLConnection</code> <code class="n">connection</code> <code class="o">=</code> <code class="n">myURL</code><code class="o">.</code><code class="na">openConnection</code><code class="o">();</code>
<code class="n">String</code> <code class="n">mimeType</code> <code class="o">=</code> <code class="n">connection</code><code class="o">.</code><code class="na">getContentType</code><code class="o">();</code>
<code class="n">InputStream</code> <code class="n">in</code> <code class="o">=</code> <code class="n">connection</code><code class="o">.</code><code class="na">getInputStream</code><code class="o">();</code></pre><p>Despite its name, a <code class="literal">URLConnection</code> object is initially created in a
raw, unconnected state. In this example, the network connection was not
actually initiated until we called the <a id="I_indexterm14_id774055" class="indexterm"/><code class="literal">getContentType()</code>
method. The <code class="literal">URLConnection</code> does not
talk to the source until data is requested or its <code class="literal">connect()</code> method is explicitly invoked. Prior
to connection, network parameters and protocol-specific features can be
set up. For example, we can set timeouts on the initial connection to
the server and on reads:</p><a id="I_14_tt904"/><pre class="programlisting"><code class="n">URLConnection</code> <code class="n">connection</code> <code class="o">=</code> <code class="n">myURL</code><code class="o">.</code><code class="na">openConnection</code><code class="o">();</code>
<code class="n">connection</code><code class="o">.</code><code class="na">setConnectTimeout</code><code class="o">(</code> <code class="mi">10000</code> <code class="o">);</code> <code class="c1">// milliseconds</code>
<code class="n">connection</code><code class="o">.</code><code class="na">setReadTimeout</code><code class="o">(</code> <code class="mi">10000</code> <code class="o">);</code> <code class="c1">// milliseconds</code>
<code class="n">InputStream</code> <code class="n">in</code> <code class="o">=</code> <code class="n">connection</code><code class="o">.</code><code class="na">getInputStream</code><code class="o">();</code></pre><p>As we’ll see in the section “Using the POST Method,” we can get at
the protocol-specific information by casting the <code class="literal">URLConnection</code> to its specific
subtype.<a id="I_indexterm14_id774099" class="indexterm"/><a id="I_indexterm14_id774106" class="indexterm"/></p></div><div class="sect2" title="Handlers in Practice"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.4"/>Handlers in Practice</h2></div></div></div><p><a id="I_indexterm14_id774120" class="indexterm"/> <a id="idx10832" class="indexterm"/>The content- and protocol-handler mechanisms we’ve
described are very flexible; to handle new types of URLs, you need only
add the appropriate handler classes. One interesting application of this
would be Java-based web browsers that could handle new and specialized
kinds of URLs by downloading the necessary handlers over the Net. The idea for this was
touted in the earliest days of Java. Unfortunately, it never came to
fruition. There is no API for dynamically downloading new content and
protocol handlers. In fact, there is no standard API for determining
what content and protocol handlers exist on a given platform.</p><p>Java currently mandates protocol handlers for HTTP, HTTPS, FTP,
FILE, and JAR. While in practice you will generally find these basic
protocol handlers with all versions of Java, that’s not entirely
comforting, and the story for content handlers is even less clear. The
standard Java classes don’t, for example, include content handlers for
HTML, GIF, JPEG, or other common data types. Furthermore, although
content and protocol handlers are part of the Java API and an intrinsic
part of the mechanism for working with URLs, specific content and
protocol handlers aren’t defined. Even those protocol handlers that have
been bundled in Java are still packaged as part of the Sun
implementation classes and are not truly part of the core API for all to
see.</p><p>In summary, the Java content- and protocol-handler mechanism was a
forward-thinking approach that never quite materialized. The promise of
web browsers that dynamically extend themselves for new types of
protocols and new content is, like flying cars, always just a few years
away. Although the basics of the protocol-handler mechanism are
useful (especially now with some standardization), for decoding content
in your own applications you should probably turn to other, newer
frameworks that have a bit more specificity.</p></div><div class="sect2" title="Useful Handler Frameworks"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.5"/>Useful Handler Frameworks</h2></div></div></div><p><a id="idx10811" class="indexterm"/> <a id="idx10824" class="indexterm"/> <a id="idx10835" class="indexterm"/>The idea of dynamically downloadable handlers could also
be applied to other kinds of handler-like components. For example, the
Java XML community is fond of referring to XML as a way to apply
semantics (meaning) to documents and to Java as a portable way to supply
the behavior that goes along with those semantics. It’s possible that an
XML viewer could be built with downloadable handlers for displaying XML
tags.</p><p><a id="I_indexterm14_id774222" class="indexterm"/> <a id="I_indexterm14_id774228" class="indexterm"/>The JavaBeans APIs touch upon this subject with the Java
Activation Framework (JAF), which provides a way to detect the data
stream type and “encapsulate access to it” in a Java bean. If this
sounds suspiciously like the content handler’s job, it is.
Unfortunately, it looks like these APIs will not be merged and, outside
of the JavaMail API, the JAF has not been widely used.</p><p>Fortunately, for working with URL streams of images, music, and
video, very mature APIs are available. <a id="I_indexterm14_id774245" class="indexterm"/><a id="I_indexterm14_id774251" class="indexterm"/><a id="I_indexterm14_id774256" class="indexterm"/><a id="I_indexterm14_id774262" class="indexterm"/>The Java Advanced Imaging API (JAI) includes a
well-defined, extensible set of handlers for most image types, and the
Java Media Framework (JMF) can play most common music and video types
found online.<a id="I_indexterm14_id774271" class="indexterm"/><a id="I_indexterm14_id774278" class="indexterm"/><a id="I_indexterm14_id774285" class="indexterm"/></p></div></div></body></html>