<?xml version="1.0" encoding="UTF-8" standalone="no"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"><head><title>The URL Class</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body><div class="sect1" title="The URL Class"><div class="titlepage"><div><div><h1 class="title"><a id="learnjava3-CHP-14-SECT-2"/>The URL Class</h1></div></div></div><p>The Java URL class brings this down to a more concrete level. The URL class represents a URL and provides a simple API for accessing web resources, such as documents and applications on servers. It can use an extensible set of protocol and content handlers to perform the necessary communication and, in theory, even data conversion. With the URL class, an application can open a connection to a server on the network and retrieve content with just a few lines of code. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying your applications.</p><p>A URL is represented by an instance of the <code class="literal">java.net.URL</code> class. A <code class="literal">URL</code> object manages all the component information within a URL string and provides methods for retrieving the object it identifies. 
We can construct a <code class="literal">URL</code> object from a URL string or from its component parts:</p><a id="I_14_tt899"/><pre class="programlisting"><code class="k">try</code> <code class="o">{</code>
    <code class="n">URL</code> <code class="n">aDoc</code> <code class="o">=</code>
        <code class="k">new</code> <code class="nf">URL</code><code class="o">(</code> <code class="s">"http://foo.bar.com/documents/homepage.html"</code> <code class="o">);</code>
    <code class="n">URL</code> <code class="n">sameDoc</code> <code class="o">=</code>
        <code class="k">new</code> <code class="nf">URL</code><code class="o">(</code><code class="s">"http"</code><code class="o">,</code><code class="s">"foo.bar.com"</code><code class="o">,</code><code class="s">"/documents/homepage.html"</code><code class="o">);</code>
<code class="o">}</code>
<code class="k">catch</code> <code class="o">(</code> <code class="n">MalformedURLException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="o">...</code> <code class="o">}</code></pre><p>These two <code class="literal">URL</code> objects point to the same network resource, the <span class="emphasis"><em>homepage.html</em></span> document on the server <span class="emphasis"><em>foo.bar.com</em></span>. Whether the resource actually exists and is available isn’t known until we try to access it. When initially constructed, the <code class="literal">URL</code> object contains only data about the object’s location and how to access it. No connection to the server has been made. We can examine the various parts of the <code class="literal">URL</code> with the <a id="I_indexterm14_id773355" class="indexterm"/><code class="literal">getProtocol()</code>, <a id="I_indexterm14_id773366" class="indexterm"/><code class="literal">getHost()</code>, and <a id="I_indexterm14_id773376" class="indexterm"/><code class="literal">getFile()</code> methods. 
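</p><p>The accessor methods just mentioned can be sketched as follows. This is a minimal illustration, not code from the original text: the class name <code class="literal">UrlParts</code> and its <code class="literal">split()</code> method are hypothetical helpers of our own. On JDK 20 and later, the <code class="literal">URL(String)</code> constructor still works but is marked deprecated in favor of <code class="literal">URI.create(spec).toURL()</code>.</p>
<pre class="programlisting">
```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; the name UrlParts is ours, not part of the JDK.
class UrlParts {

    // Returns the protocol, host, and file portions of a URL string.
    static String[] split(String spec) {
        try {
            URL url = new URL(spec);   // parses the string; no connection is opened
            return new String[] { url.getProtocol(), url.getHost(), url.getFile() };
        } catch (MalformedURLException e) {
            // Thrown when the protocol is missing or has no handler
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        String[] parts = split("http://foo.bar.com/documents/homepage.html");
        System.out.println("protocol = " + parts[0]);
        System.out.println("host     = " + parts[1]);
        System.out.println("file     = " + parts[2]);
    }
}
```
</pre>
<p>For the sample URL, the three parts are <code class="literal">http</code>, <code class="literal">foo.bar.com</code>, and <code class="literal">/documents/homepage.html</code>. Nothing here touches the network, so the sketch runs even when the server does not exist.</p><p>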
We can also compare it to another <code class="literal">URL</code> with the <a id="I_indexterm14_id773393" class="indexterm"/><code class="literal">sameFile()</code> method (which has an unfortunate name for something that may not point to a file). <code class="literal">sameFile()</code> determines whether two URLs point to the same resource. It can be fooled, but <code class="literal">sameFile()</code> does more than compare the URL strings for equality; it takes into account the possibility that one server may have several names as well as other factors. (It doesn’t go as far as to fetch the resources and compare them, however.)</p><p>When a <code class="literal">URL</code> is created, its specification is parsed to identify just the protocol component. If the protocol doesn’t make sense, or if Java can’t find a protocol handler for it, the URL constructor throws a <a id="I_indexterm14_id773431" class="indexterm"/><a id="I_indexterm14_id773436" class="indexterm"/><code class="literal">MalformedURLException</code>. A <span class="emphasis"><em>protocol handler</em></span> is a Java class that implements the communications protocol for accessing the URL resource. For example, given an <code class="literal">http</code> URL, Java prepares to use the HTTP protocol handler to retrieve documents from the specified web server.</p><p>As of Java 7, URL protocol handlers are guaranteed to be provided for <a id="I_indexterm14_id773461" class="indexterm"/><code class="literal">http</code>, <a id="I_indexterm14_id773472" class="indexterm"/><code class="literal">https</code> (secure HTTP), and <a id="I_indexterm14_id773482" class="indexterm"/><code class="literal">ftp</code>, as well as local <a id="I_indexterm14_id773493" class="indexterm"/><code class="literal">file</code> URLs and <a id="I_indexterm14_id773504" class="indexterm"/><code class="literal">jar</code> URLs that refer to files inside JAR archives. Outside of that, it gets a little dicey. 
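</p><p>The constructor’s behavior for known and unknown protocols can be checked with a small sketch like the one below. The class <code class="literal">ProtocolCheck</code>, the method <code class="literal">hasHandler()</code>, and the scheme <code class="literal">nosuchscheme</code> are all made up for illustration; we assume only that no handler is installed for that invented scheme.</p>
<pre class="programlisting">
```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper; the class and method names are illustrative only.
class ProtocolCheck {

    // Reports whether a URL object can be built for the given string,
    // that is, whether Java found a protocol handler for it.
    static boolean hasHandler(String spec) {
        try {
            new URL(spec);      // only parses the string; nothing is fetched
            return true;
        } catch (MalformedURLException e) {
            return false;       // missing or unknown protocol
        }
    }

    public static void main(String[] args) {
        System.out.println(hasHandler("http://foo.bar.com/index.html"));
        System.out.println(hasHandler("file:///tmp/notes.txt"));
        // "nosuchscheme" is an invented protocol with no handler
        System.out.println(hasHandler("nosuchscheme://foo.bar.com/"));
    }
}
```
</pre>
<p>The first two calls use guaranteed protocols and report <code class="literal">true</code>; the invented scheme reports <code class="literal">false</code> because the constructor throws <code class="literal">MalformedURLException</code>.</p><p>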
We’ll talk more about the issues surrounding content and protocol handlers a bit later in this chapter.</p><div class="sect2" title="Stream Data"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.1"/>Stream Data</h2></div></div></div><p><a id="idx10815" class="indexterm"/> <a id="idx10823" class="indexterm"/> <a id="idx10834" class="indexterm"/>The lowest-level and most general way to get data back from a <code class="literal">URL</code> is to ask for an <code class="literal">InputStream</code> from the <code class="literal">URL</code> by calling <a id="I_indexterm14_id773578" class="indexterm"/><code class="literal">openStream()</code>. Getting the data as a stream may also be useful if you want to receive continuous updates from a dynamic information source. The drawback is that you have to parse the contents of the byte stream yourself. Working in this mode is basically the same as working with a byte stream from socket communications, but the URL protocol handler has already dealt with all of the server communications and is providing you with just the content portion of the transaction. 
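</p><p>To see the raw byte stream without needing a web server, here is a minimal sketch that reads a temporary file through a <code class="literal">file</code> URL. The class <code class="literal">StreamRead</code> and its methods are hypothetical names of our own, and we assume Java 9 or later for <code class="literal">InputStream.readAllBytes()</code> and UTF-8 text content.</p>
<pre class="programlisting">
```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the class and method names are illustrative only.
class StreamRead {

    // Reads everything the URL's stream provides and decodes it as UTF-8.
    // The decoding step is the "parsing" that openStream() leaves to us.
    static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Writes text to a temporary file, then reads it back through a file: URL.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("url-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readAll(tmp.toUri().toURL());
            } finally {
                Files.delete(tmp);   // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from a file URL"));
    }
}
```
</pre>
<p>The same <code class="literal">readAll()</code> method would work for an <code class="literal">http</code> URL; only the protocol handler behind <code class="literal">openStream()</code> changes.</p><p>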
Not all types of URLs support the <code class="literal">openStream()</code> method because not all types of URLs refer to concrete data; you’ll get an <a id="I_indexterm14_id773600" class="indexterm"/><code class="literal">UnknownServiceException</code> if the URL doesn’t.</p><p>The following code prints the contents of an HTML file on a web server:</p><a id="I_14_tt900"/><pre class="programlisting"><code class="k">try</code> <code class="o">{</code>
    <code class="n">URL</code> <code class="n">url</code> <code class="o">=</code> <code class="k">new</code> <code class="n">URL</code><code class="o">(</code><code class="s">"http://server/index.html"</code><code class="o">);</code>

    <code class="c1">// try-with-resources closes the reader even if a read fails</code>
    <code class="k">try</code> <code class="o">(</code> <code class="n">BufferedReader</code> <code class="n">bin</code> <code class="o">=</code> <code class="k">new</code> <code class="n">BufferedReader</code> <code class="o">(</code>
            <code class="k">new</code> <code class="nf">InputStreamReader</code><code class="o">(</code> <code class="n">url</code><code class="o">.</code><code class="na">openStream</code><code class="o">()</code> <code class="o">))</code> <code class="o">)</code> <code class="o">{</code>
        <code class="n">String</code> <code class="n">line</code><code class="o">;</code>
        <code class="k">while</code> <code class="o">(</code> <code class="o">(</code><code class="n">line</code> <code class="o">=</code> <code class="n">bin</code><code class="o">.</code><code class="na">readLine</code><code class="o">())</code> <code class="o">!=</code> <code class="kc">null</code> <code class="o">)</code> <code class="o">{</code>
            <code class="n">System</code><code class="o">.</code><code class="na">out</code><code class="o">.</code><code class="na">println</code><code class="o">(</code> <code class="n">line</code> <code class="o">);</code>
        <code class="o">}</code>
    <code class="o">}</code>
<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code><code class="n">IOException</code> <code class="n">e</code><code class="o">)</code> <code class="o">{</code>
    <code class="n">System</code><code class="o">.</code><code class="na">err</code><code class="o">.</code><code class="na">println</code><code class="o">(</code> <code class="s">"Could not read URL: "</code> <code class="o">+</code> <code class="n">e</code> <code class="o">);</code>
<code class="o">}</code></pre><p>We ask for an <code class="literal">InputStream</code> with <code class="literal">openStream()</code> and wrap it in a <code class="literal">BufferedReader</code> to read the lines of text. Because we specify the <code class="literal">http</code> protocol in the URL, we enlist the services of an HTTP protocol handler. Note that we haven’t talked about content handlers yet. In this case, because we’re reading directly from the input stream, no content handler (no transformation of the content data) is involved.<a id="I_indexterm14_id773657" class="indexterm"/><a id="I_indexterm14_id773664" class="indexterm"/><a id="I_indexterm14_id773672" class="indexterm"/></p></div><div class="sect2" title="Getting the Content as an Object"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.2"/>Getting the Content as an Object</h2></div></div></div><p><a id="idx10821" class="indexterm"/> <a id="idx10831" class="indexterm"/>As we said previously, reading raw content from a stream is the most general mechanism for accessing data over the Web. <code class="literal">openStream()</code> leaves the parsing of data up to you. The URL class, however, was intended to support a more sophisticated, pluggable, content-handling mechanism. We’ll discuss this now, but be aware that it is not widely used because of lack of standardization and limitations in how you can deploy new handlers. Although the Java community made some progress in recent years in standardizing a small set of protocol handlers, no such effort was made to standardize content handlers. 
This means that although this part of the discussion is interesting, its usefulness is limited.</p><p>The way it’s supposed to work is that when Java knows the type of content being retrieved from a URL and a proper content handler is available, you can retrieve the <code class="literal">URL</code> content as an appropriate Java object by calling the <code class="literal">URL</code>’s <a id="I_indexterm14_id773749" class="indexterm"/><code class="literal">getContent()</code> method. In this mode of operation, <code class="literal">getContent()</code> initiates a connection to the host, fetches the data for you, determines the type of data, and then invokes a content handler to turn the bytes into a Java object. It acts sort of as if you had read a serialized Java object, as in <a class="xref" href="ch13.html" title="Chapter 13. Network Programming">Chapter 13</a>. Java will try to determine the type of the content by looking at its <a id="I_indexterm14_id773773" class="indexterm"/>MIME type, its file extension, or even by examining the bytes directly.</p><p>For example, given the URL <span class="emphasis"><em>http://foo.bar.com/index.html</em></span> , a call to <code class="literal">getContent()</code> uses the HTTP protocol handler to retrieve data and might use an HTML content handler to turn the data into an appropriate document object. Similarly, a GIF file might be turned into an AWT <a id="I_indexterm14_id773800" class="indexterm"/><code class="literal">ImageProducer</code> object using a GIF content handler. If we access the GIF file using an FTP URL, Java would use the same content handler but a different protocol handler to receive the data.</p><p>Since the content handler must be able to return any type of object, the return type of <code class="literal">getContent()</code> is <code class="literal">Object</code>. This might leave us wondering what kind of object we got. 
In a moment, we’ll describe how we could ask the protocol handler about the object’s MIME type. Based on this, and whatever other knowledge we have about the kind of object we are expecting, we can cast the <code class="literal">Object</code> to its appropriate, more specific type. For example, if we expect an image, we might cast the result of <code class="literal">getContent()</code> to <code class="literal">ImageProducer</code>:</p><a id="I_14_tt902"/><pre class="programlisting"><code class="k">try</code> <code class="o">{</code>
    <code class="n">ImageProducer</code> <code class="n">ip</code> <code class="o">=</code> <code class="o">(</code><code class="n">ImageProducer</code><code class="o">)</code><code class="n">myURL</code><code class="o">.</code><code class="na">getContent</code><code class="o">();</code>
<code class="o">}</code> <code class="k">catch</code> <code class="o">(</code> <code class="n">ClassCastException</code> <code class="n">e</code> <code class="o">)</code> <code class="o">{</code> <code class="o">...</code> <code class="o">}</code></pre><p>Various kinds of errors can occur when trying to retrieve the data. For example, <a id="I_indexterm14_id773860" class="indexterm"/><code class="literal">getContent()</code> can throw an <a id="I_indexterm14_id773871" class="indexterm"/><code class="literal">IOException</code> if there is a communications error. Other kinds of errors can occur at the application level: some knowledge of how the application-specific content and protocol handlers deal with errors is necessary. One problem that could arise is that a content handler for the data’s MIME type wouldn’t be available. In this case, <code class="literal">getContent()</code> invokes a special “unknown type” handler that returns the data as a raw <code class="literal">InputStream</code> (back to square one).</p><p>In some situations, we may also need knowledge of the protocol handler. 
For example, consider a <code class="literal">URL</code> that refers to a nonexistent file on an HTTP server. When requested, the server returns the familiar “404 Not Found” message. To deal with protocol-specific operations like this, we may need to talk to the protocol handler, which we’ll discuss next.<a id="I_indexterm14_id773912" class="indexterm"/><a id="I_indexterm14_id773919" class="indexterm"/></p></div><div class="sect2" title="Managing Connections"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.3"/>Managing Connections</h2></div></div></div><p><a id="idx10822" class="indexterm"/> <a id="idx10833" class="indexterm"/>Upon calling <a id="I_indexterm14_id773961" class="indexterm"/><code class="literal">openStream()</code> or <code class="literal">getContent()</code> on a <code class="literal">URL</code>, the protocol handler is consulted and a connection is made to the remote server or location. Connections are represented by a <a id="I_indexterm14_id773983" class="indexterm"/><code class="literal">URLConnection</code> object, subtypes of which manage different protocol-specific communications and offer additional metadata about the source. The <code class="literal">HttpURLConnection</code> class, for example, handles basic web requests and also adds some HTTP-specific capabilities such as interpreting “404 Not Found” messages and other web server errors. We’ll talk more about <code class="literal">HttpURLConnection</code> later in this chapter.</p><p>We can get a <code class="literal">URLConnection</code> from our <code class="literal">URL</code> directly with the <code class="literal">openConnection()</code> method. One of the things we can do with the <code class="literal">URLConnection</code> is ask for the object’s content type before reading data. 
For example:</p><a id="I_14_tt903"/><pre class="programlisting"><code class="n">URLConnection</code> <code class="n">connection</code> <code class="o">=</code> <code class="n">myURL</code><code class="o">.</code><code class="na">openConnection</code><code class="o">();</code>
<code class="n">String</code> <code class="n">mimeType</code> <code class="o">=</code> <code class="n">connection</code><code class="o">.</code><code class="na">getContentType</code><code class="o">();</code>
<code class="n">InputStream</code> <code class="n">in</code> <code class="o">=</code> <code class="n">connection</code><code class="o">.</code><code class="na">getInputStream</code><code class="o">();</code></pre><p>Despite its name, a <code class="literal">URLConnection</code> object is initially created in a raw, unconnected state. In this example, the network connection was not actually initiated until we called the <a id="I_indexterm14_id774055" class="indexterm"/><code class="literal">getContentType()</code> method. The <code class="literal">URLConnection</code> does not talk to the source until data is requested or its <code class="literal">connect()</code> method is explicitly invoked. Prior to connection, network parameters and protocol-specific features can be set up. 
For example, we can set timeouts on the initial connection to the server and on reads:</p><a id="I_14_tt904"/><pre class="programlisting"><code class="n">URLConnection</code> <code class="n">connection</code> <code class="o">=</code> <code class="n">myURL</code><code class="o">.</code><code class="na">openConnection</code><code class="o">();</code>
<code class="n">connection</code><code class="o">.</code><code class="na">setConnectTimeout</code><code class="o">(</code> <code class="mi">10000</code> <code class="o">);</code> <code class="c1">// milliseconds</code>
<code class="n">connection</code><code class="o">.</code><code class="na">setReadTimeout</code><code class="o">(</code> <code class="mi">10000</code> <code class="o">);</code> <code class="c1">// milliseconds</code>
<code class="n">InputStream</code> <code class="n">in</code> <code class="o">=</code> <code class="n">connection</code><code class="o">.</code><code class="na">getInputStream</code><code class="o">();</code></pre><p>As we’ll see in the section “Using the POST Method,” we can get at the protocol-specific information by casting the <code class="literal">URLConnection</code> to its specific subtype.<a id="I_indexterm14_id774099" class="indexterm"/><a id="I_indexterm14_id774106" class="indexterm"/></p></div><div class="sect2" title="Handlers in Practice"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.4"/>Handlers in Practice</h2></div></div></div><p><a id="I_indexterm14_id774120" class="indexterm"/> <a id="idx10832" class="indexterm"/>The content- and protocol-handler mechanisms we’ve described are very flexible; to handle new types of URLs, you need only add the appropriate handler classes. One interesting application of this would be Java-based web browsers that could handle new and specialized kinds of URLs by downloading the necessary handlers over the Net. The idea for this was touted in the earliest days of Java. Unfortunately, it never came to fruition. 
There is no API for dynamically downloading new content and protocol handlers. In fact, there is no standard API for determining what content and protocol handlers exist on a given platform.</p><p>Java currently mandates protocol handlers for HTTP, HTTPS, FTP, FILE, and JAR. While in practice you will generally find these basic protocol handlers with all versions of Java, that’s not entirely comforting, and the story for content handlers is even less clear. The standard Java classes don’t, for example, include content handlers for HTML, GIF, JPEG, or other common data types. Furthermore, although content and protocol handlers are part of the Java API and an intrinsic part of the mechanism for working with URLs, specific content and protocol handlers aren’t defined. Even those protocol handlers that have been bundled in Java are still packaged as part of the Sun implementation classes and are not truly part of the core API for all to see.</p><p>In summary, the Java content- and protocol-handler mechanism was a forward-thinking approach that never quite materialized. The promise of web browsers that dynamically extend themselves for new types of protocols and new content is, like flying cars, always just a few years away. Although the basic mechanics of the protocol-handler mechanism are useful (especially now with some standardization), for decoding content in your own applications you should probably turn to other, newer frameworks that have a bit more specificity.</p></div><div class="sect2" title="Useful Handler Frameworks"><div class="titlepage"><div><div><h2 class="title"><a id="learnjava3-CHP-14-SECT-2.5"/>Useful Handler Frameworks</h2></div></div></div><p><a id="idx10811" class="indexterm"/> <a id="idx10824" class="indexterm"/> <a id="idx10835" class="indexterm"/>The idea of dynamically downloadable handlers could also be applied to other kinds of handler-like components. 
For example, the Java XML community is fond of referring to XML as a way to apply semantics (meaning) to documents and to Java as a portable way to supply the behavior that goes along with those semantics. It’s possible that an XML viewer could be built with downloadable handlers for displaying XML tags.</p><p><a id="I_indexterm14_id774222" class="indexterm"/> <a id="I_indexterm14_id774228" class="indexterm"/>The JavaBeans APIs touch upon this subject with the JavaBeans Activation Framework (JAF), which provides a way to detect the data stream type and “encapsulate access to it” in a Java bean. If this sounds suspiciously like the content handler’s job, it is. Unfortunately, it looks like these APIs will not be merged and, outside of the JavaMail API, the JAF has not been widely used.</p><p>Fortunately, for working with URL streams of images, music, and video, very mature APIs are available. <a id="I_indexterm14_id774245" class="indexterm"/><a id="I_indexterm14_id774251" class="indexterm"/><a id="I_indexterm14_id774256" class="indexterm"/><a id="I_indexterm14_id774262" class="indexterm"/>The Java Advanced Imaging API (JAI) includes a well-defined, extensible set of handlers for most image types, and the Java Media Framework (JMF) can play most common music and video types found online.<a id="I_indexterm14_id774271" class="indexterm"/><a id="I_indexterm14_id774278" class="indexterm"/><a id="I_indexterm14_id774285" class="indexterm"/></p></div></div></body></html>