<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Chapter 3. Building Robust Node Applications</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body><div class="chapter" title="Chapter 3. Building Robust Node Applications"><div class="titlepage"><div><div><h1 class="title"><a id="chap_3"/>Chapter 3. Building Robust Node Applications</h1></div></div></div><p><a id="chap3_id35941339"/>To make the most of the server-side JavaScript
environment, it’s important to understand some core concepts behind the
design choices that were made for <a id="no3.0" class="indexterm"/>Node.js and JavaScript in general. Understanding the decisions
and trade-offs will make it easier for you to write great code and architect
your systems. It will also help you explain to other people why Node.js is
different from other systems they’ve used and where the performance gains
come from. No engineer likes unknowns in her system. “Magic” is not an
acceptable answer, so it helps to be able to explain why a particular
architecture is beneficial and under what circumstances.</p><p>This chapter will cover the coding styles, design patterns, and
production know-how you need to write good, robust Node code.</p><div class="sect1" title="The Event Loop"><div class="titlepage"><div><div><h1 class="title"><a id="chap3_id35941348"/>The Event Loop</h1></div></div></div><p><a id="chap3_id35941353"/>A fundamental part of Node is the <a id="evl3.1" class="indexterm"/><a id="we3.1" class="indexterm"/><span class="emphasis"><em>event loop</em></span>, a concept underlying the
behavior of JavaScript as well as most other interactive systems. In many
languages, event models are bolted onto the side, but JavaScript events
have always been a core part of the language. This is because JavaScript
has always dealt with user interaction. Anyone who has used a modern web
browser is accustomed to web pages that do things “onclick,”
“onmouseover,” etc. These events are so common that we hardly think about
them when writing web page interaction, but having this event support in
the language is incredibly powerful. On the server, instead of the limited
set of events based on the user-driven interaction with the web page’s
DOM, we have an infinite variety of events based on what’s happening in
the server software we use. For example, the HTTP server module provides
an event called “request,” emitted when a user sends the web server a
request.</p><p><a id="chap3_id35941356"/>The event loop is the system that JavaScript
uses to deal with these incoming requests from various parts of the system
in a sane manner. There are a number of ways people deal with “real-time”
or “parallel” issues in computing. Most of them are fairly complex and,
frankly, make our brains hurt. JavaScript takes a simple approach that
makes the process much more understandable, but it does introduce a few
constraints. By having a grasp of how the event loop works, you’ll be able
to use it to its full advantage and avoid the pitfalls of this
approach.</p><p><a id="chap3_id35941364"/>Node takes the approach that all <a id="I_indexterm3_d1e2844" class="indexterm"/><a id="I_indexterm3_d1e2849" class="indexterm"/>I/O activities should be nonblocking (for reasons we’ll
explain more later). This means that HTTP requests, database queries, file
I/O, and other things that require the program to wait do not halt
execution until they return data. Instead, they run independently, and
then emit an event when their data is available. This means that
programming in Node.js has lots of callbacks dealing with all kinds of
I/O. Callbacks often <a id="ca3.1" class="indexterm"/><a id="evlc3.1" class="indexterm"/><a id="evp3.1" class="indexterm"/>initiate other callbacks in a cascading fashion, which is
very different from browser programming. There is still a certain amount
of linear setup, but the bulk of the code involves dealing with
callbacks.</p><p><a id="chap3_id35941392"/>Because of this somewhat unfamiliar
programming style, we need to look for <a id="I_indexterm3_d1e2873" class="indexterm"/><a id="I_indexterm3_d1e2878" class="indexterm"/>patterns to help us effectively program on the server. That
starts with the event loop. We think that most people intuitively
understand event-driven programming because it is like everyday life.
Imagine you are cooking. You are chopping a bell pepper and a pot starts
to boil over (<a class="xref" href="ch03.html#chap3_id35941374" title="Figure 3-1. Event-driven people">Figure 3-1</a>). You finish the slice
you are working on, and then turn down the stove. Rather than trying to
chop and turn down the stove at the same time, you achieve the same result
in a much safer manner by rapidly switching contexts. <a id="I_indexterm3_d1e2886" class="indexterm"/>Event-driven programming does the same thing. By allowing
the programmer to write code that only ever works on one callback at a
time, the program is both understandable and also able to quickly perform
many tasks efficiently.</p><div class="figure-float"><div class="figure"><a id="chap3_id35941374"/><div class="figure-contents"><div class="mediaobject"><a id="I_mediaobject3_d1e2895"/><img src="httpatomoreillycomsourceoreillyimages1137977.png" alt="Event-driven people"/></div></div><p class="title">Figure 3-1. Event-driven people</p></div></div><p><a id="chap3_id35959079"/>In everyday life, we are used to having all
sorts of internal callbacks for dealing with events, and yet, like
JavaScript, we always do just one thing at once. Yes, yes, we can see that
you are rubbing your tummy and patting your head at the same time—well
done. But if you try to do any serious activities at the same time, it
goes wrong pretty quickly. This is like JavaScript. It’s great at letting
events drive the action, but it’s “single-threaded” so that only one thing
happens at once.</p><p><a id="chap3_id35941395"/>This <a id="I_indexterm3_d1e2907" class="indexterm"/><a id="I_indexterm3_d1e2912" class="indexterm"/>single-threaded concept is really important. One of the
criticisms leveled at Node.js fairly often is its lack of “concurrency.”
That is, it doesn’t use all of the CPUs on a machine to run the
JavaScript. The problem with running code on multiple CPUs at once is that
it requires coordination between multiple “threads” of execution. In order
for multiple CPUs to effectively split up work, they would have to talk to
each other about the current state of the program, what work they’d each
done, etc. Although this is possible, it’s a more complex model that
requires more effort from both the programmer and the system. JavaScript’s
approach is simple: there is only one thing happening at once. Everything
that Node does is nonblocking, so the time between an event being emitted
and Node being able to act on that event is very short because it’s not
waiting on things such as disk I/O.</p><p><a id="chap3_id35941417"/>Another way to think about the <a id="I_indexterm3_d1e2920" class="indexterm"/>event loop is to compare it to a postman (or mailman). To
our event-loop postman, each letter is an event. He has a stack of events
to deliver in order. For each letter (event) the postman gets, he walks to
the route to deliver the letter (<a class="xref" href="ch03.html#chap3_id35941398" title="Figure 3-2. The event-loop postman">Figure 3-2</a>). The
route is the callback function assigned to that event (sometimes more than
one). Critically, however, because our postman has only a single set of
legs, he can walk only a single code path at a time.</p><div class="figure-float"><div class="figure"><a id="chap3_id35941398"/><div class="figure-contents"><div class="mediaobject"><a id="I_mediaobject3_d1e2931"/><img src="httpatomoreillycomsourceoreillyimages1137979.png" alt="The event-loop postman"/></div></div><p class="title">Figure 3-2. The event-loop postman</p></div></div><p><a id="chap3_id35959139"/>Sometimes, while the postman is walking a code
route, someone will give him another letter. This is the callback function
he is visiting at the moment. In this case, the postman delivers the new
message immediately (after all, someone gave it to him directly instead of
going via the post office, so it must be urgent). The postman will diverge
from his current code path and walk the proper code path to deliver the
new event. He then carries on walking the original code path emitted by
the previous event.</p><p><a id="chap3_id35941428"/>Let’s look at the behavior of our postman in a
typical program by picking something simple. Suppose we have a web (HTTP)
server that gets requests, retrieves some data from a database, and
returns it to the user. In this scenario, we have a few events to deal
with. First (as in most cases) comes the<a id="I_indexterm3_d1e2941" class="indexterm"/> <code class="literal">request</code> event from the
user asking the web server for a web page. The callback that deals with
the initial request (let’s call it callback A) looks at the request object
and figures out what data it needs from the database. It then makes a
request to the database for that data, passing another function, callback
B, to be called on the<a id="I_indexterm3_d1e2948" class="indexterm"/> <code class="literal">response</code> event. Having
handled the <code class="literal">request</code>, callback A
returns. When the database has found the data, it issues the <code class="literal">response</code> event. The event loop then calls
callback B, which sends the data back to the user.</p><p><a id="chap3_id35941453"/>This seems fairly straightforward. The obvious
thing to note here is the “break” in the code, which you wouldn’t get in a
procedural system. Because Node.js is a nonblocking system, when we get to
the database call that would make us wait, we instead issue a callback.
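The flow just described can be sketched with a stub standing in for the database driver (the db object, its latency, and the field names here are purely illustrative):

```javascript
// A stand-in "database" with artificial latency; a real driver would
// look the same from the event loop's point of view.
var db = {
  get: function(query, callback) {
    setTimeout(function() {
      callback(null, { user: query.user, theme: 'dark' });
    }, 10); // simulated unbounded latency
  }
};

// Callback A: handles the initial request and issues the database call.
function handleRequest(req) {
  var query = { user: req.user };
  db.get(query, function callbackB(err, data) {
    // Callback B: runs later, when the "database" responds.
    console.log('sending to user:', JSON.stringify(data));
  });
  // Callback A returns here; the request is not finished yet.
  console.log('callback A has returned');
}

handleRequest({ user: 'alice' });
```

Run under Node, "callback A has returned" prints before the data is sent, which is exactly the "break" in the code.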
This means that different functions must start handling the request and
finish handling it when the data is ready to return. So we need to make
sure that we either pass any state we need to the callback or make it
available in some other way. JavaScript programming typically does it
through closures. We’ll discuss that in more detail later.</p><p>Why does this make Node more efficient? Imagine ordering food at a
fast food restaurant. When you get in line at the counter, the server
taking your order can behave in two ways. One of them is event-driven, and
one of them isn’t. Let’s start with the typical approach taken by PHP and
many other web platforms. When you ask the server for your order, he takes
it but won’t serve any other customers until he has completed your order.
There are a few things he can do after he’s typed in your order: process
your payment, pour your drink, and so on. However, the server is still
going to have to wait an unknown amount of time for the kitchen to make
your burger (one of us is vegetarian, and orders always seem to take
ages). If, as in the traditional approach of web application frameworks,
each server (thread) is allocated to just one request at a time, the only
way to scale up is to add more threads. However, it’s also very obvious
that our server isn’t being very efficient. He’s spending a lot of time
waiting for the kitchen to cook the food.</p><p>Obviously, real-life restaurants use a much more efficient model.
When a server has finished taking your order, you receive a number that he
can use to call you back. You could say this is a callback number. This is
how Node works. When slow things such as I/O start, Node simply gives them a callback
reference and then gets on with other work that is ready now, like the
next customer (or event, in Node’s case). It’s important to note that as
we saw in the example of the postman, at no time do restaurant servers
ever deal with two customers at the same time. When they are calling
someone back to collect an order, they are not taking a new one, and vice
versa. By acting in an event-driven way, the servers are able to maximize
their throughput.</p><p>This analogy also illustrates the cases where Node fits well and
those where it doesn’t. In a small restaurant where the kitchen staff and
the wait staff are the same people, no improvement can be made by becoming
event-driven. Because all the work is being done by the same people,
event-driven architectures don’t add anything. If all (or most) of the
work your server does is computation, Node might not be the ideal
model.</p><p>However, we can also see when the architecture fits. Imagine there
are two servers and four customers in a restaurant (<a class="xref" href="ch03.html#chap3_fig2a" title="Figure 3-3. Fast food, fast code">Figure 3-3</a>). If the servers serve only one customer at a
time, the first two customers will get the fastest possible order, but the
third and fourth customers will get a terrible experience. The first two
customers will get their food as soon as it is ready because the servers
have dedicated their whole attention to fulfilling their orders. That
comes at the cost of the other two customers. In an event-driven model,
the first two customers might have to wait a short amount of time for the
servers to finish taking the orders of the third and fourth customers
before they get their food, but the average wait time (latency) of the
system will be much, much lower.</p><div class="figure-float"><div class="figure"><a id="chap3_fig2a"/><div class="figure-contents"><div class="mediaobject"><a id="I_mediaobject3_d1e2980"/><img src="httpatomoreillycomsourceoreillyimages1137981.png" alt="Fast food, fast code"/></div></div><p class="title">Figure 3-3. Fast food, fast code</p></div></div><p><a id="chap3_id35941480"/>Let’s look at another example. We’ve given the
event-loop postman a letter to deliver that requires a gate to be opened.
He gets there and the gate is closed, so he simply waits and tries again
and again. He’s trapped in an endless loop waiting for the gate to open
(<a class="xref" href="ch03.html#chap3_id35941461" title="Figure 3-4. Blocking the event loop">Figure 3-4</a>). Perhaps there is a letter on the
stack that will ask someone to open the gate so the postman can get
through. Surely that will solve things, right? Unfortunately, this will
only help if the postman gets to deliver the letter, and currently he’s
stuck waiting endlessly for the gate to open. This is because the event
that opens the gate is external to the current event callback. If we emit
the event from within a callback, we already know our postman will go and
deliver that letter before carrying on, but when events are emitted
outside the currently executing piece of code, their callbacks will not run
until that piece of code has been fully evaluated to its
conclusion.</p><div class="figure"><a id="chap3_id35941461"/><div class="figure-contents"><div class="mediaobject"><a id="I_mediaobject3_d1e2992"/><img src="httpatomoreillycomsourceoreillyimages1137983.png" alt="Blocking the event loop"/></div></div><p class="title">Figure 3-4. Blocking the event loop</p></div><p><a id="chap3_id35958800"/>As an illustration, the code in <a class="xref" href="ch03.html#chap3_id35941483" title="Example 3-1. Event-loop blocking code">Example 3-1</a> creates a loop that Node.js (or a browser)
will never break out of.</p><div class="example"><a id="chap3_id35941483"/><p class="title">Example 3-1. Event-loop blocking code</p><div class="example-contents"><a id="chap3_id35941488"/><pre class="programlisting">EE = require('events').EventEmitter;
ee = new EE();
die = false;
ee.on('die', function() {
  die = true;
});
setTimeout(function() {
  ee.emit('die');
}, 100);
while(!die) {
}
console.log('done');</pre></div></div><p><a id="chap3_id35941499"/>In this example, <code class="literal">console.log</code> will never be called, because the
<code class="literal">while</code> loop stops Node from ever getting a chance to
call back the timeout and emit <a id="I_indexterm3_d1e3015" class="indexterm"/>the <code class="literal">die</code> event. Although
it’s unlikely we’d program a loop like this that relies on an external
condition to exit, it illustrates how Node.js can do only one thing at
once, and getting a fly in the ointment can really screw up the whole
server. This is why <a id="I_indexterm3_d1e3022" class="indexterm"/><a id="I_indexterm3_d1e3027" class="indexterm"/>nonblocking I/O is an essential part of event-driven
programming.</p><p>Let’s consider some numbers. When we run an operation in the CPU
(not a line of JavaScript, but a single machine code operation), it takes
about one-third of a nanosecond (ns). A 3 GHz processor runs
3×10<sup>9</sup> instructions a second, so each
instruction takes 10<sup>-9</sup>/3 seconds. There
are typically two levels of cache in a CPU, L1 and L2, each of which
takes approximately 2–5ns to access. If we get data from memory (RAM), it
takes about 80ns, which is about two orders of magnitude slower than
running an instruction. However, all of these things are in the same
ballpark. Getting things from slower forms of I/O is not quite so good.
Imagine that getting data from RAM is equivalent to the weight of a cat.
Retrieving data from the hard drive, then, could be considered to be the
weight of a whale. Getting things from the network is like 100 whales.
Think about how running <code class="literal">var foo = "bar"</code>
versus a database query is a single cat versus 100 blue whales. Blocking
I/O doesn’t put an actual gate in front of the event-loop postman, but it
does send him via Timbuktu when he is delivering his events.</p><p><a id="chap3_id35958838"/>Given a basic understanding of the event loop,
let’s look at the standard Node.js code for creating an <a id="I_indexterm3_d1e3046" class="indexterm"/><a id="I_indexterm3_d1e3051" class="indexterm"/>HTTP server, shown in <a class="xref" href="ch03.html#chap3_id35941509" title="Example 3-2. A basic HTTP server">Example 3-2</a>.</p><div class="example"><a id="chap3_id35941509"/><p class="title">Example 3-2. A basic HTTP server</p><div class="example-contents"><a id="chap3_id35941515"/><pre class="programlisting">var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
console.log('Server running at http://127.0.0.1:8124/');</pre></div></div><p><a id="chap3_id35941525"/>This code is the most basic example from the
Node.js website (but as we’ll see soon, it’s not the ideal way to code).
The example creates an HTTP server using a factory method in the <code class="literal">http</code> library. The factory method creates a new
HTTP server and attaches a callback to the <code class="literal">request</code> event. The callback is specified as the
argument to <a id="I_indexterm3_d1e3072" class="indexterm"/>the <code class="literal">createServer</code> method.
What’s interesting here is what happens when this code is run. The first
thing Node.js does is run the code in the example from top to bottom. This
can be considered the “setup” phase of Node programming. Because we
attached some event listeners, Node.js doesn’t exit, but instead waits for
an event to be fired. If we didn’t attach any events, Node.js would exit
as soon as it had run the code.</p><p><a id="chap3_id35941544"/>So what happens when the server gets an HTTP
request? Node.js emits the <code class="literal">request</code>
event, which causes the callbacks attached to that event to be run in
order. In this case, there is only one callback, the anonymous function we
passed as an argument to <code class="literal">createServer</code>.
Let’s assume it’s the first request the server has had since setup.
Because there is no other code running, the <code class="literal">request</code> event is handled immediately and the
callback is run. It’s a very simple callback, and it runs pretty
fast.</p><p><a id="chap3_id35958903"/>Let’s assume that our site gets really popular
and we get lots of requests. If, for the sake of argument, our callback
takes 1 second and we get a second request shortly after the first one,
the second request isn’t going to be acted on for another second or so.
Obviously, a second is a really long time, and as we look at the
requirements of real-world applications, the problem of blocking the event
loop becomes more damaging to the user experience. The operating system
kernel actually handles the TCP connections to clients for the HTTP
server, so there isn’t a risk of rejecting new connections, but there is a
real danger of not acting on them. The upshot of this is that we want to
keep Node.js as event-driven and nonblocking as possible. In the same way
that a slow I/O event should use callbacks to indicate the presence of
data that Node.js can act on, the Node.js program itself should be written
in such a way that no single callback ties up the event loop for extended
periods of time.</p><p><a id="chap3_id35941557"/>This means that you should follow two
strategies when writing a Node.js server:</p><div class="itemizedlist"><a id="chap3_id35941561"/><ul class="itemizedlist"><li class="listitem"><p><a id="chap3_id35941563"/><a id="chap3_id35941564"/>Once setup has been completed, make all
actions event-driven.</p></li><li class="listitem"><p><a id="chap3_id35941568"/><a id="chap3_id35941569"/>If Node.js is required to process
something that will take a long time, consider delegating it to web
workers.</p></li></ul></div><p><a id="chap3_id35941574"/>Taking the event-driven approach works
effectively with the event loop (the name is a hint that it would), but
it’s also important to write event-driven code in a way that is easy to
read and understand. In the previous example, we used an anonymous
function as the event callback, which makes things hard in a couple of
ways. First, we have no control over where the code is used. An anonymous
function’s call stack starts from when it is used, rather than when the
callback is attached to an event. This affects debugging. If everything is
an anonymous function, it can be hard to distinguish similar callbacks when
an exception <a id="I_indexterm3_d1e3105" class="indexterm"/><a id="I_indexterm3_d1e3107" class="indexterm"/><a id="I_indexterm3_d1e3109" class="indexterm"/><a id="I_indexterm3_d1e3111" class="indexterm"/><a id="I_indexterm3_d1e3113" class="indexterm"/>occurs.</p></div><div class="sect1" title="Patterns"><div class="titlepage"><div><div><h1 class="title"><a id="I_sect13_d1e3116"/>Patterns</h1></div></div></div><p>Event-driven <a id="we3.2" class="indexterm"/><a id="ev3.2" class="indexterm"/>programming is different from procedural programming. The
easiest way to learn it is to practice routine patterns that have been
discovered by previous generations of programmers. That is the purpose of
this section.</p><p>Before we launch into patterns, we’ll take a look at what is really
happening behind various programming styles to give the patterns some
context. Most of this section will focus on I/O, because, as discussed in
the previous section, event-driven programming is focused on solving
problems with I/O. When it is working with data in memory that doesn’t
require I/O, Node can be completely procedural.</p><div class="sect2" title="The I/O Problem Space"><div class="titlepage"><div><div><h2 class="title"><a id="id825979"/>The I/O Problem Space</h2></div></div></div><p>We’ll start by <a id="pa3.2.1" class="indexterm"/><a id="I_indexterm3_d1e3144" class="indexterm"/>looking at the types of I/O required in efficient systems.
These will be the basis of our patterns.</p><p>The first obvious distinction to look at is serial versus parallel
I/O. Serial is obvious: do <span class="emphasis"><em>this</em></span> I/O, and after it
is finished, do <span class="emphasis"><em>that</em></span> I/O. Parallel is more
complicated to implement but also easy to understand: do
<span class="emphasis"><em>this</em></span> I/O and <span class="emphasis"><em>that</em></span> I/O at the
same time. The important point here is that ordering is normally
considered implicit in serial tasks, but parallel tasks could return in
any order.</p><p>Groups of serial and <a id="I_indexterm3_d1e3166" class="indexterm"/><a id="I_indexterm3_d1e3171" class="indexterm"/>parallel work can also be combined. For example, two
groups of parallel requests could execute serially: do
<span class="emphasis"><em>this</em></span> and <span class="emphasis"><em>that</em></span> together, then
do <span class="emphasis"><em>other</em></span> and <span class="emphasis"><em>another</em></span>
together.</p><p>In Node, we assume that all<a id="I_indexterm3_d1e3191" class="indexterm"/><a id="I_indexterm3_d1e3196" class="indexterm"/> I/O has unbounded latency. This means that any I/O tasks
could take from 0 to infinite time. We don’t know, and can’t assume, how
long these tasks take. So instead of waiting for them, we use
placeholders (events), which then fire callbacks when the I/O happens.
Because we have assumed unbounded latency, it’s easy to perform parallel
tasks. You simply make a number of calls for various I/O tasks. They
will return whenever they are ready, in whatever order that happens to
be. Ordered serial requests are also easy to make by nesting or
referencing callbacks together so that the first callback will initiate
the second I/O request, the second callback will initiate the third, and
so on. Even though each request is asynchronous and <a id="I_indexterm3_d1e3202" class="indexterm"/>doesn’t block the event loop, the requests are made in
serial. This pattern of ordered requests is useful when the results of
one I/O operation have to inform the details of the next I/O request.</p><p>So far, we have two ways to do I/O: ordered serial requests and
unordered parallel requests. Ordered parallel requests are also a useful
pattern; they happen when we allow the I/O to take place in parallel,
but we deal with the results in a particular sequence. Unordered serial
I/O offers no particular benefits, so we won’t consider it as a
pattern.</p><div class="sect3" title="Unordered parallel I/O"><div class="titlepage"><div><div><h3 class="title"><a id="id826118"/>Unordered parallel I/O</h3></div></div></div><p>Let’s start <a id="I_indexterm3_d1e3218" class="indexterm"/><a id="I_indexterm3_d1e3223" class="indexterm"/><a id="I_indexterm3_d1e3226" class="indexterm"/>with unordered parallel I/O (<a class="xref" href="ch03.html#example3-3" title="Example 3-3. Unordered parallel I/O in Node">Example 3-3</a>) because it’s by far the easiest to do in
Node. In fact, all I/O in Node is unordered parallel by default. This
is because all I/O in Node is asynchronous and nonblocking. When we do
any I/O, we simply throw the request out there and see what happens.
It’s possible that all the requests will happen in the order we made
them, but maybe they won’t. When we talk about unordered, we don’t
mean randomized, but simply that there is no guaranteed order.</p><div class="example"><a id="example3-3"/><p class="title">Example 3-3. Unordered parallel I/O in Node</p><div class="example-contents"><pre class="programlisting">var fs = require('fs');
fs.readFile('foo.txt', 'utf8', function(err, data) {
  console.log(data);
});
fs.readFile('bar.txt', 'utf8', function(err, data) {
  console.log(data);
});</pre></div></div><p>Simply making I/O requests with <a id="I_indexterm3_d1e3241" class="indexterm"/>callbacks will create unordered parallel I/O. At some
point in the future, both of these callbacks will fire. Which happens
first is unknown, and either one could return an error rather than
data without affecting the other request.</p></div><div class="sect3" title="Ordered serial I/O"><div class="titlepage"><div><div><h3 class="title"><a id="id826186"/>Ordered serial I/O</h3></div></div></div><p>In this pattern, <a id="io3.2.1.2" class="indexterm"/><a id="or3.2.1.2" class="indexterm"/><a id="se3.2.1.2" class="indexterm"/>we want to do some I/O (unbounded latency) tasks in
sequence. Each previous task must be completed before the next task is
started. In Node, this means nesting <a id="I_indexterm3_d1e3266" class="indexterm"/><a id="I_indexterm3_d1e3271" class="indexterm"/>callbacks so that the callback from each task starts the
next task, as shown in <a class="xref" href="ch03.html#example3-4" title="Example 3-4. Nesting callbacks to produce serial requests">Example 3-4</a>.</p><div class="example"><a id="example3-4"/><p class="title">Example 3-4. Nesting callbacks to produce serial requests</p><div class="example-contents"><pre class="programlisting">server.on('request', function(req, res) {
  //get session information from memcached
  memcached.getSession(req, function(session) {
    //get information from db
    db.get(session.user, function(userData) {
      //some other web service call
      ws.get(req, function(wsData) {
        //render page
        page = pageRender(req, session, userData, wsData);
        //output the response
        res.write(page);
      });
    });
  });
});</pre></div></div><p>Although nesting callbacks allows easy creation of ordered
serial I/O, it also creates so-called<a id="I_indexterm3_d1e3284" class="indexterm"/><a id="I_indexterm3_d1e3287" class="indexterm"/> “pyramid” code.<sup>[<a id="id826280" href="#ftn.id826280" class="footnote">6</a>]</sup> This code can be hard to read and understand, and as a
consequence, hard to maintain. For instance, a glance at <a class="xref" href="ch03.html#example3-4" title="Example 3-4. Nesting callbacks to produce serial requests">Example 3-4</a> doesn’t reveal that the completion of the
<code class="literal">memcached.getSession</code> request
launches the <code class="literal">db.get</code> request, that
the completion of the <code class="literal">db.get</code>
request launches the <code class="literal">ws.get</code>
request, and so on. There are a few ways to make this code more
readable without breaking the fundamental ordered serial
pattern.</p><p>First, we can continue to use inline function expressions, but
we can name them, as in <a class="xref" href="ch03.html#example3-5" title="Example 3-5. Naming function calls in callbacks">Example 3-5</a>. This makes
debugging a lot easier as well as giving an indication of what the
callback is going to do.</p><div class="example"><a id="example3-5"/><p class="title">Example 3-5. Naming function calls in callbacks</p><div class="example-contents"><pre class="programlisting">server.on('request', function getMemCached(req, res) {
  memcached.getSession(req, function getDbInfo(session) {
    db.get(session.user, function getWsInfo(userData) {
      ws.get(req, function render(wsData) {
        //render page
        page = pageRender(req, session, userData, wsData);
        //output the response
        res.write(page);
      });
    });
  });
});</pre></div></div><p>Another approach that changes the style of code is to use
declared functions instead of just anonymous or named ones. This
removes the natural pyramid seen in the other approaches, which shows
the order of execution, but it also breaks the code out into more
manageable chunks (see <a class="xref" href="ch03.html#example3-6" title="Example 3-6. Using declared functions to separate out code">Example 3-6</a>).</p><div class="example"><a id="example3-6"/><p class="title">Example 3-6. Using declared functions to separate out code</p><div class="example-contents"><pre class="programlisting">var render = function(wsData) {
  page = pageRender(req, session, userData, wsData);
};
var getWsInfo = function(userData) {
  ws.get(req, render);
};
var getDbInfo = function(session) {
  db.get(session.user, getWsInfo);
};
var getMemCached = function(req, res) {
  memcached.getSession(req, getDbInfo);
};</pre></div></div><p>The code shown in this example won’t actually work. The original
nested code used closures to encapsulate some variables and make them
available to subsequent functions. Hence, declared functions can be
good when state doesn’t need to be maintained across three or more
callbacks. If you need only the information from the last callback in
order to do the next one, it works well. It can be a lot more readable
(especially with documentation) than a huge lump of nested
functions.</p><p>There are, of course, ways of passing data around between
functions. Mostly it comes down to using the features of the
JavaScript language itself. JavaScript has functional scope,
<a id="I_indexterm3_d1e3332" class="indexterm"/><a id="I_indexterm3_d1e3337" class="indexterm"/>which means that when you declare <code class="literal">var</code> within a <a id="I_indexterm3_d1e3344" class="indexterm"/>function, the variable becomes local to that function.
However, simply having <code class="literal">{</code> and
<code class="literal">}</code> does not limit the scope of a
variable. This allows us to define variables in the outer callback
that can be accessed by the inner callbacks even when the outer
callbacks have “closed” by returning. When we nest callbacks, we are
implicitly binding the variables from all the previous callbacks into
the most recently defined callback. It just turns out that lots of
nesting isn’t very easy to work with.</p><p>We can still perform the flattening refactoring we did, but we
should do it within the shared scope of the original request, to form
a closure environment around all the callbacks we want to do. This
way, all the callbacks relating to that initial request can be
encapsulated and can share state via variables in the encapsulating
callback (<a class="xref" href="ch03.html#example3-7" title="Example 3-7. Encapsulating within a callback">Example 3-7</a>).</p><div class="example"><a id="example3-7"/><p class="title">Example 3-7. Encapsulating within a callback</p><div class="example-contents"><pre class="programlisting">server.on('request', function(req, res) {
    var render = function(wsData) {
        page = pageRender(req, session, userData, wsData);
    };
    var getWsInfo = function(userData) {
        ws.get(req, render);
    };
    var getDbInfo = function(session) {
        db.get(session.user, getWsInfo);
    };
    var getMemCached = function(req, res) {
        memcached.getSession(req, getDbInfo);
    };
    getMemCached(req, res); // kick off the chain for this request
});</pre></div></div><p>Not only does this approach organize code in a logical way, but
it also allows you to flatten a lot of the callback hell.</p><p>Other organizational innovations are also possible. Sometimes
there is code you want to reuse across many functions. This is the
province of <em class="firstterm">middleware</em>. There are many ways to
do <a id="I_indexterm3_d1e3373" class="indexterm"/>middleware. One of the most popular in Node is the model
used by the <a id="I_indexterm3_d1e3379" class="indexterm"/>Connect framework, which could be said to be based on
Rack from the Ruby world. The general idea behind its implementation
is that we pass around some variables that represent not only the
state but also the methods of interacting with that state.</p><p>In JavaScript, objects are
<a id="I_indexterm3_d1e3386" class="indexterm"/><a id="I_indexterm3_d1e3391" class="indexterm"/><a id="I_indexterm3_d1e3394" class="indexterm"/>passed by reference. That means when you call <code class="literal">myFunction(someObject)</code>, any changes you
make to <code class="literal">someObject</code> will affect all
copies of <code class="literal">someObject</code> in your
current functional scope. This is potentially tricky, but gives you
some great powers if you are careful about any side effects created.
Side effects are largely dangerous in <a id="I_indexterm3_d1e3409" class="indexterm"/>asynchronous code. When something modifies an object
used by a <a id="I_indexterm3_d1e3415" class="indexterm"/>callback, it can often be very difficult to figure out
when that change happened because it happens in a nonlinear order. If
you use the ability to change objects passed by arguments, be
considerate of where those objects are going to be used.</p><p>The basic idea is to take something that represents the state
and pass it between all functions that need to act on that state. This
means that all the things acting on the state need to have the same
interface so they can pass between themselves. This is why Connect
(and therefore Express) middleware all takes the form <code class="literal">function(req, res, next)</code>. We discuss
Connect/Express middleware in more detail in <a class="xref" href="ch07.html" title="Chapter 7. Important External Modules">Chapter 7</a>.</p><p>In the meantime, let’s look at the basic approach, shown in
<a class="xref" href="ch03.html#example3-8" title="Example 3-8. Passing changes between functions">Example 3-8</a>. When we share objects between
<a id="I_indexterm3_d1e3432" class="indexterm"/>functions, earlier functions in the call stack can
affect the state of those objects such that the later objects utilize
the <a id="I_indexterm3_d1e3438" class="indexterm"/><a id="I_indexterm3_d1e3440" class="indexterm"/><a id="I_indexterm3_d1e3442" class="indexterm"/><a id="I_indexterm3_d1e3444" class="indexterm"/><a id="I_indexterm3_d1e3446" class="indexterm"/><a id="I_indexterm3_d1e3448" class="indexterm"/>changes.</p><div class="example"><a id="example3-8"/><p class="title">Example 3-8. Passing changes between functions</p><div class="example-contents"><pre class="programlisting">var AwesomeClass = function() {
    this.awesomeProp = 'awesome!'
    this.awesomeFunc = function(text) {
        console.log(text + ' is awesome!')
    }
}
var awesomeObject = new AwesomeClass()
function middleware(func) {
    var oldFunc = func.awesomeFunc
    func.awesomeFunc = function(text) {
        text = text + ' really'
        oldFunc(text)
    }
}
function anotherMiddleware(func) {
    func.anotherProp = 'super duper'
}
function caller(input) {
    input.awesomeFunc(input.anotherProp)
}
middleware(awesomeObject)
anotherMiddleware(awesomeObject)
caller(awesomeObject)</pre></div></div></div></div></div><div class="sect1" title="Writing Code for Production"><div class="titlepage"><div><div><h1 class="title"><a id="I_sect13_d1e3456"/>Writing Code for Production</h1></div></div></div><p>One of the challenges <a id="we3.3" class="indexterm"/><a id="I_indexterm3_d1e3466" class="indexterm"/><a id="I_indexterm3_d1e3471" class="indexterm"/>of writing a book is trying to explain things in the
simplest way possible. That runs counter to showing techniques and
functional code that you’d want to deploy. Although we should always
strive to have the simplest, most understandable code possible, sometimes
you need to do things that make code more robust or faster at the cost of
making it less simple. This section provides guidance about how to harden
the applications you deploy, which you can take with you as you explore
upcoming chapters. This section is about writing code with maturity that
will keep your application running long into the future. It’s not
exhaustive, but if you write robust code, you won’t have to deal with so
many maintenance issues. One of the trade-offs of Node’s single-threaded
approach is a tendency to be brittle. These techniques help mitigate this
risk.</p><p>Deploying a production application is not the same as running test
programs on your laptop. Servers can have a wide variety of resource
constraints, but they tend to have a lot more resources than the typical
machine you would develop on. Typically, frontend servers have many more
cores (CPUs) than laptop or desktop machines, but less hard drive space.
They also have a lot of RAM. Node currently has some constraints, such as
a maximum <a id="I_indexterm3_d1e3479" class="indexterm"/>JavaScript heap size. This affects the way you deploy
because you want to maximize the use of the CPUs and memory on the machine
while using Node’s easy-to-program <a id="I_indexterm3_d1e3485" class="indexterm"/>single-threaded approach.</p><div class="sect2" title="Error Handling"><div class="titlepage"><div><div><h2 class="title"><a id="id826742"/>Error Handling</h2></div></div></div><p>As we saw earlier in this <a id="I_indexterm3_d1e3496" class="indexterm"/><a id="I_indexterm3_d1e3501" class="indexterm"/>chapter, you can split <a id="I_indexterm3_d1e3505" class="indexterm"/>I/O activities from other things in Node, and error
handling is one of those things. JavaScript includes<a id="I_indexterm3_d1e3511" class="indexterm"/><a id="I_indexterm3_d1e3516" class="indexterm"/> try/catch functionality, but it’s appropriate only for
errors that happen inline. When you do <a id="I_indexterm3_d1e3520" class="indexterm"/>nonblocking I/O in Node, you pass a callback to the
function. This means the callback is going to run when the event happens
outside of the try/catch block. We need to be able to provide error
handling that works in <a id="I_indexterm3_d1e3527" class="indexterm"/>asynchronous situations. Consider the code in <a class="xref" href="ch03.html#example3-9" title="Example 3-9. Trying to catch an error in a callback and failing">Example 3-9</a>.</p><div class="example"><a id="example3-9"/><p class="title">Example 3-9. Trying to catch an error in a callback and failing</p><div class="example-contents"><pre class="programlisting">var http = require('http')
var opts = {
    host: 'sfnsdkfjdsnk.com',
    port: 80,
    path: '/'
}
try {
    http.get(opts, function(res) {
        console.log('Will this get called?')
    })
}
catch (e) {
    console.log('Will we catch an error?')
}</pre></div></div><p>When you call <code class="literal">http.get()</code>,
<a id="I_indexterm3_d1e3545" class="indexterm"/>what is actually happening? We pass some parameters
specifying the I/O we want to happen and a callback function. When the
I/O completes, the callback function will be fired. However, the
<code class="literal">http.get()</code> call itself will succeed as soon
as it has dispatched the request and registered the callback. An error
that occurs later, during the GET, cannot be caught by a
try/catch block.</p><p>The disconnect from I/O errors is even more obvious in Node REPL.
Because the REPL shell prints out any return values that are not
assigned, we can see that the return value of calling <code class="literal">http.get()</code> is the <code class="literal">http.ClientRequest</code> object that is created.
This means that the try/catch did its job by making sure the specified
code returned without errors. However, because the hostname is nonsense,
a problem will occur within this I/O request. This means the callback
can’t be completed successfully. A try/catch can’t help with this,
because the error has happened outside the JavaScript, and when Node is
ready to report it, we are not in that call stack any more. We’ve moved
on to dealing with another event.</p><p>We deal with this in Node by using the<a id="I_indexterm3_d1e3564" class="indexterm"/> <code class="literal">error</code> event. This is a
special event that is fired when an error occurs. It allows a module
engaging in I/O to fire an alternative event to the one the callback was
listening for to deal with the error. The error event allows us to deal
with any errors that might occur in any of the callbacks that happen in
any modules we use. Let’s write the previous example correctly, as shown
in <a class="xref" href="ch03.html#example3-10" title="Example 3-10. Catching an I/O error with the error event">Example 3-10</a>.</p><div class="example"><a id="example3-10"/><p class="title">Example 3-10. Catching an I/O error with the error event</p><div class="example-contents"><pre class="programlisting">var http = require('http')
var opts = {
    host: 'dskjvnfskcsjsdkcds.net',
    port: 80,
    path: '/'
}
var req = http.get(opts, function(res) {
    console.log('This will never get called')
})
req.on('error', function(e) {
    console.log('Got that pesky error trapped')
})</pre></div></div><p>By using the <code class="literal">error</code> event, we
got to deal with the error (in this case by ignoring it). However, our
program survived, which is the main thing. Like try/catch in JavaScript,
the <code class="literal">error</code> event catches all kinds of
exceptions. A good general approach to exception handling is to set up
conditionals to check for known error conditions and deal with them if
possible. Otherwise, catching any remaining errors, logging them, and
keeping your server running is probably the best approach.</p></div><div class="sect2" title="Using Multiple Processors"><div class="titlepage"><div><div><h2 class="title"><a id="id826935"/>Using Multiple Processors</h2></div></div></div><p>As we’ve mentioned, <a id="wr3.3.2" class="indexterm"/><a id="si3.3.2" class="indexterm"/><a id="mu3.3.2" class="indexterm"/>Node is single-threaded. This means Node is using only one
processor to do its work. However, most servers have one or more
multicore processors, and each multicore processor contains several
independent cores. A server with two physical CPU sockets might have “24
logical cores”—that is, 24 processors exposed to the operating system. To
make the best use of Node, we should use those cores too. So if we don’t
have threads, how do we
do that?</p><p>Node provides a module called <code class="literal">cluster</code> that <a id="cl3.3.2" class="indexterm"/>allows you to delegate work to child processes. This means
that Node creates a copy of its current program in another process (on
Windows, it is actually another thread). Each child process has some
special abilities, such as the ability to share a socket with other
children. This allows us to write Node programs that start many other
Node programs and then delegate work to them.</p><p>It is important to understand that when you use <code class="literal">cluster</code> to share work between a number of
copies of a Node program, the <a id="ma3.3.2" class="indexterm"/>master process isn’t involved in every transaction. The
master process manages the <a id="ch3.3.2" class="indexterm"/>child processes, but when the children interact with I/O
they do it directly, not through the master. This means that if you set
up a web server using <code class="literal">cluster</code>,
requests don’t go through your master process, but directly to the
children. Hence, dispatching requests does not create a bottleneck in
the system.</p><p>By using the <code class="literal">cluster</code> API, you
can <a id="wo3.3.2" class="indexterm"/><a id="di3.3.2" class="indexterm"/>distribute work to a Node process on every available core
of your server. This makes the best use of the resource. Let’s look at a
simple <code class="literal">cluster</code>
script in <a class="xref" href="ch03.html#example3-11" title="Example 3-11. Using cluster to distribute work">Example 3-11</a>.</p><div class="example"><a id="example3-11"/><p class="title">Example 3-11. Using cluster to distribute work</p><div class="example-contents"><pre class="programlisting">var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('death', function(worker) {
        console.log('worker ' + worker.pid + ' died');
    });
} else {
    // Worker processes have an HTTP server.
    http.Server(function(req, res) {
        res.writeHead(200);
        res.end("hello world\n");
    }).listen(8000);
}</pre></div></div><p>In this example, we use a few parts of Node core to evenly
distribute the work across all of the CPUs available: the <code class="literal">cluster</code> module, the <code class="literal">http</code> module, <a id="I_indexterm3_d1e3662" class="indexterm"/>and <a id="I_indexterm3_d1e3668" class="indexterm"/>the <code class="literal">os</code> module.