<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html><html xmlns:epub="http://www.idpf.org/2007/ops" xmlns="http://www.w3.org/1999/xhtml"><head><title>A Crash Course in Python</title><link rel="stylesheet" type="text/css" href="epub.css"/></head><body data-type="book"><section data-type="chapter" epub:type="chapter" data-pdf-bookmark="Chapter 1. A Crash Course in Python"><div class="chapter" id="python"> <h1><span class="label">Chapter 1. </span>A Crash Course in Python</h1> <blockquote data-type="epigraph" epub:type="epigraph"> <p>People are still crazy about Python after twenty-five years, which I find hard to believe.</p> <p data-type="attribution">Michael Palin</p> </blockquote> <p>All new employees <a data-type="indexterm" data-primary="Python" id="ix_Python"/>at DataSciencester are required to go through new employee orientation, the most interesting part of which is a crash course in Python.</p> <p>This is not a comprehensive Python tutorial but instead is intended to highlight the parts of the language that will be most important to us (some of which are often not the focus of Python tutorials).</p> <section data-type="sect1" data-pdf-bookmark="The Basics"><div class="sect1" id="idm2314784"> <h1>The Basics</h1> <section data-type="sect2" data-pdf-bookmark="Getting Python"><div class="sect2" id="idm685280"> <h2>Getting Python</h2> <p>You can download Python from <a href="https://www.python.org/">python.org</a>. But if you don’t already have Python, I recommend instead installing the <a href="https://store.continuum.io/cshop/anaconda/">Anaconda</a> distribution, <a data-type="indexterm" data-primary="Anaconda distribution of Python" id="idp852096"/>which already includes most of the libraries that you need to do data science.</p> <p>As I write this, the latest version of Python is 3.4. At DataSciencester, however, we use old, reliable Python 2.7. Python 3 is not backward-compatible with Python 2, and many important libraries only work well with 2.7. 
The data science community is still firmly stuck on 2.7, which means we will be, too. Make sure to get that version.</p> <p>If you don’t get Anaconda, make sure to install <a href="https://pypi.python.org/pypi/pip">pip</a>, which is a Python package manager <a data-type="indexterm" data-primary="pip (Python package manager)" id="idp1065488"/>that allows you to easily install third-party packages (some of which we’ll need). <a data-type="indexterm" data-primary="IPython" id="idp1066368"/> It’s also worth getting <a href="http://ipython.org/">IPython</a>, which is a much nicer Python shell to work with.</p> <p>(If you installed Anaconda then it should have come with pip and IPython.)</p> <p>Just run:</p> <pre data-type="programlisting" data-code-language="dosbatch">pip install ipython</pre> <p>and then search the Internet for solutions to whatever cryptic error messages that causes.</p> </div></section> <section data-type="sect2" data-pdf-bookmark="The Zen of Python"><div class="sect2" id="idp2589376"> <h2>The Zen of Python</h2> <p>Python has a somewhat Zen <a href="http://legacy.python.org/dev/peps/pep-0020/">description of its design principles</a>, which you can also find inside the Python interpreter itself by typing <code>import this</code>.</p> <p>One of the most discussed of these is:</p> <blockquote> <p>There should be one—and preferably only one—obvious way to do it.</p></blockquote> <p>Code written in accordance with this “obvious” way (which may not be obvious at all to a newcomer) is often described as “Pythonic.” Although this is not a book about Python, we will occasionally contrast Pythonic and non-Pythonic ways of accomplishing the same things, and we will generally favor Pythonic solutions to our problems.</p> </div></section> <section data-type="sect2" data-pdf-bookmark="Whitespace Formatting"><div class="sect2" id="idp1090528"> <h2>Whitespace Formatting</h2> <p>Many languages use curly braces to delimit blocks of code. 
<a data-type="indexterm" data-primary="Python" data-secondary="whitespace formatting" id="idp1024848"/><a data-type="indexterm" data-primary="whitespace in Python code" id="idp1026144"/> Python uses indentation:</p> <pre data-type="programlisting" data-code-language="py"><code class="k">for</code> <code class="n">i</code> <code class="ow">in</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">5</code><code class="p">]:</code>
    <code class="k">print</code> <code class="n">i</code>  <code class="c"># first line in "for i" block</code>
    <code class="k">for</code> <code class="n">j</code> <code class="ow">in</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">5</code><code class="p">]:</code>
        <code class="k">print</code> <code class="n">j</code>  <code class="c"># first line in "for j" block</code>
        <code class="k">print</code> <code class="n">i</code> <code class="o">+</code> <code class="n">j</code>  <code class="c"># last line in "for j" block</code>
    <code class="k">print</code> <code class="n">i</code>  <code class="c"># last line in "for i" block</code>
<code class="k">print</code> <code class="s">"done looping"</code></pre> <p>This makes Python code very readable, but it also means that you have to be very careful with your formatting. 
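</p> <p>To see how strict the interpreter is about indentation, here is a minimal sketch that passes small source strings to Python’s built-in <code>compile()</code> function and reports whether they are correctly indented. The helper <code>indentation_ok</code>, the filename label <code>"snippet.py"</code>, and the sample snippets are hypothetical, chosen only for this illustration. The helper catches only <code>IndentationError</code>, so any other kind of syntax error would still be raised. Each <code>print</code> call takes a single argument, so the sketch behaves the same under Python 2.7 and Python 3:</p> <pre data-type="programlisting" data-code-language="py">def indentation_ok(source):
    """Hypothetical helper: returns True if source compiles,
    False if compiling it raises an IndentationError."""
    try:
        compile(source, "snippet.py", "exec")  # filename is just a label
        return True
    except IndentationError:
        return False

good = "for i in [1, 2, 3]:\n    total = i\n"  # loop body indented
bad = "for i in [1, 2, 3]:\ntotal = i\n"  # loop body not indented
extra = "for i in [1, 2, 3]:\n    total = i\n      other = i\n"  # unexpected extra indent

print(indentation_ok(good))   # True
print(indentation_ok(bad))    # False
print(indentation_ok(extra))  # False</pre> <p>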
Whitespace is ignored inside parentheses and brackets, which can be helpful for long-winded computations:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">long_winded_computation</code> <code class="o">=</code> <code class="p">(</code><code class="mi">1</code> <code class="o">+</code> <code class="mi">2</code> <code class="o">+</code> <code class="mi">3</code> <code class="o">+</code> <code class="mi">4</code> <code class="o">+</code> <code class="mi">5</code> <code class="o">+</code> <code class="mi">6</code> <code class="o">+</code> <code class="mi">7</code> <code class="o">+</code> <code class="mi">8</code> <code class="o">+</code> <code class="mi">9</code> <code class="o">+</code> <code class="mi">10</code> <code class="o">+</code> <code class="mi">11</code> <code class="o">+</code> <code class="mi">12</code> <code class="o">+</code>
                           <code class="mi">13</code> <code class="o">+</code> <code class="mi">14</code> <code class="o">+</code> <code class="mi">15</code> <code class="o">+</code> <code class="mi">16</code> <code class="o">+</code> <code class="mi">17</code> <code class="o">+</code> <code class="mi">18</code> <code class="o">+</code> <code class="mi">19</code> <code class="o">+</code> <code class="mi">20</code><code class="p">)</code></pre> <p>and for making code easier to read:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">list_of_lists</code> <code class="o">=</code> <code class="p">[[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">],</code> <code class="p">[</code><code class="mi">4</code><code class="p">,</code> <code class="mi">5</code><code class="p">,</code> <code class="mi">6</code><code class="p">],</code> <code class="p">[</code><code class="mi">7</code><code class="p">,</code> <code class="mi">8</code><code class="p">,</code> <code class="mi">9</code><code class="p">]]</code>

<code class="n">easier_to_read_list_of_lists</code> <code class="o">=</code> <code class="p">[</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">],</code>
                                 <code class="p">[</code><code class="mi">4</code><code class="p">,</code> <code class="mi">5</code><code class="p">,</code> <code class="mi">6</code><code class="p">],</code>
                                 <code class="p">[</code><code class="mi">7</code><code class="p">,</code> <code class="mi">8</code><code class="p">,</code> <code class="mi">9</code><code class="p">]</code> <code class="p">]</code></pre> <p>You can also use a backslash to indicate that a statement continues onto the next line, although we’ll rarely do this:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">two_plus_three</code> <code class="o">=</code> <code class="mi">2</code> <code class="o">+</code> \
                 <code class="mi">3</code></pre> <p>One consequence of whitespace formatting is that it can be hard to copy and paste code into the Python shell. 
For example, if you tried to paste the code:</p> <pre data-type="programlisting" data-code-language="py"><code class="k">for</code> <code class="n">i</code> <code class="ow">in</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">5</code><code class="p">]:</code>

    <code class="c"># notice the blank line</code>
    <code class="k">print</code> <code class="n">i</code></pre> <p>into the ordinary Python shell, you would get:</p> <pre data-type="programlisting">IndentationError: expected an indented block</pre> <p>because the interpreter thinks the blank line signals the end of the <code>for</code> loop’s block.</p> <p>IPython has a magic function <code>%paste</code>, which correctly pastes whatever is on your clipboard, whitespace and all. This alone is a good reason to use IPython.</p> </div></section> <section data-type="sect2" data-pdf-bookmark="Modules"><div class="sect2" id="idp1091264"> <h2>Modules</h2> <p>Certain features of Python are not loaded by default. <a data-type="indexterm" data-primary="modules (Python)" id="idp3611088"/> These include both features that are part of the language itself and third-party features that you download yourself. 
In order to use these features, you’ll need to <code>import</code> the modules that contain them.</p> <p>One approach is to simply import the module itself:</p> <pre data-type="programlisting" data-code-language="py"><code class="kn">import</code> <code class="nn">re</code>
<code class="n">my_regex</code> <code class="o">=</code> <code class="n">re</code><code class="o">.</code><code class="n">compile</code><code class="p">(</code><code class="s">"[0-9]+"</code><code class="p">,</code> <code class="n">re</code><code class="o">.</code><code class="n">I</code><code class="p">)</code></pre> <p>Here <code>re</code> is the module containing functions and constants for working with regular expressions. After this type of <code>import</code> you can access those functions only by prefixing them with <code>re.</code>.</p> <p>If you already had a different <code>re</code> in your code, you could use an alias:</p> <pre data-type="programlisting" data-code-language="py"><code class="kn">import</code> <code class="nn">re</code> <code class="kn">as</code> <code class="nn">regex</code>
<code class="n">my_regex</code> <code class="o">=</code> <code class="n">regex</code><code class="o">.</code><code class="n">compile</code><code class="p">(</code><code class="s">"[0-9]+"</code><code class="p">,</code> <code class="n">regex</code><code class="o">.</code><code class="n">I</code><code class="p">)</code></pre> <p>You might also do this if your module has an unwieldy name or if you’re going to be typing it a lot. 
For example, when visualizing data with <code>matplotlib</code>, a standard convention is:</p> <pre data-type="programlisting" data-code-language="py"><code class="kn">import</code> <code class="nn">matplotlib.pyplot</code> <code class="kn">as</code> <code class="nn">plt</code></pre> <p>If you need a few specific values from a module, you can import them explicitly and use them without qualification:</p> <pre data-type="programlisting" data-code-language="py"><code class="kn">from</code> <code class="nn">collections</code> <code class="kn">import</code> <code class="n">defaultdict</code><code class="p">,</code> <code class="n">Counter</code>
<code class="n">lookup</code> <code class="o">=</code> <code class="n">defaultdict</code><code class="p">(</code><code class="nb">int</code><code class="p">)</code>
<code class="n">my_counter</code> <code class="o">=</code> <code class="n">Counter</code><code class="p">()</code></pre> <p>If you were a bad person, you could import the entire contents of a module into your namespace, which might inadvertently overwrite variables you’ve already defined:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">match</code> <code class="o">=</code> <code class="mi">10</code>
<code class="kn">from</code> <code class="nn">re</code> <code class="kn">import</code> <code class="o">*</code>  <code class="c"># uh oh, re has a match function</code>
<code class="k">print</code> <code class="n">match</code>  <code class="c"># "&lt;function match at 0x...&gt;", not 10</code></pre> <p>However, since you are not a bad person, you won’t ever do this.</p> </div></section> <section data-type="sect2" data-pdf-bookmark="Arithmetic"><div class="sect2" id="idp3609328"> <h2>Arithmetic</h2> <p>Python 2.7 uses integer division by default,<a data-type="indexterm" data-primary="Python" data-secondary="arithmetic" id="idp3762480"/><a data-type="indexterm" data-primary="arithmetic" data-secondary="in Python" id="idp3763456"/> so that <code>5 / 2</code> 
equals <code>2</code>. This is almost never what we want, so we will always start our files with:</p> <pre data-type="programlisting" data-code-language="py"><code class="kn">from</code> <code class="nn">__future__</code> <code class="kn">import</code> <code class="n">division</code></pre> <p>after which <code>5 / 2</code> equals <code>2.5</code>. Every code example in this book uses this new-style division. In the handful of cases where we need integer division, we can get it with a double slash: <code>5 // 2</code>.</p> </div></section> <section data-type="sect2" data-pdf-bookmark="Functions"><div class="sect2" id="idp3939056"> <h2>Functions</h2> <p>A function is a rule for taking zero or more inputs and returning a corresponding output.<a data-type="indexterm" data-primary="functions (Python)" id="idp3940928"/><a data-type="indexterm" data-primary="Python" data-secondary="functions" id="idp3980800"/> In Python, we typically define functions using <code>def</code>:</p> <pre data-type="programlisting" data-code-language="py"><code class="k">def</code> <code class="nf">double</code><code class="p">(</code><code class="n">x</code><code class="p">):</code>
    <code class="sd">"""this is where you put an optional docstring</code>
<code class="sd">    that explains what the function does.</code>
<code class="sd">    for example, this function multiplies its input by 2"""</code>
    <code class="k">return</code> <code class="n">x</code> <code class="o">*</code> <code class="mi">2</code></pre> <p>Python functions are <em>first-class</em>, which means that we can assign them to variables and pass them into functions just like any other arguments:</p> <pre data-type="programlisting" data-code-language="py"><code class="k">def</code> <code class="nf">apply_to_one</code><code class="p">(</code><code class="n">f</code><code class="p">):</code>
    <code class="sd">"""calls the function f with 1 as its argument"""</code>
    <code class="k">return</code> <code class="n">f</code><code class="p">(</code><code class="mi">1</code><code class="p">)</code>

<code class="n">my_double</code> <code class="o">=</code> <code class="n">double</code>  <code class="c"># refers to the previously defined function</code>
<code class="n">x</code> <code class="o">=</code> <code class="n">apply_to_one</code><code class="p">(</code><code class="n">my_double</code><code class="p">)</code>  <code class="c"># equals 2</code></pre> <p>It is also easy to create short anonymous functions, or lambdas:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">y</code> <code class="o">=</code> <code class="n">apply_to_one</code><code class="p">(</code><code class="k">lambda</code> <code class="n">x</code><code class="p">:</code> <code class="n">x</code> <code class="o">+</code> <code class="mi">4</code><code class="p">)</code>  <code class="c"># equals 5</code></pre> <p>You can assign lambdas to variables, although most people will tell you that you should just use <code>def</code> instead:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">another_double</code> <code class="o">=</code> <code class="k">lambda</code> <code class="n">x</code><code class="p">:</code> <code class="mi">2</code> <code class="o">*</code> <code class="n">x</code>  <code class="c"># don't do this</code>
<code class="k">def</code> <code class="nf">another_double</code><code class="p">(</code><code class="n">x</code><code class="p">):</code> <code class="k">return</code> <code class="mi">2</code> <code class="o">*</code> <code class="n">x</code>  <code class="c"># do this instead</code></pre> <p>Function parameters can also be given default arguments, which only need to be specified when you want a value other than the default:</p> <pre data-type="programlisting" data-code-language="py"><code class="k">def</code> <code class="nf">my_print</code><code class="p">(</code><code class="n">message</code><code class="o">=</code><code class="s">"my default message"</code><code class="p">):</code>
    <code class="k">print</code> <code class="n">message</code>

<code class="n">my_print</code><code class="p">(</code><code class="s">"hello"</code><code class="p">)</code>  <code class="c"># prints 'hello'</code>
<code class="n">my_print</code><code class="p">()</code>  <code class="c"># prints 'my default message'</code></pre> <p>It is sometimes useful to specify arguments by name:</p> <pre data-type="programlisting" data-code-language="py"><code class="k">def</code> <code class="nf">subtract</code><code class="p">(</code><code class="n">a</code><code class="o">=</code><code class="mi">0</code><code class="p">,</code> <code class="n">b</code><code class="o">=</code><code class="mi">0</code><code class="p">):</code>
    <code class="k">return</code> <code class="n">a</code> <code class="o">-</code> <code class="n">b</code>

<code class="n">subtract</code><code class="p">(</code><code class="mi">10</code><code class="p">,</code> <code class="mi">5</code><code class="p">)</code>  <code class="c"># returns 5</code>
<code class="n">subtract</code><code class="p">(</code><code class="mi">0</code><code class="p">,</code> <code class="mi">5</code><code class="p">)</code>  <code class="c"># returns -5</code>
<code class="n">subtract</code><code class="p">(</code><code class="n">b</code><code class="o">=</code><code class="mi">5</code><code class="p">)</code>  <code class="c"># same as previous</code></pre> <p>We will be creating many, many functions.</p> </div></section> <section data-type="sect2" data-pdf-bookmark="Strings"><div class="sect2" id="idp4245792"> <h2>Strings</h2> <p>Strings can be delimited by single<a data-type="indexterm" data-primary="Python" data-secondary="strings" id="idp4156416"/><a data-type="indexterm" data-primary="strings (in Python)" id="idp4157392"/> or double quotation marks (but the quotes have to match):</p> <pre data-type="programlisting" data-code-language="py"><code class="n">single_quoted_string</code> <code class="o">=</code> <code class="s">'data science'</code>
<code class="n">double_quoted_string</code> <code class="o">=</code> <code class="s">"data science"</code></pre> <p>Python uses backslashes to encode special characters. For example:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">tab_string</code> <code class="o">=</code> <code class="s">"</code><code class="se">\t</code><code class="s">"</code>  <code class="c"># represents the tab character</code>
<code class="nb">len</code><code class="p">(</code><code class="n">tab_string</code><code class="p">)</code>  <code class="c"># is 1</code></pre> <p>If you want backslashes as backslashes (as you might in Windows directory names or in regular expressions), you can create <em>raw</em> strings using <code>r""</code>:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">not_tab_string</code> <code class="o">=</code> <code class="s">r"\t"</code>  <code class="c"># represents the characters '\' and 't'</code>
<code class="nb">len</code><code class="p">(</code><code class="n">not_tab_string</code><code class="p">)</code>  <code class="c"># is 2</code></pre> <p>You can create multiline strings using triple quotes:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">multi_line_string</code> <code class="o">=</code> <code class="s">"""This is the first line.</code>
<code class="s">and this is the second line</code>
<code class="s">and this is the third line"""</code></pre> </div></section> <section data-type="sect2" data-pdf-bookmark="Exceptions"><div class="sect2" id="idp4439152"> <h2>Exceptions</h2> <p>When something goes wrong, Python raises an <em>exception</em>. <a data-type="indexterm" data-primary="Python" data-secondary="exceptions" id="idp4431568"/><a data-type="indexterm" data-primary="exceptions in Python" id="idp4432576"/> If unhandled, exceptions will cause your program to crash. 
You can handle them using <code>try</code> and <code>except</code>:</p> <pre data-type="programlisting" data-code-language="py"><code class="k">try</code><code class="p">:</code>
    <code class="k">print</code> <code class="mi">0</code> <code class="o">/</code> <code class="mi">0</code>
<code class="k">except</code> <code class="ne">ZeroDivisionError</code><code class="p">:</code>
    <code class="k">print</code> <code class="s">"cannot divide by zero"</code></pre> <p>Although in many languages exceptions are considered bad, in Python there is no shame in using them to make your code cleaner, and we will occasionally do so.</p> </div></section> <section data-type="sect2" data-pdf-bookmark="Lists"><div class="sect2" id="idp4383296"> <h2>Lists</h2> <p>Probably the most fundamental data structure in Python is the <code>list</code>.<a data-type="indexterm" data-primary="Python" data-secondary="lists" id="idp4410272"/><a data-type="indexterm" data-primary="lists (in Python)" id="idp4411168"/> A list is simply an ordered collection. 
(It is similar to what in other languages might be called an array, but with some added functionality.)</p> <pre data-type="programlisting" data-code-language="py"><code class="n">integer_list</code> <code class="o">=</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">]</code>
<code class="n">heterogeneous_list</code> <code class="o">=</code> <code class="p">[</code><code class="s">"string"</code><code class="p">,</code> <code class="mf">0.1</code><code class="p">,</code> <code class="bp">True</code><code class="p">]</code>
<code class="n">list_of_lists</code> <code class="o">=</code> <code class="p">[</code> <code class="n">integer_list</code><code class="p">,</code> <code class="n">heterogeneous_list</code><code class="p">,</code> <code class="p">[]</code> <code class="p">]</code>

<code class="n">list_length</code> <code class="o">=</code> <code class="nb">len</code><code class="p">(</code><code class="n">integer_list</code><code class="p">)</code>  <code class="c"># equals 3</code>
<code class="n">list_sum</code> <code class="o">=</code> <code class="nb">sum</code><code class="p">(</code><code class="n">integer_list</code><code class="p">)</code>  <code class="c"># equals 6</code></pre> <p>You can get or set the <em>n</em>th element of a list<a data-type="indexterm" data-primary="square brackets ([]), working with lists in Python" id="idp4322848"/> with square brackets:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">x</code> <code class="o">=</code> <code class="nb">range</code><code class="p">(</code><code class="mi">10</code><code class="p">)</code>  <code class="c"># is the list [0, 1, ..., 9]</code>
<code class="n">zero</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="mi">0</code><code class="p">]</code>  <code class="c"># equals 0, lists are 0-indexed</code>
<code class="n">one</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="mi">1</code><code class="p">]</code>  <code class="c"># equals 1</code>
<code class="n">nine</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="o">-</code><code class="mi">1</code><code class="p">]</code>  <code class="c"># equals 9, 'Pythonic' for last element</code>
<code class="n">eight</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="o">-</code><code class="mi">2</code><code class="p">]</code>  <code class="c"># equals 8, 'Pythonic' for next-to-last element</code>
<code class="n">x</code><code class="p">[</code><code class="mi">0</code><code class="p">]</code> <code class="o">=</code> <code class="o">-</code><code class="mi">1</code>  <code class="c"># now x is [-1, 1, 2, 3, ..., 9]</code></pre> <p>You can also use square brackets to “slice” lists:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">first_three</code> <code class="o">=</code> <code class="n">x</code><code class="p">[:</code><code class="mi">3</code><code class="p">]</code>  <code class="c"># [-1, 1, 2]</code>
<code class="n">three_to_end</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="mi">3</code><code class="p">:]</code>  <code class="c"># [3, 4, ..., 9]</code>
<code class="n">one_to_four</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="mi">1</code><code class="p">:</code><code class="mi">5</code><code class="p">]</code>  <code class="c"># [1, 2, 3, 4]</code>
<code class="n">last_three</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="o">-</code><code class="mi">3</code><code class="p">:]</code>  <code class="c"># [7, 8, 9]</code>
<code class="n">without_first_and_last</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="mi">1</code><code class="p">:</code><code class="o">-</code><code class="mi">1</code><code class="p">]</code>  <code class="c"># [1, 2, ..., 8]</code>
<code class="n">copy_of_x</code> <code class="o">=</code> <code class="n">x</code><code class="p">[:]</code>  <code class="c"># [-1, 1, 2, ..., 9]</code></pre> <p>Python has an <code>in</code> operator<a data-type="indexterm" data-primary="in operator (Python)" id="idp4656272"/> to check for list membership:</p> <pre data-type="programlisting" data-code-language="py"><code class="mi">1</code> <code class="ow">in</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">]</code>  <code class="c"># True</code>
<code class="mi">0</code> <code class="ow">in</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">]</code>  <code class="c"># False</code></pre> <p>This check examines the elements of the list one at a time, so its cost grows with the length of the list. You probably shouldn’t use it unless you know your list is pretty small (or unless you don’t care how long the check takes).</p> <p>It is easy to concatenate lists together:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">x</code> <code class="o">=</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">]</code>
<code class="n">x</code><code class="o">.</code><code class="n">extend</code><code class="p">([</code><code class="mi">4</code><code class="p">,</code> <code class="mi">5</code><code class="p">,</code> <code class="mi">6</code><code class="p">])</code>  <code class="c"># x is now [1, 2, 3, 4, 5, 6]</code></pre> <p>If you don’t want to modify <code>x</code>, you can use list addition:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">x</code> <code class="o">=</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">]</code>
<code class="n">y</code> <code class="o">=</code> <code class="n">x</code> <code class="o">+</code> <code class="p">[</code><code class="mi">4</code><code class="p">,</code> <code class="mi">5</code><code class="p">,</code> <code class="mi">6</code><code class="p">]</code>  <code class="c"># y is [1, 2, 3, 4, 5, 6]; x is unchanged</code></pre> <p>More frequently we will append to lists one item at a time:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">x</code> <code class="o">=</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">]</code>
<code class="n">x</code><code class="o">.</code><code class="n">append</code><code class="p">(</code><code class="mi">0</code><code class="p">)</code>  <code class="c"># x is now [1, 2, 3, 0]</code>
<code class="n">y</code> <code class="o">=</code> <code class="n">x</code><code class="p">[</code><code class="o">-</code><code class="mi">1</code><code class="p">]</code>  <code class="c"># equals 0</code>
<code class="n">z</code> <code class="o">=</code> <code class="nb">len</code><code class="p">(</code><code class="n">x</code><code class="p">)</code>  <code class="c"># equals 4</code></pre> <p>It is often convenient to <em>unpack</em> lists if you know how many elements they contain:</p> <pre data-type="programlisting" data-code-language="py"><code class="n">x</code><code class="p">,</code> <code class="n">y</code> <code class="o">=</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">]</code>  <code class="c"># now x is 1, y is 2</code></pre> 
<p>You will get a <code>ValueError</code>, however, if the two sides don’t have the same number of elements.</p>
<p>It’s common to use an underscore for a value you’re going to throw away:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">_</code><code class="p">,</code> <code class="n">y</code> <code class="o">=</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">]</code>  <code class="c"># now y == 2, didn't care about the first element</code></pre> </div></section>
<section data-type="sect2" data-pdf-bookmark="Tuples"><div class="sect2" id="idp4408880"> <h2>Tuples</h2> <p>Tuples are lists’ immutable cousins.<a data-type="indexterm" data-primary="tuples (Python)" id="idp4715728"/><a data-type="indexterm" data-primary="Python" data-secondary="tuples" id="idp4963792"/> Pretty much anything you can do to a list that doesn’t involve modifying it, you can do to a tuple. You specify a tuple by using parentheses (or nothing) instead of square brackets:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">my_list</code> <code class="o">=</code> <code class="p">[</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">]</code>
<code class="n">my_tuple</code> <code class="o">=</code> <code class="p">(</code><code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">)</code>
<code class="n">other_tuple</code> <code class="o">=</code> <code class="mi">3</code><code class="p">,</code> <code class="mi">4</code>
<code class="n">my_list</code><code class="p">[</code><code class="mi">1</code><code class="p">]</code> <code class="o">=</code> <code class="mi">3</code>  <code class="c"># my_list is now [1, 3]</code>

<code class="k">try</code><code class="p">:</code>
    <code class="n">my_tuple</code><code class="p">[</code><code class="mi">1</code><code class="p">]</code> <code class="o">=</code> <code class="mi">3</code>
<code class="k">except</code> <code class="ne">TypeError</code><code class="p">:</code>
    <code class="k">print</code> <code class="s">"cannot modify a tuple"</code></pre>
<p>Tuples are a convenient way to return multiple values from functions:</p>
<pre data-type="programlisting" data-code-language="py"><code class="k">def</code> <code class="nf">sum_and_product</code><code class="p">(</code><code class="n">x</code><code class="p">,</code> <code class="n">y</code><code class="p">):</code>
    <code class="k">return</code> <code class="p">(</code><code class="n">x</code> <code class="o">+</code> <code class="n">y</code><code class="p">),</code> <code class="p">(</code><code class="n">x</code> <code class="o">*</code> <code class="n">y</code><code class="p">)</code>

<code class="n">sp</code> <code class="o">=</code> <code class="n">sum_and_product</code><code class="p">(</code><code class="mi">2</code><code class="p">,</code> <code class="mi">3</code><code class="p">)</code>     <code class="c"># equals (5, 6)</code>
<code class="n">s</code><code class="p">,</code> <code class="n">p</code> <code class="o">=</code> <code class="n">sum_and_product</code><code class="p">(</code><code class="mi">5</code><code class="p">,</code> <code class="mi">10</code><code class="p">)</code>  <code class="c"># s is 15, p is 50</code></pre>
<p>Tuples (and lists) can also<a data-type="indexterm" data-primary="multiple assignment (Python)" id="idp5042416"/><a data-type="indexterm" data-primary="assignment, multiple, in Python" id="idp5134800"/> be used for <em>multiple assignment</em>:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">x</code><code class="p">,</code> <code class="n">y</code> <code class="o">=</code> <code class="mi">1</code><code class="p">,</code> <code class="mi">2</code>  <code class="c"># now x is 1, y is 2</code>
<code class="n">x</code><code class="p">,</code> <code class="n">y</code> <code class="o">=</code> <code class="n">y</code><code class="p">,</code> <code class="n">x</code>  <code class="c"># Pythonic way to swap variables; now x is 2, y is 1</code></pre> </div></section>
<section data-type="sect2" data-pdf-bookmark="Dictionaries"><div class="sect2" id="idp5137200"> <h2>Dictionaries</h2> <p>Another fundamental data structure is a dictionary, which<a data-type="indexterm" data-primary="Python" data-secondary="dictionaries" id="ix_Pythondict"/><a data-type="indexterm" data-primary="dictionaries (Python)" id="idp5114384"/> associates <em>values</em> with <em>keys</em> and allows you to quickly <a data-type="indexterm" data-primary="key/value pairs (in Python dictionaries)" id="idp5116048"/>retrieve the value corresponding to a given key:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">empty_dict</code> <code class="o">=</code> <code class="p">{}</code>  <code class="c"># Pythonic</code>
<code class="n">empty_dict2</code> <code class="o">=</code> <code class="nb">dict</code><code class="p">()</code>  <code class="c"># less Pythonic</code>
<code class="n">grades</code> <code class="o">=</code> <code class="p">{</code> <code class="s">"Joel"</code> <code class="p">:</code> <code class="mi">80</code><code class="p">,</code> <code class="s">"Tim"</code> <code class="p">:</code> <code class="mi">95</code> <code class="p">}</code>  <code class="c"># dictionary literal</code></pre>
<p>You can look up the value for a key using square brackets:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">joels_grade</code> <code class="o">=</code> <code class="n">grades</code><code class="p">[</code><code class="s">"Joel"</code><code class="p">]</code>  <code class="c"># equals 80</code></pre>
<p>But you’ll get a <code>KeyError</code> if you ask for a key that’s not in the dictionary:</p>
<pre data-type="programlisting" data-code-language="py"><code class="k">try</code><code class="p">:</code>
    <code class="n">kates_grade</code> <code class="o">=</code> <code class="n">grades</code><code class="p">[</code><code class="s">"Kate"</code><code class="p">]</code>
<code class="k">except</code> <code class="ne">KeyError</code><code class="p">:</code>
    <code class="k">print</code> <code class="s">"no grade for Kate!"</code></pre>
<p>You can check for <a data-type="indexterm" data-primary="in operator (Python)" id="idp5260944"/>the existence of a key using <code>in</code>:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">joel_has_grade</code> <code class="o">=</code> <code class="s">"Joel"</code> <code class="ow">in</code> <code class="n">grades</code>  <code class="c"># True</code>
<code class="n">kate_has_grade</code> <code class="o">=</code> <code class="s">"Kate"</code> <code class="ow">in</code> <code class="n">grades</code>  <code class="c"># False</code></pre>
<p>Dictionaries have a <code>get</code> method that returns a default value (instead of raising an exception) when you look up a key that’s not in the dictionary:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">joels_grade</code> <code class="o">=</code> <code class="n">grades</code><code class="o">.</code><code class="n">get</code><code class="p">(</code><code class="s">"Joel"</code><code class="p">,</code> <code class="mi">0</code><code class="p">)</code>  <code class="c"># equals 80</code>
<code class="n">kates_grade</code> <code class="o">=</code> <code class="n">grades</code><code class="o">.</code><code class="n">get</code><code class="p">(</code><code class="s">"Kate"</code><code class="p">,</code> <code class="mi">0</code><code class="p">)</code>  <code class="c"># equals 0</code>
<code class="n">no_ones_grade</code> <code class="o">=</code> <code class="n">grades</code><code class="o">.</code><code class="n">get</code><code class="p">(</code><code class="s">"No One"</code><code class="p">)</code>  <code class="c"># default default is None</code></pre>
<p>You assign key-value pairs
using the same square brackets:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">grades</code><code class="p">[</code><code class="s">"Tim"</code><code class="p">]</code> <code class="o">=</code> <code class="mi">99</code>  <code class="c"># replaces the old value</code>
<code class="n">grades</code><code class="p">[</code><code class="s">"Kate"</code><code class="p">]</code> <code class="o">=</code> <code class="mi">100</code>  <code class="c"># adds a third entry</code>
<code class="n">num_students</code> <code class="o">=</code> <code class="nb">len</code><code class="p">(</code><code class="n">grades</code><code class="p">)</code>  <code class="c"># equals 3</code></pre>
<p>We will frequently use dictionaries as a simple way to represent structured data:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">tweet</code> <code class="o">=</code> <code class="p">{</code>
    <code class="s">"user"</code> <code class="p">:</code> <code class="s">"joelgrus"</code><code class="p">,</code>
    <code class="s">"text"</code> <code class="p">:</code> <code class="s">"Data Science is Awesome"</code><code class="p">,</code>
    <code class="s">"retweet_count"</code> <code class="p">:</code> <code class="mi">100</code><code class="p">,</code>
    <code class="s">"hashtags"</code> <code class="p">:</code> <code class="p">[</code><code class="s">"#data"</code><code class="p">,</code> <code class="s">"#science"</code><code class="p">,</code> <code class="s">"#datascience"</code><code class="p">,</code> <code class="s">"#awesome"</code><code class="p">,</code> <code class="s">"#yolo"</code><code class="p">]</code>
<code class="p">}</code></pre>
<p>Besides looking for specific keys, we can look at all of them:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">tweet_keys</code> <code class="o">=</code> <code class="n">tweet</code><code class="o">.</code><code class="n">keys</code><code class="p">()</code>      <code class="c"># list of keys</code>
<code class="n">tweet_values</code> <code class="o">=</code> <code class="n">tweet</code><code class="o">.</code><code class="n">values</code><code class="p">()</code>  <code class="c"># list of values</code>
<code class="n">tweet_items</code> <code class="o">=</code> <code class="n">tweet</code><code class="o">.</code><code class="n">items</code><code class="p">()</code>    <code class="c"># list of (key, value) tuples</code>

<code class="s">"user"</code> <code class="ow">in</code> <code class="n">tweet_keys</code>            <code class="c"># True, but uses a slow list in</code>
<code class="s">"user"</code> <code class="ow">in</code> <code class="n">tweet</code>                 <code class="c"># more Pythonic, uses faster dict in</code>
<code class="s">"joelgrus"</code> <code class="ow">in</code> <code class="n">tweet_values</code>    <code class="c"># True</code></pre>
<p>Dictionary keys must be immutable; in particular, you cannot use <code>list</code>s as keys. If you need a multipart key, you should use a <code>tuple</code> or figure out a way to turn the key into a string.</p>
<section data-type="sect3" data-pdf-bookmark="defaultdict"><div class="sect3" id="idp5509648"> <h3>defaultdict</h3> <p>Imagine that you’re trying to count the words in a document.<a data-type="indexterm" data-primary="dictionaries (Python)" data-secondary="defaultdict" id="idp5511344"/> An obvious approach is to create a dictionary in which the keys are words and the values are counts.
As you check each word, you can increment its count if it’s already in the dictionary and add it to the dictionary if it’s not:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">word_counts</code> <code class="o">=</code> <code class="p">{}</code>
<code class="k">for</code> <code class="n">word</code> <code class="ow">in</code> <code class="n">document</code><code class="p">:</code>
    <code class="k">if</code> <code class="n">word</code> <code class="ow">in</code> <code class="n">word_counts</code><code class="p">:</code>
        <code class="n">word_counts</code><code class="p">[</code><code class="n">word</code><code class="p">]</code> <code class="o">+=</code> <code class="mi">1</code>
    <code class="k">else</code><code class="p">:</code>
        <code class="n">word_counts</code><code class="p">[</code><code class="n">word</code><code class="p">]</code> <code class="o">=</code> <code class="mi">1</code></pre>
<p>You could also use the “forgiveness is better than permission” approach and just handle the exception from trying to look up a missing key:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">word_counts</code> <code class="o">=</code> <code class="p">{}</code>
<code class="k">for</code> <code class="n">word</code> <code class="ow">in</code> <code class="n">document</code><code class="p">:</code>
    <code class="k">try</code><code class="p">:</code>
        <code class="n">word_counts</code><code class="p">[</code><code class="n">word</code><code class="p">]</code> <code class="o">+=</code> <code class="mi">1</code>
    <code class="k">except</code> <code class="ne">KeyError</code><code class="p">:</code>
        <code class="n">word_counts</code><code class="p">[</code><code class="n">word</code><code class="p">]</code> <code class="o">=</code> <code class="mi">1</code></pre>
<p>A third approach is to use <code>get</code>, which behaves gracefully for missing keys:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">word_counts</code> <code class="o">=</code> <code class="p">{}</code>
<code class="k">for</code> <code class="n">word</code> <code class="ow">in</code> <code class="n">document</code><code class="p">:</code>
    <code class="n">previous_count</code> <code class="o">=</code> <code class="n">word_counts</code><code class="o">.</code><code class="n">get</code><code class="p">(</code><code class="n">word</code><code class="p">,</code> <code class="mi">0</code><code class="p">)</code>
    <code class="n">word_counts</code><code class="p">[</code><code class="n">word</code><code class="p">]</code> <code class="o">=</code> <code class="n">previous_count</code> <code class="o">+</code> <code class="mi">1</code></pre>
<p>Every one of these is slightly unwieldy, which is why <code>defaultdict</code> is useful. A <code>defaultdict</code> is like a regular dictionary, except that when you try to look up a key it doesn’t contain, it first adds a value for it using a zero-argument function you provided when you created it.
In order to use <code>defaultdict</code>s, you have to import them from <code>collections</code>:</p>
<pre data-type="programlisting" data-code-language="py"><code class="kn">from</code> <code class="nn">collections</code> <code class="kn">import</code> <code class="n">defaultdict</code>

<code class="n">word_counts</code> <code class="o">=</code> <code class="n">defaultdict</code><code class="p">(</code><code class="nb">int</code><code class="p">)</code>  <code class="c"># int() produces 0</code>
<code class="k">for</code> <code class="n">word</code> <code class="ow">in</code> <code class="n">document</code><code class="p">:</code>
    <code class="n">word_counts</code><code class="p">[</code><code class="n">word</code><code class="p">]</code> <code class="o">+=</code> <code class="mi">1</code></pre>
<p>They can also be useful with <code>list</code> or <code>dict</code> or even your own functions:</p>
<pre data-type="programlisting" data-code-language="py"><code class="n">dd_list</code> <code class="o">=</code> <code class="n">defaultdict</code><code class="p">(</code><code class="nb">list</code><code class="p">)</code>  <code class="c"># list() produces an empty list</code>
<code class="n">dd_list</code><code class="p">[</code><code class="mi">2</code><code class="p">]</code><code class="o">.</code><code class="n">append</code><code class="p">(</code><code class="mi">1</code><code class="p">)</code>  <code class="c"># now dd_list contains {2: [1]}</code>

<code class="n">dd_dict</code> <code class="o">=</code> <code class="n">defaultdict</code><code class="p">(</code><code class="nb">dict</code><code class="p">)</code>  <code class="c"># dict() produces an empty dict</code>
<code class="n">dd_dict</code><code class="p">[</code><code class="s">"Joel"</code><code class="p">][</code><code class="s">"City"</code><code class="p">]</code> <code class="o">=</code> <code class="s">"Seattle"</code>  <code class="c"># { "Joel" : { "City" : "Seattle" }}</code>

<code class="n">dd_pair</code> <code class="o">=</code> <code class="n">defaultdict</code><code class="p">(</code><code class="k">lambda</code><code class="p">:</code> <code class="p">[</code><code class="mi">0</code><code class="p">,</code> <code class="mi">0</code><code class="p">])</code>
<code class="n">dd_pair</code><code class="p">[</code><code class="mi">2</code><code class="p">][</code><code class="mi">1</code><code class="p">]</code> <code class="o">=</code> <code class="mi">1</code>  <code class="c"># now dd_pair contains {2: [0, 1]}</code></pre>
<p>These will be useful when we’re using dictionaries to “collect” results by some key and don’t want to have to check every time to see if the key exists yet.<a data-type="indexterm" data-primary="Python" data-secondary="dictionaries" data-startref="ix_Pythondict" id="idp5739936"/></p> </div></section>
<section data-type="sect3" data-pdf-bookmark="Counter"><div class="sect3" id="idp5510240"> <h3>Counter</h3> <p>A <code>Counter</code> turns a sequence of values into a <code>defaultdict(int)</code>-like object mapping keys to counts.<a data-type="indexterm" data-primary="Counter (Python)" id="idp5568096"/><a data-type="indexterm" data-primary="Python" data-secondary="Counter" id="idp5568800"/> We will primarily use it to create histograms:</p>
<pre data-type="programlisting" data-code-language="py"><code class="kn">from</code> <code class="nn">collections</code> <code class="kn">import</code> <code class="n">Counter</code>
<code class="n">c</code> <code class="o">=</code> <code class="n">Counter</code><code class="p">([</code><code class="mi">0</code><code class="p">,</code> <code class="mi">1</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">0</code><code class="p">])</code>  <code class="c"># c is (basically) { 0 : 2, 1 : 1, 2 : 1 }</code></pre>
<p>This gives us a very simple way to solve our <code>word_count