<!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <meta name="generator" content="pandoc" /> <meta http-equiv="X-UA-Compatible" content="IE=EDGE" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <title>dplyr &lt;-&gt; base R</title> <script>// Pandoc 2.9 adds attributes on both header and div. We remove the former (to // be compatible with the behavior of Pandoc < 2.8). document.addEventListener('DOMContentLoaded', function(e) { var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); var i, h, a; for (i = 0; i < hs.length; i++) { h = hs[i]; if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 a = h.attributes; while (a.length > 0) h.removeAttribute(a[0].name); } }); </script> <style type="text/css"> code{white-space: pre-wrap;} span.smallcaps{font-variant: small-caps;} span.underline{text-decoration: underline;} div.column{display: inline-block; vertical-align: top; width: 50%;} div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} ul.task-list{list-style: none;} </style> <style type="text/css"> code { white-space: pre; } .sourceCode { overflow: visible; } </style> <style type="text/css" data-origin="pandoc"> pre > code.sourceCode { white-space: pre; position: relative; } pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } pre > code.sourceCode > span:empty { height: 1.2em; } .sourceCode { overflow: visible; } code.sourceCode > span { color: inherit; text-decoration: inherit; } div.sourceCode { margin: 1em 0; } pre.sourceCode { margin: 0; } @media screen { div.sourceCode { overflow: auto; } } @media print { pre > code.sourceCode { white-space: pre-wrap; } pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } pre.numberSource code > span { position: relative; left: -4em; counter-increment: source-line; } pre.numberSource code > span > a:first-child::before { content: 
counter(source-line); position: relative; left: -1em; text-align: right; vertical-align: baseline; border: none; display: inline-block; -webkit-touch-callout: none; -webkit-user-select: none; -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none; padding: 0 4px; width: 4em; color: #aaaaaa; } pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; } div.sourceCode { } @media screen { pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } } code span.al { color: #ff0000; font-weight: bold; } code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } code span.at { color: #7d9029; } code span.bn { color: #40a070; } code span.bu { color: #008000; } code span.cf { color: #007020; font-weight: bold; } code span.ch { color: #4070a0; } code span.cn { color: #880000; } code span.co { color: #60a0b0; font-style: italic; } code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } code span.do { color: #ba2121; font-style: italic; } code span.dt { color: #902000; } code span.dv { color: #40a070; } code span.er { color: #ff0000; font-weight: bold; } code span.ex { } code span.fl { color: #40a070; } code span.fu { color: #06287e; } code span.im { color: #008000; font-weight: bold; } code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } code span.kw { color: #007020; font-weight: bold; } code span.op { color: #666666; } code span.ot { color: #007020; } code span.pp { color: #bc7a00; } code span.sc { color: #4070a0; } code span.ss { color: #bb6688; } code span.st { color: #4070a0; } code span.va { color: #19177c; } code span.vs { color: #4070a0; } code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } </style> <script> // apply pandoc div.sourceCode style to pre.sourceCode instead (function() { var sheets = document.styleSheets; for (var i = 0; i < sheets.length; i++) { if (sheets[i].ownerNode.dataset["origin"] !== 
"pandoc") continue; try { var rules = sheets[i].cssRules; } catch (e) { continue; } var j = 0; while (j < rules.length) { var rule = rules[j]; // check if there is a div.sourceCode rule if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") { j++; continue; } var style = rule.style.cssText; // check if color or background-color is set if (rule.style.color === '' && rule.style.backgroundColor === '') { j++; continue; } // replace div.sourceCode by a pre.sourceCode rule sheets[i].deleteRule(j); sheets[i].insertRule('pre.sourceCode{' + style + '}', j); } } })(); </script> <style type="text/css">body { background-color: #fff; margin: 1em auto; max-width: 700px; overflow: visible; padding-left: 2em; padding-right: 2em; font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; line-height: 1.35; } #TOC { clear: both; margin: 0 0 10px 10px; padding: 4px; width: 400px; border: 1px solid #CCCCCC; border-radius: 5px; background-color: #f6f6f6; font-size: 13px; line-height: 1.3; } #TOC .toctitle { font-weight: bold; font-size: 15px; margin-left: 5px; } #TOC ul { padding-left: 40px; margin-left: -1.5em; margin-top: 5px; margin-bottom: 5px; } #TOC ul ul { margin-left: -2em; } #TOC li { line-height: 16px; } table { margin: 1em auto; border-width: 1px; border-color: #DDDDDD; border-style: outset; border-collapse: collapse; } table th { border-width: 2px; padding: 5px; border-style: inset; } table td { border-width: 1px; border-style: inset; line-height: 18px; padding: 5px 5px; } table, table th, table td { border-left-style: none; border-right-style: none; } table thead, table tr.even { background-color: #f7f7f7; } p { margin: 0.5em 0; } blockquote { background-color: #f6f6f6; padding: 0.25em 0.75em; } hr { border-style: solid; border: none; border-top: 1px solid #777; margin: 28px 0; } dl { margin-left: 0; } dl dd { margin-bottom: 13px; margin-left: 13px; } dl dt { font-weight: bold; } ul { margin-top: 0; } ul li { 
list-style: circle outside; } ul ul { margin-bottom: 0; } pre, code { background-color: #f7f7f7; border-radius: 3px; color: #333; white-space: pre-wrap; } pre { border-radius: 3px; margin: 5px 0px 10px 0px; padding: 10px; } pre:not([class]) { background-color: #f7f7f7; } code { font-family: Consolas, Monaco, 'Courier New', monospace; font-size: 85%; } p > code, li > code { padding: 2px 0px; } div.figure { text-align: center; } img { background-color: #FFFFFF; padding: 2px; border: 1px solid #DDDDDD; border-radius: 3px; border: 1px solid #CCCCCC; margin: 0 5px; } h1 { margin-top: 0; font-size: 35px; line-height: 40px; } h2 { border-bottom: 4px solid #f7f7f7; padding-top: 10px; padding-bottom: 2px; font-size: 145%; } h3 { border-bottom: 2px solid #f7f7f7; padding-top: 10px; font-size: 120%; } h4 { border-bottom: 1px solid #f7f7f7; margin-left: 8px; font-size: 105%; } h5, h6 { border-bottom: 1px solid #ccc; font-size: 105%; } a { color: #0033dd; text-decoration: none; } a:hover { color: #6666ff; } a:visited { color: #800080; } a:visited:hover { color: #BB00BB; } a[href^="http:"] { text-decoration: underline; } a[href^="https:"] { text-decoration: underline; } code > span.kw { color: #555; font-weight: bold; } code > span.dt { color: #902000; } code > span.dv { color: #40a070; } code > span.bn { color: #d14; } code > span.fl { color: #d14; } code > span.ch { color: #d14; } code > span.st { color: #d14; } code > span.co { color: #888888; font-style: italic; } code > span.ot { color: #007020; } code > span.al { color: #ff0000; font-weight: bold; } code > span.fu { color: #900; font-weight: bold; } code > span.er { color: #a61717; background-color: #e3d2d2; } </style> </head> <body> <h1 class="title toc-ignore">dplyr &lt;-&gt; base R</h1> <p>This vignette compares dplyr functions to their base R equivalents. This helps those familiar with base R understand better what dplyr does, and shows dplyr users how you might express the same ideas in base R code. 
We’ll start with a rough overview of the major differences, then discuss the one table verbs in more detail, followed by the two table verbs.</p> <div id="overview" class="section level1"> <h1>Overview</h1> <ol style="list-style-type: decimal"> <li><p>The core dplyr verbs take data frames as input and return data frames as output. This contrasts with base R functions, which more frequently work with individual vectors.</p></li> <li><p>dplyr relies heavily on “non-standard evaluation” so that you don’t need to use <code>$</code> to refer to columns in the “current” data frame. This behaviour is inspired by the base functions <code>subset()</code> and <code>transform()</code>.</p></li> <li><p>dplyr solutions tend to use a variety of single-purpose verbs, while base R solutions tend to use <code>[</code> in a variety of ways, depending on the task at hand.</p></li> <li><p>Multiple dplyr verbs are often strung together into a pipeline by <code>%&gt;%</code>. In base R, you’ll typically save intermediate results to a variable that you either discard or repeatedly overwrite.</p></li> <li><p>All dplyr verbs handle “grouped” data frames so that the code to perform a computation per-group looks very similar to code that works on a whole data frame. In base R, per-group operations tend to have varied forms.</p></li> </ol> </div> <div id="one-table-verbs" class="section level1"> <h1>One table verbs</h1> <p>The following table shows a condensed translation between dplyr verbs and their base R equivalents. The sections after it describe each operation in more detail.
You’ll learn more about the dplyr verbs in their documentation and in <code>vignette(&quot;dplyr&quot;)</code>.</p> <table> <colgroup> <col width="38%" /> <col width="61%" /> </colgroup> <thead> <tr class="header"> <th>dplyr</th> <th>base</th> </tr> </thead> <tbody> <tr class="odd"> <td><code>arrange(df, x)</code></td> <td><code>df[order(x), , drop = FALSE]</code></td> </tr> <tr class="even"> <td><code>distinct(df, x)</code></td> <td><code>df[!duplicated(x), , drop = FALSE]</code>, <code>unique()</code></td> </tr> <tr class="odd"> <td><code>filter(df, x)</code></td> <td><code>df[which(x), , drop = FALSE]</code>, <code>subset()</code></td> </tr> <tr class="even"> <td><code>mutate(df, z = x + y)</code></td> <td><code>df$z &lt;- df$x + df$y</code>, <code>transform()</code></td> </tr> <tr class="odd"> <td><code>pull(df, 1)</code></td> <td><code>df[[1]]</code></td> </tr> <tr class="even"> <td><code>pull(df, x)</code></td> <td><code>df$x</code></td> </tr> <tr class="odd"> <td><code>rename(df, y = x)</code></td> <td><code>names(df)[names(df) == &quot;x&quot;] &lt;- &quot;y&quot;</code></td> </tr> <tr class="even"> <td><code>relocate(df, y)</code></td> <td><code>df[union(&quot;y&quot;, names(df))]</code></td> </tr> <tr class="odd"> <td><code>select(df, x, y)</code></td> <td><code>df[c(&quot;x&quot;, &quot;y&quot;)]</code>, <code>subset()</code></td> </tr> <tr class="even"> <td><code>select(df, starts_with(&quot;x&quot;))</code></td> <td><code>df[grepl(&quot;^x&quot;, names(df))]</code></td> </tr> <tr class="odd"> <td><code>summarise(df, mean(x))</code></td> <td><code>mean(df$x)</code>, <code>tapply()</code>, <code>aggregate()</code>, <code>by()</code></td> </tr> <tr class="even"> <td><code>slice(df, c(1, 2, 5))</code></td> <td><code>df[c(1, 2, 5), , drop = FALSE]</code></td> </tr> </tbody> </table> <p>To begin, we’ll load dplyr and convert <code>mtcars</code> and <code>iris</code> to tibbles so that we can easily show only abbreviated output for each operation.</p> <div 
class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="fu">library</span>(dplyr)</span> <span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a>mtcars <span class="ot">&lt;-</span> <span class="fu">as_tibble</span>(mtcars)</span> <span id="cb1-3"><a href="#cb1-3" tabindex="-1"></a>iris <span class="ot">&lt;-</span> <span class="fu">as_tibble</span>(iris)</span></code></pre></div> <div id="arrange-arrange-rows-by-variables" class="section level2"> <h2><code>arrange()</code>: Arrange rows by variables</h2> <p><code>dplyr::arrange()</code> orders the rows of a data frame by the values of one or more columns:</p> <div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a>mtcars <span class="sc">%&gt;%</span> <span class="fu">arrange</span>(cyl, disp)</span> <span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a><span class="co">#&gt; # A tibble: 32 × 11</span></span> <span id="cb2-3"><a href="#cb2-3" tabindex="-1"></a><span class="co">#&gt; mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span id="cb2-4"><a href="#cb2-4" tabindex="-1"></a><span class="co">#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span> <span id="cb2-5"><a href="#cb2-5" tabindex="-1"></a><span class="co">#&gt; 1 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1</span></span> <span id="cb2-6"><a href="#cb2-6" tabindex="-1"></a><span class="co">#&gt; 2 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2</span></span> <span id="cb2-7"><a href="#cb2-7" tabindex="-1"></a><span class="co">#&gt; 3 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1</span></span> <span id="cb2-8"><a href="#cb2-8" tabindex="-1"></a><span class="co">#&gt; 4 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1</span></span> <span id="cb2-9"><a href="#cb2-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 28 more 
rows</span></span></code></pre></div> <p>The <code>desc()</code> helper allows you to order selected variables in descending order:</p> <div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a>mtcars <span class="sc">%&gt;%</span> <span class="fu">arrange</span>(<span class="fu">desc</span>(cyl), <span class="fu">desc</span>(disp))</span> <span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a><span class="co">#&gt; # A tibble: 32 × 11</span></span> <span id="cb3-3"><a href="#cb3-3" tabindex="-1"></a><span class="co">#&gt; mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span id="cb3-4"><a href="#cb3-4" tabindex="-1"></a><span class="co">#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span> <span id="cb3-5"><a href="#cb3-5" tabindex="-1"></a><span class="co">#&gt; 1 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4</span></span> <span id="cb3-6"><a href="#cb3-6" tabindex="-1"></a><span class="co">#&gt; 2 10.4 8 460 215 3 5.42 17.8 0 0 3 4</span></span> <span id="cb3-7"><a href="#cb3-7" tabindex="-1"></a><span class="co">#&gt; 3 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4</span></span> <span id="cb3-8"><a href="#cb3-8" tabindex="-1"></a><span class="co">#&gt; 4 19.2 8 400 175 3.08 3.84 17.0 0 0 3 2</span></span> <span id="cb3-9"><a href="#cb3-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 28 more rows</span></span></code></pre></div> <p>We can replicate this in base R by using <code>[</code> with <code>order()</code>:</p> <div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" tabindex="-1"></a>mtcars[<span class="fu">order</span>(mtcars<span class="sc">$</span>cyl, mtcars<span class="sc">$</span>disp), , drop <span class="ot">=</span> <span class="cn">FALSE</span>]</span> <span id="cb4-2"><a href="#cb4-2" tabindex="-1"></a><span
class="co">#&gt; # A tibble: 32 × 11</span></span> <span id="cb4-3"><a href="#cb4-3" tabindex="-1"></a><span class="co">#&gt; mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span id="cb4-4"><a href="#cb4-4" tabindex="-1"></a><span class="co">#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span> <span id="cb4-5"><a href="#cb4-5" tabindex="-1"></a><span class="co">#&gt; 1 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1</span></span> <span id="cb4-6"><a href="#cb4-6" tabindex="-1"></a><span class="co">#&gt; 2 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2</span></span> <span id="cb4-7"><a href="#cb4-7" tabindex="-1"></a><span class="co">#&gt; 3 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1</span></span> <span id="cb4-8"><a href="#cb4-8" tabindex="-1"></a><span class="co">#&gt; 4 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1</span></span> <span id="cb4-9"><a href="#cb4-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 28 more rows</span></span></code></pre></div> <p>Note the use of <code>drop = FALSE</code>. If you forget this, and the input is a data frame with a single column, the output will be a vector, not a data frame. 
This is a source of subtle bugs.</p> <p>Base R does not provide a convenient and general way to sort individual variables in descending order, so you have two options:</p> <ul> <li>For numeric variables, you can use <code>-x</code>.</li> <li>You can request <code>order()</code> to sort all variables in descending order.</li> </ul> <div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" tabindex="-1"></a>mtcars[<span class="fu">order</span>(mtcars<span class="sc">$</span>cyl, mtcars<span class="sc">$</span>disp, <span class="at">decreasing =</span> <span class="cn">TRUE</span>), , drop <span class="ot">=</span> <span class="cn">FALSE</span>]</span> <span id="cb5-2"><a href="#cb5-2" tabindex="-1"></a>mtcars[<span class="fu">order</span>(<span class="sc">-</span>mtcars<span class="sc">$</span>cyl, <span class="sc">-</span>mtcars<span class="sc">$</span>disp), , drop <span class="ot">=</span> <span class="cn">FALSE</span>]</span></code></pre></div> </div> <div id="distinct-select-distinctunique-rows" class="section level2"> <h2><code>distinct()</code>: Select distinct/unique rows</h2> <p><code>dplyr::distinct()</code> selects unique rows:</p> <div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" tabindex="-1"></a>df <span class="ot">&lt;-</span> <span class="fu">tibble</span>(</span> <span id="cb6-2"><a href="#cb6-2" tabindex="-1"></a> <span class="at">x =</span> <span class="fu">sample</span>(<span class="dv">10</span>, <span class="dv">100</span>, <span class="at">rep =</span> <span class="cn">TRUE</span>),</span> <span id="cb6-3"><a href="#cb6-3" tabindex="-1"></a> <span class="at">y =</span> <span class="fu">sample</span>(<span class="dv">10</span>, <span class="dv">100</span>, <span class="at">rep =</span> <span class="cn">TRUE</span>)</span> <span id="cb6-4"><a href="#cb6-4" tabindex="-1"></a>)</span> <span id="cb6-5"><a href="#cb6-5" 
tabindex="-1"></a></span> <span id="cb6-6"><a href="#cb6-6" tabindex="-1"></a>df <span class="sc">%&gt;%</span> <span class="fu">distinct</span>(x) <span class="co"># selected columns</span></span> <span id="cb6-7"><a href="#cb6-7" tabindex="-1"></a><span class="co">#&gt; # A tibble: 10 × 1</span></span> <span id="cb6-8"><a href="#cb6-8" tabindex="-1"></a><span class="co">#&gt; x</span></span> <span id="cb6-9"><a href="#cb6-9" tabindex="-1"></a><span class="co">#&gt; &lt;int&gt;</span></span> <span id="cb6-10"><a href="#cb6-10" tabindex="-1"></a><span class="co">#&gt; 1 8</span></span> <span id="cb6-11"><a href="#cb6-11" tabindex="-1"></a><span class="co">#&gt; 2 7</span></span> <span id="cb6-12"><a href="#cb6-12" tabindex="-1"></a><span class="co">#&gt; 3 10</span></span> <span id="cb6-13"><a href="#cb6-13" tabindex="-1"></a><span class="co">#&gt; 4 4</span></span> <span id="cb6-14"><a href="#cb6-14" tabindex="-1"></a><span class="co">#&gt; # ℹ 6 more rows</span></span> <span id="cb6-15"><a href="#cb6-15" tabindex="-1"></a>df <span class="sc">%&gt;%</span> <span class="fu">distinct</span>(x, <span class="at">.keep_all =</span> <span class="cn">TRUE</span>) <span class="co"># whole data frame</span></span> <span id="cb6-16"><a href="#cb6-16" tabindex="-1"></a><span class="co">#&gt; # A tibble: 10 × 2</span></span> <span id="cb6-17"><a href="#cb6-17" tabindex="-1"></a><span class="co">#&gt; x y</span></span> <span id="cb6-18"><a href="#cb6-18" tabindex="-1"></a><span class="co">#&gt; &lt;int&gt; &lt;int&gt;</span></span> <span id="cb6-19"><a href="#cb6-19" tabindex="-1"></a><span class="co">#&gt; 1 8 7</span></span> <span id="cb6-20"><a href="#cb6-20" tabindex="-1"></a><span class="co">#&gt; 2 7 3</span></span> <span id="cb6-21"><a href="#cb6-21" tabindex="-1"></a><span class="co">#&gt; 3 10 1</span></span> <span id="cb6-22"><a href="#cb6-22" tabindex="-1"></a><span class="co">#&gt; 4 4 2</span></span> <span id="cb6-23"><a href="#cb6-23" tabindex="-1"></a><span 
class="co">#&gt; # ℹ 6 more rows</span></span></code></pre></div> <p>There are two equivalents in base R, depending on whether you want the whole data frame, or just selected variables:</p> <div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" tabindex="-1"></a><span class="fu">unique</span>(df[<span class="st">&quot;x&quot;</span>]) <span class="co"># selected columns</span></span> <span id="cb7-2"><a href="#cb7-2" tabindex="-1"></a><span class="co">#&gt; # A tibble: 10 × 1</span></span> <span id="cb7-3"><a href="#cb7-3" tabindex="-1"></a><span class="co">#&gt; x</span></span> <span id="cb7-4"><a href="#cb7-4" tabindex="-1"></a><span class="co">#&gt; &lt;int&gt;</span></span> <span id="cb7-5"><a href="#cb7-5" tabindex="-1"></a><span class="co">#&gt; 1 8</span></span> <span id="cb7-6"><a href="#cb7-6" tabindex="-1"></a><span class="co">#&gt; 2 7</span></span> <span id="cb7-7"><a href="#cb7-7" tabindex="-1"></a><span class="co">#&gt; 3 10</span></span> <span id="cb7-8"><a href="#cb7-8" tabindex="-1"></a><span class="co">#&gt; 4 4</span></span> <span id="cb7-9"><a href="#cb7-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 6 more rows</span></span> <span id="cb7-10"><a href="#cb7-10" tabindex="-1"></a>df[<span class="sc">!</span><span class="fu">duplicated</span>(df<span class="sc">$</span>x), , drop <span class="ot">=</span> <span class="cn">FALSE</span>] <span class="co"># whole data frame</span></span> <span id="cb7-11"><a href="#cb7-11" tabindex="-1"></a><span class="co">#&gt; # A tibble: 10 × 2</span></span> <span id="cb7-12"><a href="#cb7-12" tabindex="-1"></a><span class="co">#&gt; x y</span></span> <span id="cb7-13"><a href="#cb7-13" tabindex="-1"></a><span class="co">#&gt; &lt;int&gt; &lt;int&gt;</span></span> <span id="cb7-14"><a href="#cb7-14" tabindex="-1"></a><span class="co">#&gt; 1 8 7</span></span> <span id="cb7-15"><a href="#cb7-15" tabindex="-1"></a><span class="co">#&gt; 2 7 
3</span></span> <span id="cb7-16"><a href="#cb7-16" tabindex="-1"></a><span class="co">#&gt; 3 10 1</span></span> <span id="cb7-17"><a href="#cb7-17" tabindex="-1"></a><span class="co">#&gt; 4 4 2</span></span> <span id="cb7-18"><a href="#cb7-18" tabindex="-1"></a><span class="co">#&gt; # ℹ 6 more rows</span></span></code></pre></div> </div> <div id="filter-return-rows-with-matching-conditions" class="section level2"> <h2><code>filter()</code>: Return rows with matching conditions</h2> <p><code>dplyr::filter()</code> selects rows where an expression is <code>TRUE</code>:</p> <div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" tabindex="-1"></a>starwars <span class="sc">%&gt;%</span> <span class="fu">filter</span>(species <span class="sc">==</span> <span class="st">&quot;Human&quot;</span>)</span> <span id="cb8-2"><a href="#cb8-2" tabindex="-1"></a><span class="co">#&gt; # A tibble: 35 × 14</span></span> <span id="cb8-3"><a href="#cb8-3" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb8-4"><a href="#cb8-4" tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb8-5"><a href="#cb8-5" tabindex="-1"></a><span class="co">#&gt; 1 Luke Sky… 172 77 blond fair blue 19 male mascu…</span></span> <span id="cb8-6"><a href="#cb8-6" tabindex="-1"></a><span class="co">#&gt; 2 Darth Va… 202 136 none white yellow 41.9 male mascu…</span></span> <span id="cb8-7"><a href="#cb8-7" tabindex="-1"></a><span class="co">#&gt; 3 Leia Org… 150 49 brown light brown 19 fema… femin…</span></span> <span id="cb8-8"><a href="#cb8-8" tabindex="-1"></a><span class="co">#&gt; 4 Owen Lars 178 120 brown, gr… light blue 52 male mascu…</span></span> <span id="cb8-9"><a href="#cb8-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 31 
more rows</span></span> <span id="cb8-10"><a href="#cb8-10" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb8-11"><a href="#cb8-11" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships &lt;list&gt;</span></span> <span id="cb8-12"><a href="#cb8-12" tabindex="-1"></a>starwars <span class="sc">%&gt;%</span> <span class="fu">filter</span>(mass <span class="sc">&gt;</span> <span class="dv">1000</span>)</span> <span id="cb8-13"><a href="#cb8-13" tabindex="-1"></a><span class="co">#&gt; # A tibble: 1 × 14</span></span> <span id="cb8-14"><a href="#cb8-14" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb8-15"><a href="#cb8-15" tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb8-16"><a href="#cb8-16" tabindex="-1"></a><span class="co">#&gt; 1 Jabba De… 175 1358 &lt;NA&gt; green-tan… orange 600 herm… mascu…</span></span> <span id="cb8-17"><a href="#cb8-17" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb8-18"><a href="#cb8-18" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships &lt;list&gt;</span></span> <span id="cb8-19"><a href="#cb8-19" tabindex="-1"></a>starwars <span class="sc">%&gt;%</span> <span class="fu">filter</span>(hair_color <span class="sc">==</span> <span class="st">&quot;none&quot;</span> <span class="sc">&amp;</span> eye_color <span class="sc">==</span> <span class="st">&quot;black&quot;</span>)</span> <span id="cb8-20"><a href="#cb8-20" tabindex="-1"></a><span class="co">#&gt; # A tibble: 9 × 14</span></span> <span id="cb8-21"><a href="#cb8-21" tabindex="-1"></a><span class="co">#&gt; name 
height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb8-22"><a href="#cb8-22" tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb8-23"><a href="#cb8-23" tabindex="-1"></a><span class="co">#&gt; 1 Nien Nunb 160 68 none grey black NA male mascu…</span></span> <span id="cb8-24"><a href="#cb8-24" tabindex="-1"></a><span class="co">#&gt; 2 Gasgano 122 NA none white, bl… black NA male mascu…</span></span> <span id="cb8-25"><a href="#cb8-25" tabindex="-1"></a><span class="co">#&gt; 3 Kit Fisto 196 87 none green black NA male mascu…</span></span> <span id="cb8-26"><a href="#cb8-26" tabindex="-1"></a><span class="co">#&gt; 4 Plo Koon 188 80 none orange black 22 male mascu…</span></span> <span id="cb8-27"><a href="#cb8-27" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more rows</span></span> <span id="cb8-28"><a href="#cb8-28" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb8-29"><a href="#cb8-29" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships &lt;list&gt;</span></span></code></pre></div> <p>The closest base equivalent (and the inspiration for <code>filter()</code>) is <code>subset()</code>:</p> <div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" tabindex="-1"></a><span class="fu">subset</span>(starwars, species <span class="sc">==</span> <span class="st">&quot;Human&quot;</span>)</span> <span id="cb9-2"><a href="#cb9-2" tabindex="-1"></a><span class="co">#&gt; # A tibble: 35 × 14</span></span> <span id="cb9-3"><a href="#cb9-3" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb9-4"><a href="#cb9-4" tabindex="-1"></a><span 
class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb9-5"><a href="#cb9-5" tabindex="-1"></a><span class="co">#&gt; 1 Luke Sky… 172 77 blond fair blue 19 male mascu…</span></span> <span id="cb9-6"><a href="#cb9-6" tabindex="-1"></a><span class="co">#&gt; 2 Darth Va… 202 136 none white yellow 41.9 male mascu…</span></span> <span id="cb9-7"><a href="#cb9-7" tabindex="-1"></a><span class="co">#&gt; 3 Leia Org… 150 49 brown light brown 19 fema… femin…</span></span> <span id="cb9-8"><a href="#cb9-8" tabindex="-1"></a><span class="co">#&gt; 4 Owen Lars 178 120 brown, gr… light blue 52 male mascu…</span></span> <span id="cb9-9"><a href="#cb9-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 31 more rows</span></span> <span id="cb9-10"><a href="#cb9-10" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb9-11"><a href="#cb9-11" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships &lt;list&gt;</span></span> <span id="cb9-12"><a href="#cb9-12" tabindex="-1"></a><span class="fu">subset</span>(starwars, mass <span class="sc">&gt;</span> <span class="dv">1000</span>)</span> <span id="cb9-13"><a href="#cb9-13" tabindex="-1"></a><span class="co">#&gt; # A tibble: 1 × 14</span></span> <span id="cb9-14"><a href="#cb9-14" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb9-15"><a href="#cb9-15" tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb9-16"><a href="#cb9-16" tabindex="-1"></a><span class="co">#&gt; 1 Jabba De… 175 1358 &lt;NA&gt; green-tan… orange 600 herm… mascu…</span></span> <span id="cb9-17"><a href="#cb9-17" tabindex="-1"></a><span 
class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb9-18"><a href="#cb9-18" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships &lt;list&gt;</span></span> <span id="cb9-19"><a href="#cb9-19" tabindex="-1"></a><span class="fu">subset</span>(starwars, hair_color <span class="sc">==</span> <span class="st">&quot;none&quot;</span> <span class="sc">&amp;</span> eye_color <span class="sc">==</span> <span class="st">&quot;black&quot;</span>)</span> <span id="cb9-20"><a href="#cb9-20" tabindex="-1"></a><span class="co">#&gt; # A tibble: 9 × 14</span></span> <span id="cb9-21"><a href="#cb9-21" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb9-22"><a href="#cb9-22" tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb9-23"><a href="#cb9-23" tabindex="-1"></a><span class="co">#&gt; 1 Nien Nunb 160 68 none grey black NA male mascu…</span></span> <span id="cb9-24"><a href="#cb9-24" tabindex="-1"></a><span class="co">#&gt; 2 Gasgano 122 NA none white, bl… black NA male mascu…</span></span> <span id="cb9-25"><a href="#cb9-25" tabindex="-1"></a><span class="co">#&gt; 3 Kit Fisto 196 87 none green black NA male mascu…</span></span> <span id="cb9-26"><a href="#cb9-26" tabindex="-1"></a><span class="co">#&gt; 4 Plo Koon 188 80 none orange black 22 male mascu…</span></span> <span id="cb9-27"><a href="#cb9-27" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more rows</span></span> <span id="cb9-28"><a href="#cb9-28" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb9-29"><a href="#cb9-29" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships 
&lt;list&gt;</span></span></code></pre></div> <p>You can also use <code>[</code>, but this requires <code>which()</code> to drop the rows where the condition is <code>NA</code>:</p> <div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" tabindex="-1"></a>starwars[<span class="fu">which</span>(starwars<span class="sc">$</span>species <span class="sc">==</span> <span class="st">&quot;Human&quot;</span>), , drop <span class="ot">=</span> <span class="cn">FALSE</span>]</span> <span id="cb10-2"><a href="#cb10-2" tabindex="-1"></a><span class="co">#&gt; # A tibble: 35 × 14</span></span> <span id="cb10-3"><a href="#cb10-3" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb10-4"><a href="#cb10-4" tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb10-5"><a href="#cb10-5" tabindex="-1"></a><span class="co">#&gt; 1 Luke Sky… 172 77 blond fair blue 19 male mascu…</span></span> <span id="cb10-6"><a href="#cb10-6" tabindex="-1"></a><span class="co">#&gt; 2 Darth Va… 202 136 none white yellow 41.9 male mascu…</span></span> <span id="cb10-7"><a href="#cb10-7" tabindex="-1"></a><span class="co">#&gt; 3 Leia Org… 150 49 brown light brown 19 fema… femin…</span></span> <span id="cb10-8"><a href="#cb10-8" tabindex="-1"></a><span class="co">#&gt; 4 Owen Lars 178 120 brown, gr… light blue 52 male mascu…</span></span> <span id="cb10-9"><a href="#cb10-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 31 more rows</span></span> <span id="cb10-10"><a href="#cb10-10" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb10-11"><a href="#cb10-11" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships 
&lt;list&gt;</span></span> <span id="cb10-12"><a href="#cb10-12" tabindex="-1"></a>starwars[<span class="fu">which</span>(starwars<span class="sc">$</span>mass <span class="sc">&gt;</span> <span class="dv">1000</span>), , drop <span class="ot">=</span> <span class="cn">FALSE</span>]</span> <span id="cb10-13"><a href="#cb10-13" tabindex="-1"></a><span class="co">#&gt; # A tibble: 1 × 14</span></span> <span id="cb10-14"><a href="#cb10-14" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb10-15"><a href="#cb10-15" tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb10-16"><a href="#cb10-16" tabindex="-1"></a><span class="co">#&gt; 1 Jabba De… 175 1358 &lt;NA&gt; green-tan… orange 600 herm… mascu…</span></span> <span id="cb10-17"><a href="#cb10-17" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb10-18"><a href="#cb10-18" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships &lt;list&gt;</span></span> <span id="cb10-19"><a href="#cb10-19" tabindex="-1"></a>starwars[<span class="fu">which</span>(starwars<span class="sc">$</span>hair_color <span class="sc">==</span> <span class="st">&quot;none&quot;</span> <span class="sc">&amp;</span> starwars<span class="sc">$</span>eye_color <span class="sc">==</span> <span class="st">&quot;black&quot;</span>), , drop <span class="ot">=</span> <span class="cn">FALSE</span>]</span> <span id="cb10-20"><a href="#cb10-20" tabindex="-1"></a><span class="co">#&gt; # A tibble: 9 × 14</span></span> <span id="cb10-21"><a href="#cb10-21" tabindex="-1"></a><span class="co">#&gt; name height mass hair_color skin_color eye_color birth_year sex gender</span></span> <span id="cb10-22"><a href="#cb10-22" 
tabindex="-1"></a><span class="co">#&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; </span></span> <span id="cb10-23"><a href="#cb10-23" tabindex="-1"></a><span class="co">#&gt; 1 Nien Nunb 160 68 none grey black NA male mascu…</span></span> <span id="cb10-24"><a href="#cb10-24" tabindex="-1"></a><span class="co">#&gt; 2 Gasgano 122 NA none white, bl… black NA male mascu…</span></span> <span id="cb10-25"><a href="#cb10-25" tabindex="-1"></a><span class="co">#&gt; 3 Kit Fisto 196 87 none green black NA male mascu…</span></span> <span id="cb10-26"><a href="#cb10-26" tabindex="-1"></a><span class="co">#&gt; 4 Plo Koon 188 80 none orange black 22 male mascu…</span></span> <span id="cb10-27"><a href="#cb10-27" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more rows</span></span> <span id="cb10-28"><a href="#cb10-28" tabindex="-1"></a><span class="co">#&gt; # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,</span></span> <span id="cb10-29"><a href="#cb10-29" tabindex="-1"></a><span class="co">#&gt; # vehicles &lt;list&gt;, starships &lt;list&gt;</span></span></code></pre></div> </div> <div id="mutate-create-or-transform-variables" class="section level2"> <h2><code>mutate()</code>: Create or transform variables</h2> <p><code>dplyr::mutate()</code> creates new variables from existing variables:</p> <div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" tabindex="-1"></a>df <span class="sc">%&gt;%</span> <span class="fu">mutate</span>(<span class="at">z =</span> x <span class="sc">+</span> y, <span class="at">z2 =</span> z <span class="sc">^</span> <span class="dv">2</span>)</span> <span id="cb11-2"><a href="#cb11-2" tabindex="-1"></a><span class="co">#&gt; # A tibble: 100 × 4</span></span> <span id="cb11-3"><a href="#cb11-3" tabindex="-1"></a><span class="co">#&gt; x y z z2</span></span> <span 
id="cb11-4"><a href="#cb11-4" tabindex="-1"></a><span class="co">#&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt;</span></span> <span id="cb11-5"><a href="#cb11-5" tabindex="-1"></a><span class="co">#&gt; 1 8 7 15 225</span></span> <span id="cb11-6"><a href="#cb11-6" tabindex="-1"></a><span class="co">#&gt; 2 8 9 17 289</span></span> <span id="cb11-7"><a href="#cb11-7" tabindex="-1"></a><span class="co">#&gt; 3 7 3 10 100</span></span> <span id="cb11-8"><a href="#cb11-8" tabindex="-1"></a><span class="co">#&gt; 4 10 1 11 121</span></span> <span id="cb11-9"><a href="#cb11-9" tabindex="-1"></a><span class="co">#&gt; # ℹ 96 more rows</span></span></code></pre></div> <p>The closest base equivalent is <code>transform()</code>, but note that it cannot use freshly created variables:</p> <div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" tabindex="-1"></a><span class="fu">head</span>(<span class="fu">transform</span>(df, <span class="at">z =</span> x <span class="sc">+</span> y, <span class="at">z2 =</span> (x <span class="sc">+</span> y) <span class="sc">^</span> <span class="dv">2</span>))</span> <span id="cb12-2"><a href="#cb12-2" tabindex="-1"></a><span class="co">#&gt; x y z z2</span></span> <span id="cb12-3"><a href="#cb12-3" tabindex="-1"></a><span class="co">#&gt; 1 8 7 15 225</span></span> <span id="cb12-4"><a href="#cb12-4" tabindex="-1"></a><span class="co">#&gt; 2 8 9 17 289</span></span> <span id="cb12-5"><a href="#cb12-5" tabindex="-1"></a><span class="co">#&gt; 3 7 3 10 100</span></span> <span id="cb12-6"><a href="#cb12-6" tabindex="-1"></a><span class="co">#&gt; 4 10 1 11 121</span></span> <span id="cb12-7"><a href="#cb12-7" tabindex="-1"></a><span class="co">#&gt; 5 4 2 6 36</span></span> <span id="cb12-8"><a href="#cb12-8" tabindex="-1"></a><span class="co">#&gt; 6 2 1 3 9</span></span></code></pre></div> <p>Alternatively, you can use <code>$&lt;-</code>:</p> <div 
class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" tabindex="-1"></a>mtcars<span class="sc">$</span>cyl2 <span class="ot">&lt;-</span> mtcars<span class="sc">$</span>cyl <span class="sc">*</span> <span class="dv">2</span></span> <span id="cb13-2"><a href="#cb13-2" tabindex="-1"></a>mtcars<span class="sc">$</span>cyl4 <span class="ot">&lt;-</span> mtcars<span class="sc">$</span>cyl2 <span class="sc">*</span> <span class="dv">2</span></span></code></pre></div> <p>When applied to a grouped data frame, <code>dplyr::mutate()</code> computes the new variables once per group:</p> <div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" tabindex="-1"></a>gf <span class="ot">&lt;-</span> <span class="fu">tibble</span>(<span class="at">g =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">2</span>), <span class="at">x =</span> <span class="fu">c</span>(<span class="fl">0.5</span>, <span class="fl">1.5</span>, <span class="fl">2.5</span>, <span class="fl">3.5</span>))</span> <span id="cb14-2"><a href="#cb14-2" tabindex="-1"></a>gf <span class="sc">%&gt;%</span> </span> <span id="cb14-3"><a href="#cb14-3" tabindex="-1"></a> <span class="fu">group_by</span>(g) <span class="sc">%&gt;%</span> </span> <span id="cb14-4"><a href="#cb14-4" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">x_mean =</span> <span class="fu">mean</span>(x), <span class="at">x_rank =</span> <span class="fu">rank</span>(x))</span> <span id="cb14-5"><a href="#cb14-5" tabindex="-1"></a><span class="co">#&gt; # A tibble: 4 × 4</span></span> <span id="cb14-6"><a href="#cb14-6" tabindex="-1"></a><span class="co">#&gt; # Groups: g [2]</span></span> <span id="cb14-7"><a href="#cb14-7" tabindex="-1"></a><span class="co">#&gt; g x x_mean x_rank</span></span> <span id="cb14-8"><a href="#cb14-8" 
tabindex="-1"></a><span class="co">#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span> <span id="cb14-9"><a href="#cb14-9" tabindex="-1"></a><span class="co">#&gt; 1 1 0.5 1 1</span></span> <span id="cb14-10"><a href="#cb14-10" tabindex="-1"></a><span class="co">#&gt; 2 1 1.5 1 2</span></span> <span id="cb14-11"><a href="#cb14-11" tabindex="-1"></a><span class="co">#&gt; 3 2 2.5 3 1</span></span> <span id="cb14-12"><a href="#cb14-12" tabindex="-1"></a><span class="co">#&gt; 4 2 3.5 3 2</span></span></code></pre></div> <p>To replicate this in base R, you can use <code>ave()</code>:</p> <div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" tabindex="-1"></a><span class="fu">transform</span>(gf, </span> <span id="cb15-2"><a href="#cb15-2" tabindex="-1"></a> <span class="at">x_mean =</span> <span class="fu">ave</span>(x, g, <span class="at">FUN =</span> mean), </span> <span id="cb15-3"><a href="#cb15-3" tabindex="-1"></a> <span class="at">x_rank =</span> <span class="fu">ave</span>(x, g, <span class="at">FUN =</span> rank)</span> <span id="cb15-4"><a href="#cb15-4" tabindex="-1"></a>)</span> <span id="cb15-5"><a href="#cb15-5" tabindex="-1"></a><span class="co">#&gt; g x x_mean x_rank</span></span> <span id="cb15-6"><a href="#cb15-6" tabindex="-1"></a><span class="co">#&gt; 1 1 0.5 1 1</span></span> <span id="cb15-7"><a href="#cb15-7" tabindex="-1"></a><span class="co">#&gt; 2 1 1.5 1 2</span></span> <span id="cb15-8"><a href="#cb15-8" tabindex="-1"></a><span class="co">#&gt; 3 2 2.5 3 1</span></span> <span id="cb15-9"><a href="#cb15-9" tabindex="-1"></a><span class="co">#&gt; 4 2 3.5 3 2</span></span></code></pre></div> </div> <div id="pull-pull-out-a-single-variable" class="section level2"> <h2><code>pull()</code>: Pull out a single variable</h2> <p><code>dplyr::pull()</code> extracts a variable either by name or position:</p> <div class="sourceCode" id="cb16"><pre class="sourceCode 
r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" tabindex="-1"></a>mtcars <span class="sc">%&gt;%</span> <span class="fu">pull</span>(<span class="dv">1</span>)</span> <span id="cb16-2"><a href="#cb16-2" tabindex="-1"></a><span class="co">#&gt; [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4</span></span> <span id="cb16-3"><a href="#cb16-3" tabindex="-1"></a><span class="co">#&gt; [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7</span></span> <span id="cb16-4"><a href="#cb16-4" tabindex="-1"></a><span class="co">#&gt; [31] 15.0 21.4</span></span> <span id="cb16-5"><a href="#cb16-5" tabindex="-1"></a>mtcars <span class="sc">%&gt;%</span> <span class="fu">pull</span>(cyl)</span> <span id="cb16-6"><a href="#cb16-6" tabindex="-1"></a><span class="co">#&gt; [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4</span></span></code></pre></div> <p>This is equivalent to <code>[[</code> for positions and <code>$</code> for names:</p> <div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1" tabindex="-1"></a>mtcars[[<span class="dv">1</span>]]</span> <span id="cb17-2"><a href="#cb17-2" tabindex="-1"></a><span class="co">#&gt; [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4</span></span> <span id="cb17-3"><a href="#cb17-3" tabindex="-1"></a><span class="co">#&gt; [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7</span></span> <span id="cb17-4"><a href="#cb17-4" tabindex="-1"></a><span class="co">#&gt; [31] 15.0 21.4</span></span> <span id="cb17-5"><a href="#cb17-5" tabindex="-1"></a>mtcars<span class="sc">$</span>cyl</span> <span id="cb17-6"><a href="#cb17-6" tabindex="-1"></a><span class="co">#&gt; [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4</span></span></code></pre></div> </div> <div id="relocate-change-column-order" class="section 
level2"> <h2><code>relocate()</code>: Change column order</h2> <p><code>dplyr::relocate()</code> makes it easy to move a set of columns to a new position (by default, the front):</p> <div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" tabindex="-1"></a><span class="co"># to front</span></span> <span id="cb18-2"><a href="#cb18-2" tabindex="-1"></a>mtcars <span class="sc">%&gt;%</span> <span class="fu">relocate</span>(gear, carb) </span> <span id="cb18-3"><a href="#cb18-3" tabindex="-1"></a><span class="co">#&gt; # A tibble: 32 × 13</span></span> <span id="cb18-4"><a href="#cb18-4" tabindex="-1"></a><span class="co">#&gt; gear carb mpg cyl disp hp drat wt qsec vs am cyl2 cyl4</span></span> <span id="cb18-5"><a href="#cb18-5" tabindex="-1"></a><span class="co">#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span> <span id="cb18-6"><a href="#cb18-6" tabindex="-1"></a><span class="co">#&gt; 1 4 4 21 6 160 110 3.9 2.62 16.5 0 1 12 24</span></span> <span id="cb18-7"><a href="#cb18-7" tabindex="-1"></a><span class="co">#&gt; 2 4 4 21 6 160 110 3.9 2.88 17.0 0 1