<!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <meta name="generator" content="pandoc" /> <meta http-equiv="X-UA-Compatible" content="IE=EDGE" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="date" content="2022-12-23" /> <title>Using Skimr</title> <script>// Pandoc 2.9 adds attributes on both header and div. We remove the former (to // be compatible with the behavior of Pandoc < 2.8). document.addEventListener('DOMContentLoaded', function(e) { var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); var i, h, a; for (i = 0; i < hs.length; i++) { h = hs[i]; if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 a = h.attributes; while (a.length > 0) h.removeAttribute(a[0].name); } }); </script> <style type="text/css"> code{white-space: pre-wrap;} span.smallcaps{font-variant: small-caps;} span.underline{text-decoration: underline;} div.column{display: inline-block; vertical-align: top; width: 50%;} div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} ul.task-list{list-style: none;} </style> <style type="text/css"> code { white-space: pre; } .sourceCode { overflow: visible; } </style> <style type="text/css" data-origin="pandoc"> pre > code.sourceCode { white-space: pre; position: relative; } pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } pre > code.sourceCode > span:empty { height: 1.2em; } .sourceCode { overflow: visible; } code.sourceCode > span { color: inherit; text-decoration: inherit; } div.sourceCode { margin: 1em 0; } pre.sourceCode { margin: 0; } @media screen { div.sourceCode { overflow: auto; } } @media print { pre > code.sourceCode { white-space: pre-wrap; } pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } pre.numberSource code > span { position: relative; left: -4em; counter-increment: source-line; } pre.numberSource code > span > a:first-child::before 
{ content: counter(source-line); position: relative; left: -1em; text-align: right; vertical-align: baseline; border: none; display: inline-block; -webkit-touch-callout: none; -webkit-user-select: none; -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none; padding: 0 4px; width: 4em; color: #aaaaaa; } pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; } div.sourceCode { } @media screen { pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } } code span.al { color: #ff0000; font-weight: bold; } code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } code span.at { color: #7d9029; } code span.bn { color: #40a070; } code span.bu { color: #008000; } code span.cf { color: #007020; font-weight: bold; } code span.ch { color: #4070a0; } code span.cn { color: #880000; } code span.co { color: #60a0b0; font-style: italic; } code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } code span.do { color: #ba2121; font-style: italic; } code span.dt { color: #902000; } code span.dv { color: #40a070; } code span.er { color: #ff0000; font-weight: bold; } code span.ex { } code span.fl { color: #40a070; } code span.fu { color: #06287e; } code span.im { color: #008000; font-weight: bold; } code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } code span.kw { color: #007020; font-weight: bold; } code span.op { color: #666666; } code span.ot { color: #007020; } code span.pp { color: #bc7a00; } code span.sc { color: #4070a0; } code span.ss { color: #bb6688; } code span.st { color: #4070a0; } code span.va { color: #19177c; } code span.vs { color: #4070a0; } code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } </style> <script> // apply pandoc div.sourceCode style to pre.sourceCode instead (function() { var sheets = document.styleSheets; for (var i = 0; i < sheets.length; i++) { if (sheets[i].ownerNode.dataset["origin"] 
!== "pandoc") continue; try { var rules = sheets[i].cssRules; } catch (e) { continue; } for (var j = 0; j < rules.length; j++) { var rule = rules[j]; // check if there is a div.sourceCode rule if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") continue; var style = rule.style.cssText; // check if color or background-color is set if (rule.style.color === '' && rule.style.backgroundColor === '') continue; // replace div.sourceCode by a pre.sourceCode rule sheets[i].deleteRule(j); sheets[i].insertRule('pre.sourceCode{' + style + '}', j); } } })(); </script> <style type="text/css">body { background-color: #fff; margin: 1em auto; max-width: 700px; overflow: visible; padding-left: 2em; padding-right: 2em; font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; line-height: 1.35; } #TOC { clear: both; margin: 0 0 10px 10px; padding: 4px; width: 400px; border: 1px solid #CCCCCC; border-radius: 5px; background-color: #f6f6f6; font-size: 13px; line-height: 1.3; } #TOC .toctitle { font-weight: bold; font-size: 15px; margin-left: 5px; } #TOC ul { padding-left: 40px; margin-left: -1.5em; margin-top: 5px; margin-bottom: 5px; } #TOC ul ul { margin-left: -2em; } #TOC li { line-height: 16px; } table { margin: 1em auto; border-width: 1px; border-color: #DDDDDD; border-style: outset; border-collapse: collapse; } table th { border-width: 2px; padding: 5px; border-style: inset; } table td { border-width: 1px; border-style: inset; line-height: 18px; padding: 5px 5px; } table, table th, table td { border-left-style: none; border-right-style: none; } table thead, table tr.even { background-color: #f7f7f7; } p { margin: 0.5em 0; } blockquote { background-color: #f6f6f6; padding: 0.25em 0.75em; } hr { border-style: solid; border: none; border-top: 1px solid #777; margin: 28px 0; } dl { margin-left: 0; } dl dd { margin-bottom: 13px; margin-left: 13px; } dl dt { font-weight: bold; } ul { margin-top: 0; } ul li { list-style: circle 
outside; } ul ul { margin-bottom: 0; } pre, code { background-color: #f7f7f7; border-radius: 3px; color: #333; white-space: pre-wrap; } pre { border-radius: 3px; margin: 5px 0px 10px 0px; padding: 10px; } pre:not([class]) { background-color: #f7f7f7; } code { font-family: Consolas, Monaco, 'Courier New', monospace; font-size: 85%; } p > code, li > code { padding: 2px 0px; } div.figure { text-align: center; } img { background-color: #FFFFFF; padding: 2px; border: 1px solid #DDDDDD; border-radius: 3px; border: 1px solid #CCCCCC; margin: 0 5px; } h1 { margin-top: 0; font-size: 35px; line-height: 40px; } h2 { border-bottom: 4px solid #f7f7f7; padding-top: 10px; padding-bottom: 2px; font-size: 145%; } h3 { border-bottom: 2px solid #f7f7f7; padding-top: 10px; font-size: 120%; } h4 { border-bottom: 1px solid #f7f7f7; margin-left: 8px; font-size: 105%; } h5, h6 { border-bottom: 1px solid #ccc; font-size: 105%; } a { color: #0033dd; text-decoration: none; } a:hover { color: #6666ff; } a:visited { color: #800080; } a:visited:hover { color: #BB00BB; } a[href^="http:"] { text-decoration: underline; } a[href^="https:"] { text-decoration: underline; } code > span.kw { color: #555; font-weight: bold; } code > span.dt { color: #902000; } code > span.dv { color: #40a070; } code > span.bn { color: #d14; } code > span.fl { color: #d14; } code > span.ch { color: #d14; } code > span.st { color: #d14; } code > span.co { color: #888888; font-style: italic; } code > span.ot { color: #007020; } code > span.al { color: #ff0000; font-weight: bold; } code > span.fu { color: #900; font-weight: bold; } code > span.er { color: #a61717; background-color: #e3d2d2; } </style> </head> <body> <h1 class="title toc-ignore">Using Skimr</h1> <h4 class="date">2022-12-23</h4> <div id="introduction" class="section level2"> <h2>Introduction</h2> <p><code>skimr</code> is designed to provide summary statistics about variables in data frames, tibbles, data tables and vectors. 
It is opinionated in its defaults, but easy to modify.</p> <p>In base R, the most similar functions are <code>summary()</code> for vectors and data frames and <code>fivenum()</code> for numeric vectors:</p> <div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">summary</span>(iris)</span></code></pre></div> <pre><code>## Sepal.Length Sepal.Width Petal.Length Petal.Width ## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 ## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 ## Median :5.800 Median :3.000 Median :4.350 Median :1.300 ## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199 ## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800 ## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500 ## Species ## setosa :50 ## versicolor:50 ## virginica :50 ## ## ## </code></pre> <div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">summary</span>(iris<span class="sc">$</span>Sepal.Length)</span></code></pre></div> <pre><code>## Min. 1st Qu. Median Mean 3rd Qu. Max. 
## 4.300 5.100 5.800 5.843 6.400 7.900</code></pre> <div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">fivenum</span>(iris<span class="sc">$</span>Sepal.Length)</span></code></pre></div> <pre><code>## [1] 4.3 5.1 5.8 6.4 7.9</code></pre> <div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">summary</span>(iris<span class="sc">$</span>Species)</span></code></pre></div> <pre><code>## setosa versicolor virginica ## 50 50 50</code></pre> </div> <div id="the-skim-function" class="section level2"> <h2>The <code>skim()</code> function</h2> <p>The core function of <code>skimr</code> is <code>skim()</code>, which is designed to work with (grouped) data frames, and will try to coerce other objects to data frames if possible. Like <code>summary()</code>, <code>skim()</code>’s method for data frames presents results for every column; the statistics it provides depend on the class of the variable.</p> <div id="skimming-data-frames" class="section level3"> <h3>Skimming data frames</h3> <p>By design, the main focus of <code>skimr</code> is on data frames; it is intended to fit well within a data <a href="https://r4ds.had.co.nz/pipes.html">pipeline</a> and relies extensively on <a href="https://www.tidyverse.org/">tidyverse</a> vocabulary, which focuses on data frames.</p> <p>Results of <code>skim()</code> are <em>printed</em> horizontally, with one section per variable type and one row per variable.</p> <div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(skimr)</span> <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris)</span></code></pre></div> 
<pre><code>## ── Data Summary ──────────────────────── ## Values ## Name iris ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## factor 1 ## numeric 4 ## ________________________ ## Group variables None ## ## ── Variable type: factor ─────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate ordered n_unique ## 1 Species 0 1 FALSE 3 ## top_counts ## 1 set: 50, ver: 50, vir: 50 ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂ ## 2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁ ## 3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂ ## 4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃</code></pre> <p>The result is a single wide data frame combining the summaries for all variables, with some additional attributes and two metadata columns:</p> <ul> <li><code>skim_variable</code>: name of the original variable</li> <li><code>skim_type</code>: class of the variable</li> </ul> <p>Unlike many other objects within <code>R</code>, these columns are intrinsic to the <code>skim_df</code> class. Dropping these columns coerces the object to a <code>tibble</code>. 
The <code>is_skim_df()</code> function is used to assert that an object is a skim_df.</p> <div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris) <span class="sc">%&gt;%</span> <span class="fu">is_skim_df</span>()</span></code></pre></div> <pre><code>## [1] TRUE ## attr(,&quot;message&quot;) ## character(0)</code></pre> <div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris) <span class="sc">%&gt;%</span></span> <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a> dplyr<span class="sc">::</span><span class="fu">select</span>(<span class="sc">-</span>skim_type, <span class="sc">-</span>skim_variable) <span class="sc">%&gt;%</span> <span class="fu">is_skim_df</span>()</span></code></pre></div> <pre><code>## [1] FALSE ## attr(,&quot;message&quot;) ## [1] &quot;Object is not a `skim_df`: missing column `skim_type`; missing column `skim_variable`&quot;</code></pre> <div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris) <span class="sc">%&gt;%</span></span> <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> dplyr<span class="sc">::</span><span class="fu">select</span>(<span class="sc">-</span>n_missing) <span class="sc">%&gt;%</span> <span class="fu">is_skim_df</span>()</span></code></pre></div> <pre><code>## [1] TRUE ## attr(,&quot;message&quot;) ## character(0)</code></pre> <p>In order to avoid type coercion, columns for summary statistics for different types are prefixed with the corresponding <code>skim_type</code>. 
This means that the columns of the <code>skim_df</code> are somewhat sparse, with quite a few missing values. This is because, for some statistics, the representation differs between types of variables. For example, the mean of a Date variable and of a numeric variable are represented differently when printing, but this cannot be supported in a single vector. The exceptions are <code>n_missing</code> and <code>complete_rate</code> (the proportion of non-missing observations), which are the same for all types of variables.</p> <div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris) <span class="sc">%&gt;%</span></span> <span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a> tibble<span class="sc">::</span><span class="fu">as_tibble</span>()</span></code></pre></div> <pre><code>## # A tibble: 5 × 15 ## skim_type skim_variable n_missing complete_rate factor.ordered factor.n_unique ## &lt;chr&gt; &lt;chr&gt; &lt;int&gt; &lt;dbl&gt; &lt;lgl&gt; &lt;int&gt; ## 1 factor Species 0 1 FALSE 3 ## 2 numeric Sepal.Length 0 1 NA NA ## 3 numeric Sepal.Width 0 1 NA NA ## 4 numeric Petal.Length 0 1 NA NA ## 5 numeric Petal.Width 0 1 NA NA ## # … with 9 more variables: factor.top_counts &lt;chr&gt;, numeric.mean &lt;dbl&gt;, ## # numeric.sd &lt;dbl&gt;, numeric.p0 &lt;dbl&gt;, numeric.p25 &lt;dbl&gt;, numeric.p50 &lt;dbl&gt;, ## # numeric.p75 &lt;dbl&gt;, numeric.p100 &lt;dbl&gt;, numeric.hist &lt;chr&gt;</code></pre> <p>This is in contrast to <code>summary.data.frame()</code>, which stores statistics in a <code>table</code>. 
The distinction is important, because the <code>skim_df</code> object is pipeable and easy to use for additional manipulation: for example, the user could select all of the variable means, or all summary statistics for a specific variable.</p> <div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris) <span class="sc">%&gt;%</span></span> <span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a> dplyr<span class="sc">::</span><span class="fu">filter</span>(skim_variable <span class="sc">==</span> <span class="st">&quot;Petal.Length&quot;</span>)</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name iris ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## numeric 1 ## ________________________ ## Group variables None ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂</code></pre> <p>Most <code>dplyr</code> verbs should work as expected.</p> <div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris) <span class="sc">%&gt;%</span></span> <span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> dplyr<span class="sc">::</span><span class="fu">select</span>(skim_type, skim_variable, n_missing)</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name iris ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## factor 1 ## numeric 4 ## ________________________ ## Group variables None ## ## ── Variable type: factor 
─────────────────────────────────────────────────────── ## skim_variable n_missing ## 1 Species 0 ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing ## 1 Sepal.Length 0 ## 2 Sepal.Width 0 ## 3 Petal.Length 0 ## 4 Petal.Width 0</code></pre> <p>The base skimmers <code>n_missing</code> and <code>complete_rate</code> are computed for all of the columns in the data. All other type-based skimmers are namespaced, so you need to use a <code>skim_type</code> prefix to refer to the correct column.</p> <div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris) <span class="sc">%&gt;%</span></span> <span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a> dplyr<span class="sc">::</span><span class="fu">select</span>(skim_type, skim_variable, numeric.mean)</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name iris ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## factor 1 ## numeric 4 ## ________________________ ## Group variables None ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable mean ## 1 Sepal.Length 5.84 ## 2 Sepal.Width 3.06 ## 3 Petal.Length 3.76 ## 4 Petal.Width 1.20</code></pre> <p><code>skim()</code> also supports grouped data created by <code>dplyr::group_by()</code>. 
In this case, one additional column for each grouping variable is added to the <code>skim_df</code> object.</p> <div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a>iris <span class="sc">%&gt;%</span></span> <span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a> dplyr<span class="sc">::</span><span class="fu">group_by</span>(Species) <span class="sc">%&gt;%</span></span> <span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">skim</span>()</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name Piped data ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## numeric 4 ## ________________________ ## Group variables Species ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable Species n_missing complete_rate mean sd p0 p25 p50 ## 1 Sepal.Length setosa 0 1 5.01 0.352 4.3 4.8 5 ## 2 Sepal.Length versicolor 0 1 5.94 0.516 4.9 5.6 5.9 ## 3 Sepal.Length virginica 0 1 6.59 0.636 4.9 6.22 6.5 ## 4 Sepal.Width setosa 0 1 3.43 0.379 2.3 3.2 3.4 ## 5 Sepal.Width versicolor 0 1 2.77 0.314 2 2.52 2.8 ## 6 Sepal.Width virginica 0 1 2.97 0.322 2.2 2.8 3 ## 7 Petal.Length setosa 0 1 1.46 0.174 1 1.4 1.5 ## 8 Petal.Length versicolor 0 1 4.26 0.470 3 4 4.35 ## 9 Petal.Length virginica 0 1 5.55 0.552 4.5 5.1 5.55 ## 10 Petal.Width setosa 0 1 0.246 0.105 0.1 0.2 0.2 ## 11 Petal.Width versicolor 0 1 1.33 0.198 1 1.2 1.3 ## 12 Petal.Width virginica 0 1 2.03 0.275 1.4 1.8 2 ## p75 p100 hist ## 1 5.2 5.8 ▃▃▇▅▁ ## 2 6.3 7 ▂▇▆▃▃ ## 3 6.9 7.9 ▁▃▇▃▂ ## 4 3.68 4.4 ▁▃▇▅▂ ## 5 3 3.4 ▁▅▆▇▂ ## 6 3.18 3.8 ▂▆▇▅▁ ## 7 1.58 1.9 ▁▃▇▃▁ ## 8 4.6 5.1 ▂▂▇▇▆ ## 9 5.88 6.9 ▃▇▇▃▂ ## 10 0.3 0.6 ▇▂▂▁▁ ## 11 1.5 1.8 ▅▇▃▆▁ ## 12 2.3 2.5 ▂▇▆▅▇</code></pre> <p>Individual columns from a data frame may be selected using 
tidyverse-style selectors.</p> <div class="sourceCode" id="cb27"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris, Sepal.Length, Species)</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name iris ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## factor 1 ## numeric 1 ## ________________________ ## Group variables None ## ## ── Variable type: factor ─────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate ordered n_unique ## 1 Species 0 1 FALSE 3 ## top_counts ## 1 set: 50, ver: 50, vir: 50 ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂</code></pre> <p>Or with common <code>select</code> helpers.</p> <div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(iris, <span class="fu">starts_with</span>(<span class="st">&quot;Sepal&quot;</span>))</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name iris ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## numeric 2 ## ________________________ ## Group variables None ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂ ## 2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁</code></pre> <p>If an individual column is of an unsupported class, it is treated as a character variable with a warning.</p> </div> </div> <div 
id="skimming-vectors" class="section level2"> <h2>Skimming vectors</h2> <p>In <code>skimr</code> v2, <code>skim()</code> will attempt to coerce non-data frames (such as vectors and matrices) to data frames. In most cases, skimming a vector is equivalent to skimming the result of wrapping it in <code>as.data.frame()</code>.</p> <p>For example, the <code>lynx</code> data set is class <code>ts</code>.</p> <div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(lynx)</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name lynx ## Number of rows 114 ## Number of columns 1 ## _______________________ ## Column type frequency: ## ts 1 ## ________________________ ## Group variables None ## ## ── Variable type: ts ─────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate start end frequency deltat mean sd ## 1 x 0 1 1821 1934 1 1 1538. 1586. ## min max median line_graph ## 1 39 6991 771 ⡈⢄⡠⢁⣀⠒⣀⠔</code></pre> <p>This is the same as coercing to a data frame first; only the recorded <code>df_name</code> attribute differs.</p> <div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="#cb33-1" aria-hidden="true" tabindex="-1"></a><span class="fu">all.equal</span>(<span class="fu">skim</span>(lynx), <span class="fu">skim</span>(<span class="fu">as.data.frame</span>(lynx)))</span></code></pre></div> <pre><code>## [1] &quot;Attributes: &lt; Component \&quot;df_name\&quot;: 1 string mismatch &gt;&quot;</code></pre> </div> <div id="skimming-matrices" class="section level2"> <h2>Skimming matrices</h2> <p><code>skimr</code> does not support skimming matrices directly but coerces them to data frames. Columns in the matrix become variables. This behavior is similar to <code>summary.matrix()</code>. 
Three possible ways to handle matrices with <code>skim()</code> parallel the three variations of the mean function for matrices.</p> <div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="#cb35-1" aria-hidden="true" tabindex="-1"></a>m <span class="ot">&lt;-</span> <span class="fu">matrix</span>(<span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>, <span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">6</span>, <span class="dv">7</span>, <span class="dv">8</span>, <span class="dv">9</span>, <span class="dv">10</span>, <span class="dv">11</span>, <span class="dv">12</span>), <span class="at">nrow =</span> <span class="dv">4</span>, <span class="at">ncol =</span> <span class="dv">3</span>)</span> <span id="cb35-2"><a href="#cb35-2" aria-hidden="true" tabindex="-1"></a>m</span></code></pre></div> <pre><code>## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 2 6 10 ## [3,] 3 7 11 ## [4,] 4 8 12</code></pre> <p>Skimming the matrix produces similar results to <code>colMeans()</code>.</p> <div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a><span class="fu">colMeans</span>(m)</span></code></pre></div> <pre><code>## [1] 2.5 6.5 10.5</code></pre> <div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(m) <span class="co"># Similar to summary.matrix and colMeans()</span></span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name m ## Number of rows 4 ## Number of columns 3 ## _______________________ ## Column type frequency: ## numeric 3 ## ________________________ ## Group variables None ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## 
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 V1 0 1 2.5 1.29 1 1.75 2.5 3.25 4 ▇▇▁▇▇ ## 2 V2 0 1 6.5 1.29 5 5.75 6.5 7.25 8 ▇▇▁▇▇ ## 3 V3 0 1 10.5 1.29 9 9.75 10.5 11.2 12 ▇▇▁▇▇</code></pre> <p>Skimming the transpose of the matrix will give row-wise results.</p> <div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb41-1"><a href="#cb41-1" aria-hidden="true" tabindex="-1"></a><span class="fu">rowMeans</span>(m)</span></code></pre></div> <pre><code>## [1] 5 6 7 8</code></pre> <div class="sourceCode" id="cb43"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb43-1"><a href="#cb43-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(<span class="fu">t</span>(m))</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name t(m) ## Number of rows 3 ## Number of columns 4 ## _______________________ ## Column type frequency: ## numeric 4 ## ________________________ ## Group variables None ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 V1 0 1 5 4 1 3 5 7 9 ▇▁▇▁▇ ## 2 V2 0 1 6 4 2 4 6 8 10 ▇▁▇▁▇ ## 3 V3 0 1 7 4 3 5 7 9 11 ▇▁▇▁▇ ## 4 V4 0 1 8 4 4 6 8 10 12 ▇▁▇▁▇</code></pre> <p>Finally, call <code>c()</code> on the matrix to get results across all of its values.</p> <div class="sourceCode" id="cb45"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(<span class="fu">c</span>(m))</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name c(m) ## Number of rows 12 ## Number of columns 1 ## _______________________ ## Column type frequency: ## numeric 1 ## ________________________ ## Group variables None ## ## ── Variable type: numeric ────────────────────────────────────────────────────── 
## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 data 0 1 6.5 3.61 1 3.75 6.5 9.25 12 ▇▅▅▅▇</code></pre> <div class="sourceCode" id="cb47"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb47-1"><a href="#cb47-1" aria-hidden="true" tabindex="-1"></a><span class="fu">mean</span>(m)</span></code></pre></div> <pre><code>## [1] 6.5</code></pre> <div id="skimming-without-modification" class="section level3"> <h3>Skimming without modification</h3> <p><code>skim_tee()</code> produces the same printed version as <code>skim()</code> but returns the original, unmodified data frame. This allows for continued piping of the original data.</p> <div class="sourceCode" id="cb49"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb49-1"><a href="#cb49-1" aria-hidden="true" tabindex="-1"></a>iris_setosa <span class="ot">&lt;-</span> iris <span class="sc">%&gt;%</span></span> <span id="cb49-2"><a href="#cb49-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">skim_tee</span>() <span class="sc">%&gt;%</span></span> <span id="cb49-3"><a href="#cb49-3" aria-hidden="true" tabindex="-1"></a> dplyr<span class="sc">::</span><span class="fu">filter</span>(Species <span class="sc">==</span> <span class="st">&quot;setosa&quot;</span>)</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name data ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## factor 1 ## numeric 4 ## ________________________ ## Group variables None ## ## ── Variable type: factor ─────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate ordered n_unique ## 1 Species 0 1 FALSE 3 ## top_counts ## 1 set: 50, ver: 50, vir: 50 ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 
▆▇▇▅▂ ## 2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁ ## 3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂ ## 4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃</code></pre> <div class="sourceCode" id="cb51"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb51-1"><a href="#cb51-1" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(iris_setosa)</span></code></pre></div> <pre><code>## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa</code></pre> <p>Note that <code>skim_tee()</code> is customized differently than <code>skim()</code> itself. See below for more details.</p> </div> </div> <div id="reshaping-the-results-from-skim" class="section level2"> <h2>Reshaping the results from <code>skim()</code></h2> <p>As noted above, <code>skim()</code> returns a wide data frame. This is usually the most sensible format for the majority of operations when investigating data, but the package has some other functions to help with edge cases.</p> <p>First, <code>partition()</code> returns a named list of the wide data frames for each data type. Unlike the original data, the partitioned data only has columns corresponding to the skimming functions used for this data type. 
These data frames are, therefore, not <code>skim_df</code> objects.</p> <div class="sourceCode" id="cb53"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb53-1"><a href="#cb53-1" aria-hidden="true" tabindex="-1"></a>iris <span class="sc">%&gt;%</span></span> <span id="cb53-2"><a href="#cb53-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">skim</span>() <span class="sc">%&gt;%</span></span> <span id="cb53-3"><a href="#cb53-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">partition</span>()</span></code></pre></div> <pre><code>## $factor ## ## ── Variable type: factor ─────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate ordered n_unique top_counts ## 1 Species 0 1 FALSE 3 set: 50, ver: 50, vir:… ## ## $numeric ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂ ## 2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁ ## 3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂ ## 4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃</code></pre> <p>Alternatively, <code>yank()</code> selects only the subtable for a specific type. Think of it like <code>dplyr::select</code> on column types in the original data. 
Again, unsuitable columns are dropped.</p> <div class="sourceCode" id="cb55"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb55-1"><a href="#cb55-1" aria-hidden="true" tabindex="-1"></a>iris <span class="sc">%&gt;%</span></span> <span id="cb55-2"><a href="#cb55-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">skim</span>() <span class="sc">%&gt;%</span></span> <span id="cb55-3"><a href="#cb55-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">yank</span>(<span class="st">&quot;numeric&quot;</span>)</span></code></pre></div> <pre><code>## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂ ## 2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁ ## 3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂ ## 4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃</code></pre> <p><code>to_long()</code> returns a single long data frame with columns <code>skim_type</code>, <code>skim_variable</code>, <code>stat</code> and <code>formatted</code>. 
This is similar but not identical to the <code>skim_df</code> object in <code>skimr</code> v1.</p> <div class="sourceCode" id="cb57"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb57-1"><a href="#cb57-1" aria-hidden="true" tabindex="-1"></a>iris <span class="sc">%&gt;%</span></span> <span id="cb57-2"><a href="#cb57-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">skim</span>() <span class="sc">%&gt;%</span></span> <span id="cb57-3"><a href="#cb57-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">to_long</span>() <span class="sc">%&gt;%</span> </span> <span id="cb57-4"><a href="#cb57-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">head</span>()</span></code></pre></div> <pre><code>## # A tibble: 6 × 4 ## skim_type skim_variable stat formatted ## &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; ## 1 factor Species n_missing 0 ## 2 numeric Sepal.Length n_missing 0 ## 3 numeric Sepal.Width n_missing 0 ## 4 numeric Petal.Length n_missing 0 ## 5 numeric Petal.Width n_missing 0 ## 6 factor Species complete_rate 1</code></pre> <p>Since the <code>skim_variable</code> and <code>skim_type</code> columns are a core component of the <code>skim_df</code> class, it’s possible to get unwanted side effects when using <code>dplyr::select()</code>. 
Instead, use <code>focus()</code> to select columns of the skimmed results and keep them as a <code>skim_df</code>; it always keeps the metadata columns <code>skim_type</code> and <code>skim_variable</code>.</p> <div class="sourceCode" id="cb59"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb59-1"><a href="#cb59-1" aria-hidden="true" tabindex="-1"></a>iris <span class="sc">%&gt;%</span></span> <span id="cb59-2"><a href="#cb59-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">skim</span>() <span class="sc">%&gt;%</span></span> <span id="cb59-3"><a href="#cb59-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">focus</span>(n_missing, numeric.mean)</span></code></pre></div> <pre><code>## ── Data Summary ──────────────────────── ## Values ## Name Piped data ## Number of rows 150 ## Number of columns 5 ## _______________________ ## Column type frequency: ## factor 1 ## numeric 4 ## ________________________ ## Group variables None ## ## ── Variable type: factor ─────────────────────────────────────────────────────── ## skim_variable n_missing ## 1 Species 0 ## ## ── Variable type: numeric ────────────────────────────────────────────────────── ## skim_variable n_missing mean ## 1 Sepal.Length 0 5.84 ## 2 Sepal.Width 0 3.06 ## 3 Petal.Length 0 3.76 ## 4 Petal.Width 0 1.20</code></pre> </div> <div id="rendering-the-results-of-skim" class="section level2"> <h2>Rendering the results of <code>skim()</code></h2> <p>The <code>skim_df</code> object is a wide data frame. The display is created by default using <code>print.skim_df()</code>; users can specify additional options by explicitly calling <code>print([skim_df object], ...)</code>.</p> <p>For documents rendered by <code>knitr</code>, the package provides a custom <code>knit_print</code> method. 
To use it, the final line of your code chunk should have a <code>skim_df</code> object.</p> <div class="sourceCode" id="cb61"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb61-1"><a href="#cb61-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(Orange)</span></code></pre></div> <table> <caption>Data summary</caption> <tbody> <tr class="odd"> <td align="left">Name</td> <td align="left">Orange</td> </tr> <tr class="even"> <td align="left">Number of rows</td> <td align="left">35</td> </tr> <tr class="odd"> <td align="left">Number of columns</td> <td align="left">3</td> </tr> <tr class="even"> <td align="left">_______________________</td> <td align="left"></td> </tr> <tr class="odd"> <td align="left">Column type frequency:</td> <td align="left"></td> </tr> <tr class="even"> <td align="left">factor</td> <td align="left">1</td> </tr> <tr class="odd"> <td align="left">numeric</td> <td align="left">2</td> </tr> <tr class="even"> <td align="left">________________________</td> <td align="left"></td> </tr> <tr class="odd"> <td align="left">Group variables</td> <td align="left">None</td> </tr> </tbody> </table> <p><strong>Variable type: factor</strong></p> <table> <colgroup> <col width="17%" /> <col width="12%" /> <col width="17%" /> <col width="10%" /> <col width="11%" /> <col width="29%" /> </colgroup> <thead> <tr class="header"> <th align="left">skim_variable</th> <th align="right">n_missing</th> <th align="right">complete_rate</th> <th align="left">ordered</th> <th align="right">n_unique</th> <th align="left">top_counts</th> </tr> </thead> <tbody> <tr class="odd"> <td align="left">Tree</td> <td align="right">0</td> <td align="right">1</td> <td align="left">TRUE</td> <td align="right">5</td> <td align="left">3: 7, 1: 7, 5: 7, 2: 7</td> </tr> </tbody> </table> <p><strong>Variable type: numeric</strong></p> <table style="width:100%;"> <colgroup> <col width="16%" /> <col width="11%" /> <col width="16%" /> <col width="8%" /> <col width="8%" 
/> <col width="4%" /> <col width="7%" /> <col width="5%" /> <col width="8%" /> <col width="5%" /> <col width="7%" /> </colgroup> <thead> <tr class="header"> <th align="left">skim_variable</th> <th align="right">n_missing</th> <th align="right">complete_rate</th> <th align="right">mean</th> <th align="right">sd</th> <th align="right">p0</th> <th align="right">p25</th> <th align="right">p50</th> <th align="right">p75</th> <th align="right">p100</th> <th align="left">hist</th> </tr> </thead> <tbody> <tr class="odd"> <td align="left">age</td> <td align="right">0</td> <td align="right">1</td> <td align="right">922.14</td> <td align="right">491.86</td> <td align="right">118</td> <td align="right">484.0</td> <td align="right">1004</td> <td align="right">1372.0</td> <td align="right">1582</td> <td align="left">▃▇▁▇▇</td> </tr> <tr class="even"> <td align="left">circumference</td> <td align="right">0</td> <td align="right">1</td> <td align="right">115.86</td> <td align="right">57.49</td> <td align="right">30</td> <td align="right">65.5</td> <td align="right">115</td> <td align="right">161.5</td> <td align="right">214</td> <td align="left">▇▃▇▇▅</td> </tr> </tbody> </table> <p>The same type of rendering is available from reshaped <code>skim_df</code> objects, those generated by <code>partition()</code> and <code>yank()</code> in particular.</p> <div class="sourceCode" id="cb62"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb62-1"><a href="#cb62-1" aria-hidden="true" tabindex="-1"></a><span class="fu">skim</span>(Orange) <span class="sc">%&gt;%</span></span> <span id="cb62-2"><a href="#cb62-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">yank</span>(<span class="st">&quot;numeric&quot;</span>)</span></code></pre></div> <p><strong>Variable type: numeric</strong></p> <table style="width:100%;"> <colgroup> <col width="16%" /> <col width="11%" /> <col width="16%" /> <col width="8%" /> <col width="8%" /> <col width="4%" /> <col width="7%" /> <col 
width="5%" /> <col width="8%" /> <col width="5%" /> <col width="7%" /> </colgroup> <thead> <tr class="header"> <th align="left">skim_variable</th> <th align="right">n_missing</th> <th align="right">complete_rate</th> <th align="right">mean</th> <th align="right">sd</th> <th align="right">p0</th> <th align="right">p25</th> <th align="right">p50</th> <th align="right">p75</th> <th align="right">p100</th> <th align="left">hist</th> </tr> </thead> <tbody> <tr class="odd"> <td align="left">age</td> <td align="right">0</td> <td align="right">1</td> <td align="right">922.14</td> <td align="right">491.86</td> <td align="right">118</td> <td align="right">484.0</td> <td align="right">1004</td> <td align="right">1372.0</td> <td align="right">1582</td> <td align="left">▃▇▁▇▇</td> </tr> <tr class="even"> <td align="left">circumference</td> <td align="right">0</td> <td align="right">1</td> <td align="right">115.86</td> <td align="right">57.49</td> <td align="right">30</td> <td align="right">65.5</td> <td align="right">115</td> <td align="right">161.5</td> <td align="right">214</td> <td align="left">▇▃▇▇▅</td> </tr> </tbody> </table> </div> <div id="customizing-print-options" class="section level2"> <h2>Customizing print options</h2> <p>Although it’s not a common use case outside of writing vignettes about <code>skimr</code>, you can fall back to default printing methods by adding the chunk option <code>render = knitr::normal_print</code>.</p> <p>You can also disable the <code>skimr</code> summary by setting the chunk option <code>skimr_include_summary = FALSE</code>.</p> <p>You can change the number of digits shown in the columns of generated statistics by changing the <code>skimr_digits</code> chunk option.</p> </div> <div id="modifying-skim" class="section level2"> <h2>Modifying <code>skim()</code></h2> <p><code>skimr</code> is opinionated in its choice of defaults, but users can easily add, replace, or remove the statistics for a class. 
For interactive use, you can create your own skimming function with the <code>skim_with()</code> factory. <code>skimr</code> also has an API for extensions in other packages. Working with that is covered later.</p> <p>To add a statistic for a data type, create an <code>sfl()</code> (a <code>skimr</code> function list) for each class that you want to change:</p> <div class="sourceCode" id="cb63"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb63-1"><a href="#cb63-1" aria-hidden="true" tabindex="-1"></a>my_skim <span class="ot">&lt;-</span> <span class="fu">skim_with</span>(<span class="at">numeric =</span> <span class="fu">sfl</span>(<span class="at">new_mad =</span