skimr

# vroom 1.6.4 * It is now possible (again?) to read from a list of connections (@bairdj, #514). * Internal change for compatibility with cpp11 >= 0.4.6 (@DavisVaughan, #512). # vroom 1.6.3 * No user-facing changes. # vroom 1.6.2 * There was no CRAN release with this version number. # vroom 1.6.1 * `str()` now works in a colorized context in the presence of a column of class `integer64`, i.e. parsed with `col_big_integer()` (@bart1, #477). * The embedded implementation of the Grisu algorithm for printing floating point numbers now uses `snprintf()` instead of `sprintf()` and likewise for vroom's own code (@jeroen, #480). # vroom 1.6.0 * `vroom(col_select=)` now handles column selection by numeric position when `id` column is provided (#455). * `vroom(id = "path", col_select = a:c)` is treated like `vroom(id = "path", col_select = c(path, a:c))`. If an `id` column is provided, it is automatically included in the output (#416). * `vroom_write(append = TRUE)` does not modify an existing file when appending an empty data frame. In particular, it does not overwrite (delete) the existing contents of that file (https://github.com/tidyverse/readr/issues/1408, #451). * `vroom::problems()` now defaults to `.Last.value` for its primary input, similar to how `readr::problems()` works (#443). * The warning that indicates the existence of parsing problems has been improved, which should make it easier for the user to follow-up (https://github.com/tidyverse/readr/issues/1322). * `vroom()` reads more reliably from filepaths containing non-ascii characters, in a non-UTF-8 locale (#394, #438). * `vroom_format()` and `vroom_write()` only quote values that contain a delimiter, quote, or newline. Specifically values that are equal to the `na` string (or that start with it) are no longer quoted (#426). * Fixed segfault when reading in multiple files and the first file has only a header row of column names, but subsequent files have at least one row (#430). * Fixed segfault when `vroom_format()` is given an empty data frame (#425) * Fixed a segfault that could occur when the final field of the final line is missing and the file also does not end in a newline (#429). * Fixed recursive garbage collection error that could occur during `vroom_write()` when `output_column()` generates an ALTREP vector (#389). * `vroom_progress()` uses `rlang::is_interactive()` instead of `base::interactive()`. * `col_factor(levels = NULL)` honors the `na` strings of `vroom()` and its own `include_na` argument, as described in the docs, and now reproduces the behaviour of readr's first edition parser (#396). # vroom 1.5.7 * Jenny Bryan is now the official maintainer. * Fix uninitialized bool detected by CRAN's UBSAN check (https://github.com/tidyverse/vroom/pull/386) * Fix buffer overflow when trying to parse an integer field that is over 64 characters long (https://github.com/tidyverse/readr/issues/1326) * Fix subset indexing when indexes span a file boundary multiple times (#383) # vroom 1.5.6 * `vroom(col_select=)` now works if `col_names = FALSE` as intended (#381) * `vroom(n_max=)` now correctly handles cases when reading from a connection and the file does _not_ end with a newline (https://github.com/tidyverse/readr/issues/1321) * `vroom()` no longer issues a spurious warning when the parsing needs to be restarted due to the presence of embedded newlines (https://github.com/tidyverse/readr/issues/1313) * Fix performance issue when materializing subsetted vectors (#378) * `vroom_format()` now uses the same internal multi-threaded code as `vroom_write()`, improving its performance in most cases (#377) * `vroom_fwf()` no longer omits the last line if it does _not_ end with a newline (https://github.com/tidyverse/readr/issues/1293) * Empty files or files with only a header line and no data no longer cause a crash if read with multiple files (https://github.com/tidyverse/readr/issues/1297) * Files with a header but no contents, or a empty file if `col_names = FALSE` no longer cause a hang when `progress = TRUE` (https://github.com/tidyverse/readr/issues/1297) * Commented lines with comments at the end of lines no longer hang R (https://github.com/tidyverse/readr/issues/1309) * Comment lines containing unpaired quotes are no longer treated as unterminated quotations (https://github.com/tidyverse/readr/issues/1307) * Values with only a `Inf` or `NaN` prefix but additional data afterwards, like `Inform` or no longer inappropriately guessed as doubles (https://github.com/tidyverse/readr/issues/1319) * Time types now support `%h` format to denote hour durations greater than 24, like readr (https://github.com/tidyverse/readr/issues/1312) * Fix performance issue when materializing subsetted vectors (#378) # vroom 1.5.5 * `vroom()` now supports files with only carriage return newlines (`\r`). (#360, https://github.com/tidyverse/readr/issues/1236) * `vroom()` now parses single digit datetimes more consistently as readr has done (https://github.com/tidyverse/readr/issues/1276) * `vroom()` now parses `Inf` values as doubles (https://github.com/tidyverse/readr/issues/1283) * `vroom()` now parses `NaN` values as doubles (https://github.com/tidyverse/readr/issues/1277) * `VROOM_CONNECTION_SIZE` is now parsed as a double, which supports scientific notation (#364) * `vroom()` now works around specifying a `\n` as the delimiter (#365, https://github.com/tidyverse/dplyr/issues/5977) * `vroom()` no longer crashes if given a `col_name` and `col_type` both less than the number of columns (https://github.com/tidyverse/readr/issues/1271) * `vroom()` no longer hangs if given an empty value for `locale(grouping_mark=)` (https://github.com/tidyverse/readr/issues/1241) * Fix performance regression when guessing with large numbers of rows (https://github.com/tidyverse/readr/issues/1267) # vroom 1.5.4 * `vroom(col_types=)` now accepts column type names like those accepted by utils::read.table. e.g. vroom::vroom(col_types = list(a = "integer", b = "double", c = "skip")) * `vroom()` now respects the `quote` parameter properly in the first two lines of the file (https://github.com/tidyverse/readr/issues/1262) * `vroom_write()` now always correctly writes its output including column names in UTF-8 (https://github.com/tidyverse/readr/issues/1242) * `vroom_write()` now creates an empty file when given a input without any columns (https://github.com/tidyverse/readr/issues/1234) # vroom 1.5.3 * `vroom(col_types=)` now truncates the column types if the user passes too many types. (#355) * `vroom()` now always includes the last row when guessing (#352) * `vroom(trim_ws = TRUE)` now trims field content within quotes as well as without (#354). Previously vroom explicitly retained field content inside quotes regardless of the value of `trim_ws`. # vroom 1.5.2 * `vroom()` now supports inputs with unnamed column types that are less than the number of columns (#296) * `vroom()` now outputs the correct column names even in the presence of skipped columns (#293, [tidyverse/readr#1215](https://github.com/tidyverse/readr/issues/1215)) * `vroom_fwf(n_max=)` now works as intended when the input is a connection. * `vroom()` and `vroom_write()` now automatically detect the compression format regardless of the file extension for bzip2, xzip, gzip and zip files (#348) * `vroom()` and `vroom_write()` now automatically support many more archive formats thanks to the archive package. These include new support for writing zip files, reading and writing 7zip, tar and ISO files. * `vroom(num_threads = 1)` will now not spawn any threads. This can be used on as a workaround on systems without full thread support. * Threads are now automatically disabled on non-macOS systems compiling against clang's libc++. Most systems non-macOS systems use the more common gcc libstdc++, so this should not effect most users. # vroom 1.5.1 * Parsers now treat NA values as NA even if they are valid values for the types (#342) * Element-wise indexing into lazy (ALTREP) vectors now has much less overhead (#344) # vroom 1.5.0 ## Major improvements * New `vroom(show_col_types=)` argument to more simply control when column types are shown. * `vroom()`, `vroom_fwf()` and `vroom_lines()` now support multi-byte encodings such as UTF-16 and UTF-32 by converting these files to UTF-8 under the hood (#138) * `vroom()` now supports skipping comments and blank lines within data, not just at the start of the file (#294, #302) * `vroom()` now uses the tzdb package when parsing date-times (@DavisVaughan, #273) * `vroom()` now emits a warning of class `vroom_parse_issue` if there are non-fatal parsing issues. * `vroom()` now emits a warning of class `vroom_mismatched_column_name` if the user supplies a column type that does not match the name of a read column (#317). * The vroom package now uses the MIT license, as part of systematic relicensing throughout the r-lib and tidyverse packages (#323) ## Minor improvements and fixes * `vroom() correctly reads double values with comma as decimal separator (@kent37 #313) * `vroom()` now correctly skips lines with only one quote if the format doesn't use quoting (https://github.com/tidyverse/readr/issues/991#issuecomment-616378446) * `vroom()` and `vroom_lines()` now handle files with mixed windows and POSIX line endings (https://github.com/tidyverse/readr/issues/1210) * `vroom()` now outputs a tibble with the expected number of columns and types based on `col_types` and `col_names` even if the file is empty (#297). * `vroom()` no longer mis-indexes files read from connections with windows line endings when the two line endings falls on separate sides of the read buffer (#331) * `vroom()` no longer crashes if `n_max = 0` and `col_names` is a character (#316) * `vroom()` now preserves the spec attribute when vroom and readr are both loaded (#303) * `vroom()` now allows specifying column names in `col_types` that have been repaired (#311) * `vroom()` no longer inadvertently calls `.name_repair` functions twice (#310). * `vroom()` is now more robust to quoting issues when tracking the CSV state (#301) * `vroom()` now registers the S3 class with `methods::setOldClass()` (r-dbi/DBI#345) * `col_datetime()` now supports '%s' format, which represents decimal seconds since the Unix epoch. * `col_numeric()` now supports `grouping_mark` and `decimal_mark` that are unicode characters, such as U+00A0 which is commonly used as the grouping mark for numbers in France (https://github.com/tidyverse/readr/issues/796). * `vroom_fwf()` gains a `skip_empty_rows` argument to skip empty lines (https://github.com/tidyverse/readr/issues/1211) * `vroom_fwf()` now respects `n_max`, as intended (#334) * `vroom_lines()` gains a `na` argument. * `vroom_write_lines()` no longer escapes or quotes lines. * `vroom_write_lines()` now works as intended (#291). * `vroom_write(path=)` has been deprecated, in favor of `file`, to match readr. * `vroom_write_lines()` now exposes the `num_threads` argument. * `problems()` now prints the correct row number of parse errors (#326) * `problems()` now throws a more informative error if called on a readr object (#308). * `problems()` now de-duplicates identical problems (#318) * Fix an inadvertent performance regression when reading values (#309) * `n_max` argument is correctly respected in edge cases (#306) * factors with implicit levels now work when fields are quoted, as intended (#330) * Guessing double types no longer unconditionally ignores leading whitespace. Now whitespace is only ignored when `trim_ws` is set. # vroom 1.4.0 ## Major changes and new functions * vroom now tracks indexing and parsing errors like readr. The first time an issue is encountered a warning will be signaled. A tibble of all found problems can be retrieved with `vroom::problems()`. (#247) * Data with newlines within quoted fields will now automatically revert to using a single thread and be properly read (#282) * NUL values in character data are now permitted, with a warning. * New `vroom_write_lines()` function to write a character vector to a file (#291) * `vroom_write()` gains a `eol=` parameter to specify the end of line character(s) to use. Use `vroom_write(eol = "\r\n")` to write a file with Windows style newlines (#263). ## Minor improvements and fixes * Datetime formats used when guessing now match those used when parsing (#240) * Quotes are now only valid next to newlines or delimiters (#224) * `vroom()` now signals an R error for invalid date and datetime formats, instead of crashing the session (#220). * `vroom(comment = )` now accepts multi-character comments (#286) * `vroom_lines()` now works with empty files (#285) * Vectors are now subset properly when given invalid subscripts (#283) * `vroom_write()` now works when the delimiter is empty, e.g. `delim = ""` (#287). * `vroom_write()` now works with all ALTREP vectors, including string vectors (#270) * An internal call to `new.env()` now correctly uses the `parent` argument (#281) # vroom 1.3.2 * Test failures on R 4.1 related to factors with NA values fixed (#262) * `vroom()` now works without error with readr versions of col specs (#256, #264, #266) # vroom 1.3.1 * Test failures on R 4.1 related to POSIXct classes fixed (#260) * Column subsetting with double indexes now works again (#257) * `vroom(n_max=)` now only partially downloads files from connections, as intended (#259) # vroom 1.3.0 * The Rcpp dependency has been removed in favor of cpp11. * `vroom()` now handles cases when `id` is set and a column in skipped (#237) * `vroom()` now supports column selections when there are some empty column names (#238) * `vroom()` argument `n_max` now works properly for files with windows newlines and no final newline (#244) * Subsetting vectors now works with `View()` in RStudio if there are now rows to subset (#253). * Subsetting datetime columns now works with `NA` indices (#236). # vroom 1.2.1 * `vroom()` now writes the column names if given an input with no rows (#213) * `vroom()` columns now support indexing with NA values (#201) * `vroom()` no longer truncates the last value in a file if the file contains windows newlines but no final newline (#219). * `vroom()` now works when the `na` argument is encoded in non ASCII or UTF-8 locales _and_ the file encoding is not the same as the native encoding (#233). * `vroom_fwf()` now verifies that the positions are valid, namely that the begin value is always less than the previous end (#217). * `vroom_lines()` gains a `locale` argument so you can control the encoding of the file (#218) * `vroom_write()` now supports the `append` argument with R connections (#232) # vroom 1.2.0 ## Breaking changes * `vroom_altrep_opts()` and the argument `vroom(altrep_opts =)` have been renamed to `vroom_altrep()` and `altrep` respectively. The prior names have been deprecated. ## New Features * `vroom()` now supports reading Big Integer values with the `bit64` package. Use `col_big_integer()` or the "I" shortcut to read a column as big integers. (#198) * `cols()` gains a `.delim` argument and `vroom()` now uses it as the delimiter if it is provided (#192) * `vroom()` now supports reading from `stdin()` directly, interpreted as the C-level standard input (#106). ## Minor improvements and fixes * `col_date` now parses single digit month and day (@edzer, #123, #170) * `fwf_empty()` now uses the `skip` parameter, as intended. * `vroom()` can now read single line files without a terminal newline (#173). * `vroom()` can now select the id column if provided (#110). * `vroom()` now correctly copies string data for factor levels (#184) * `vroom()` no longer crashes when files have trailing fields, windows newlines and the file is not newline or null terminated. * `vroom()` now includes a spec object with the `col_types` class, as intended. * `vroom()` now better handles floating point values with very large exponents (#164). * `vroom()` now uses better heuristics to guess the delimiter and now throws an error if a delimiter cannot be guessed (#126, #141, #167). * `vroom()` now has an improved error message when a file does not exist (#169). * `vroom()` no longer leaks file handles (#177, #180) * `vroom()` now outputs its messages on `stdout()` rather than `stderr()`, which avoids the text being red in RStudio and in the Windows GUI. * `vroom()` no longer overflows when reading files with more than 2B entries (@wlattner, #183). * `vroom_fwf()` is now more robust if not all lines are the expected length (#78) * `vroom_fwf()` and `fwf_empty()` now support passing `Inf` to `guess_max()`. * `vroom_str()` now works with S4 objects. * `vroom_fwf()` now handles files with dos newlines properly. * `vroom_write()` now does not try to write anything when given empty inputs (#172). * Dates, times, and datetimes now properly consider the locale when parsing. * Added benchmarks with _wide_ data for both numeric and character data (#87, @R3myG) * The delimiter used for parsing is now shown in the message output (#95 @R3myG) # vroom 1.0.2 ## New Features * The column created by `id` is now stored as an run length encoded Altrep vector, which uses less memory and is much faster for large inputs. (#111) ## Minor improvements and fixes * `vroom_lines()` now properly respects the `n_max` parameter (#142) * `vroom()` and `vroom_lines()` now support reading files which do not end in newlines by using a file connection (#40). * `vroom_write()` now works with the standard output connection `stdout()` (#106). * `vroom_write()` no longer crashes non-deterministically when used on Altrep vectors. * The integer parser now returns NA values for invalid inputs (#135) * Fix additional UBSAN issue in the mio project reported by CRAN (#97) * Fix indexing into connections with quoted fields (#119) * Move example files for `vroom()` out of `\dontshow{}`. * Fix integer overflow with very large files (#116, #119) * Fix missing columns and windows newlines (#114) * Fix encoding of column names (#113, #115) * Throw an error message when writing a zip file, which is not supported (@metaOO, #145) * Default message output from `vroom()` now uses `Rows` and `Cols` (@meta00, #140) # vroom 1.0.1 ## New Features * `vroom_lines()` function added, to (lazily) read lines from a file into a character vector (#90). ## Minor improvements and fixes * Fix for a hang on Windows caused by a race condition in the progress bar (#98) * Remove accidental runtime dependency on testthat (#104) * Fix to actually return non-Altrep character columns on R 3.2, 3.3 and 3.4. * Disable colors in the progress bar when running in RStudio, to work around an issue where the progress bar would be garbled (https://github.com/rstudio/rstudio/issues/4777) * Fix for UBSAN issues reported by CRAN (#97) * Fix for rchk issues reported by CRAN (#94) * The progress bar now only updates every 10 milliseconds. * Getting started vignette index entry now more informative (#92) # vroom 1.0.0 * Initial release * Added a `NEWS.md` file to track changes to the package.