Advanced R
June 21, 2023
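There are three types of data we might want to include:
/data: data for users, exported with the package
/R/sysdata.rda: internal data, used by the package's functions
/inst/extdata: raw data files, e.g. for examples or vignettes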
Data for users should be saved in /data as an .rda (or .RData) file.
usethis::use_data() will do this for you, as well as a few other necessary steps:
✔ Adding 'R' to Depends field in DESCRIPTION
✔ Creating 'data/'
✔ Setting LazyData to 'true' in 'DESCRIPTION'
✔ Saving 'letter_indices' to 'data/letter_indices.rda'
• Document your data (see 'https://r-pkgs.org/data.html')
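For instance, output like that above comes from a call such as the one below, run from the root of the package project. How letter_indices is built here is an assumption, based on its description in the documentation example further down.

```r
# Run from the root of your package project.
# Lower-case letters a-z, paired with their position 1-26
letter_indices <- data.frame(
  letter = letters,
  index = seq_along(letters)
)

# Writes data/letter_indices.rda and updates DESCRIPTION
usethis::use_data(letter_indices)
```

Once the package is loaded, users can refer to letter_indices by name, because LazyData is set to true.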
Note
For larger datasets, you can try different values of the compress argument of use_data() ("gzip", "bzip2" or "xz") to get the best compression.
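One way to choose a method is to save the same object with each method and compare file sizes. The sketch below does this with base R's save(), which provides the compression that use_data() relies on. big_data is a hypothetical stand-in for your own dataset.

```r
# Compare on-disk sizes of one object saved with each compression method.
# `big_data` is a hypothetical stand-in: substitute the dataset you want to ship.
n <- 1e5
big_data <- data.frame(
  id = seq_len(n),
  group = rep(c("a", "b", "c", "d"), length.out = n)
)

sizes <- vapply(c("gzip", "bzip2", "xz"), function(method) {
  path <- tempfile(fileext = ".rda")
  save(big_data, file = path, compress = method)
  file.size(path)
}, numeric(1))

# Named vector of file sizes in bytes; the smallest is the best candidate
sizes
names(which.min(sizes))
```

The winning method can then be passed on, e.g. usethis::use_data(big_data, compress = "xz"). For .rda files already in /data, base R's tools::checkRdaFiles() reports how each one is compressed, and tools::resaveRdaFiles() can re-save them with a different method.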
Often the data that you want to make accessible to the users is one you have created with an R script – either from scratch or from a raw data set.
It’s a good idea to put the R script and any corresponding raw data in /data-raw.
usethis::use_data_raw("dataname") will set this up for you. It:
creates /data-raw
creates /data-raw/dataname.R, for you to add the code needed to create the data
adds ^data-raw$ to .Rbuildignore, as /data-raw does not need to be included in the actual package.
You should add any raw data files (e.g. .csv files) to /data-raw.
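A filled-in data-raw script might look like the sketch below. The raw file data-raw/letters.csv and its columns are hypothetical. The closing use_data(..., overwrite = TRUE) call follows the pattern in the template that use_data_raw() creates. Paths are relative to the package root, so run the script from there.

```r
## code to prepare `letter_indices` dataset goes here
# Hypothetical raw file data-raw/letters.csv, with columns `letter` and `index`
raw <- read.csv("data-raw/letters.csv")

# Tidy the raw data: lower-case letters, integer indices, sorted by index
letter_indices <- data.frame(
  letter = tolower(raw$letter),
  index = as.integer(raw$index)
)
letter_indices <- letter_indices[order(letter_indices$index), ]
rownames(letter_indices) <- NULL

# Save to data/letter_indices.rda; overwrite = TRUE lets you re-run the script
usethis::use_data(letter_indices, overwrite = TRUE)
```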
Datasets in /data are always exported, so they must be documented.
To document a dataset, we must have an .R script in /R that contains a Roxygen block above the name of the dataset.
As with functions, you can choose how to arrange this, e.g. in one combined /R/data.R or in a separate R file for each dataset.
#' Letters of the Roman Alphabet with Indices
#'
#' A dataset of lower-case letters of the Roman alphabet and their
#' numeric index from a = 1 to z = 26.
#'
#' @format A data frame with 26 rows and 2 variables:
#' \describe{
#' \item{letter}{The letter as a character string.}
#' \item{index}{The corresponding numeric index.}
#' }
"letter_indices"
#' @examples can be used here too.
For collected data, the (original) source should be documented with #' @source.
This should either be a URL, e.g. \url{http://www.diamondse.info/} (alternatively \href{http://www.diamondse.info/}{Diamond Search Engine}), or a bibliographic reference.
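Putting these tags together, the documentation for a collected dataset might look like this. The dataset diamond_prices and its columns are hypothetical and are used only to show where @source and @examples go. The URL is the one mentioned above.

```r
#' Prices of Diamonds (hypothetical example)
#'
#' A hypothetical dataset, used here only to illustrate the tags.
#'
#' @format A data frame with 3 variables:
#' \describe{
#'   \item{carat}{Weight of the diamond.}
#'   \item{cut}{Quality of the cut.}
#'   \item{price}{Price in US dollars.}
#' }
#' @source \url{http://www.diamondse.info/}
#' @examples
#' summary(diamond_prices$price)
"diamond_prices"
```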
Sometimes functions need access to reference data, e.g. constants or look-up tables, that don’t need to be shared with users.
These objects should be saved in a single R/sysdata.rda file.
This can be done with usethis::use_data(..., internal = TRUE).
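Here is a minimal sketch, assuming two hypothetical look-up objects (unit_factors and default_palette) that the package's functions need. R/sysdata.rda is a single file, so all internal objects are passed in one call. Calling use_data(internal = TRUE) again either errors because the file exists or, with overwrite = TRUE, replaces the whole file.

```r
# Run from the root of your package project.
# Hypothetical look-up objects used only inside the package's functions
unit_factors <- c(mm = 0.001, cm = 0.01, m = 1, km = 1000)
default_palette <- c("#1b9e77", "#d95f02", "#7570b3")

# Save both into R/sysdata.rda in a single call
usethis::use_data(unit_factors, default_palette, internal = TRUE)
```

Functions inside the package can then use these objects directly by name, e.g. x * unit_factors[["km"]].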
The generating code and any raw data can be put in /data-raw.
As the objects are not exported, they don’t need to be documented.
Sometimes you want to include raw data, to use in examples or vignettes.
These files can be in any format and should be added directly to /inst/extdata.
When the package is installed, these files are copied to the extdata directory of the installed package, and their path on your system can be found with system.file().
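Here is a sketch of looking up and reading such a file. The package name mypackage and the file animals.csv are hypothetical placeholders.

```r
# Hypothetical: a package "mypackage" that ships inst/extdata/animals.csv
path <- system.file("extdata", "animals.csv", package = "mypackage")

# system.file() returns "" if the package or file can't be found,
# so check before reading (or pass mustWork = TRUE to get an error instead)
if (nzchar(path)) {
  animals <- read.csv(path)
} else {
  message("File not found: is 'mypackage' installed?")
}
```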
Run usethis::use_data_raw("farm_animals").
In data-raw/farm_animals.R, write some code to create a small data frame with the names of farm animals and the sound they make.
Run the script (which ends with a call to usethis::use_data()) to create the data and save it in /data.
Create an R/farm_animals.R script and add some roxygen comments to document the dataset.
Run devtools::document() to create the documentation for the farm_animals data. Preview the documentation to check it.
Since your package is already on GitHub, any R user can install it from there, e.g. with remotes::install_github().
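A sketch of what a user would run. "username/reponame" is a placeholder for your GitHub account and repository name.

```r
# "username/reponame" is a placeholder for your GitHub account and repo
install.packages("remotes")
remotes::install_github("username/reponame")

# Alternatively, with the pak package:
# pak::pak("username/reponame")
```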
If you want to tag it as a release, make sure there’s a NEWS.md file and update the package version (e.g. with usethis::use_version()).
Check the NEWS.md file is up-to-date (use_version() will modify it), then run
usethis::use_github_release()
This will bundle the source code as a .zip and a .tar.gz file and make them available from the Releases section of the repo homepage.
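Put together, the release steps might look like the sequence below. This is a sketch: the "minor" bump is only an example (use_version() also accepts "major", "patch" and "dev"), and use_news_md() is only needed if the package has no NEWS.md yet.

```r
# Run from the root of your package project.
usethis::use_news_md()         # only needed if there's no NEWS.md yet
usethis::use_version("minor")  # bumps Version in DESCRIPTION, adds a NEWS.md heading
# ...describe the changes under the new NEWS.md heading, commit and push...
usethis::use_github_release()  # creates the tagged release on GitHub
```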
With R-Universe, you can create a personal, CRAN-like repository.
You’re in control of what’s published in your R-Universe!
It is a good way to allow users to easily install packages without going through the rigour of the CRAN submission process.
Binaries are built for Windows and macOS, which a user can install using install.packages() by adding your universe to the repos argument.
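A sketch of such an install. <username> is a placeholder, and "malt" is the package listed in the example packages.json.

```r
# <username> is a placeholder for the GitHub account behind the universe;
# "malt" is the package from the example packages.json.
install.packages(
  "malt",
  repos = c("https://<username>.r-universe.dev", "https://cloud.r-project.org")
)
```

CRAN is listed as a second repository so that install.packages() can resolve dependencies that are not in the universe.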
Follow this rOpenSci guide: How to create your personal CRAN-like repository on R-universe.
In a nutshell:
Create a repository called <username>.r-universe.dev on the GitHub account for username, e.g. https://github.com/maelle/maelle.r-universe.dev. The repository must contain a file called packages.json in the standard format, defining at least the package name and Git url for the packages you want to include, e.g.
[
{
"package": "malt",
"url": "https://github.com/lrioudurand/malt"
}
]
Install the r-universe app on the GitHub account that you want to enable. Choose enable for all repositories when asked.
After a few minutes, your source universe will appear on: https://github.com/r-universe/<username>
The universe automatically starts building the packages from your registry. Once finished, they will appear on https://<username>.r-universe.dev
The universe automatically syncs and builds your package git repos once per hour.
If you encounter any issues, the actions tab in your source universe may show what is going on, for example: https://github.com/r-universe/maelle/actions
Why submit to CRAN?
Sign of quality
Discoverability
Ease of installation
Bioconductor and rOpenSci have even higher standards, including code review.
Read the official Checklist for CRAN Submissions to check requirements beyond the automated checks.
Read the community-created Prepare for CRAN checklist.
Useful functions for additional checks:
goodpractice::gp()
spelling::spell_check_package()
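Both functions take the package path as their first argument, with "." (the current directory) as the default. A sketch of running them from the package root:

```r
# Run from the root of your package project.
goodpractice::gp()               # bundle of good-practice checks (e.g. lintr, covr)
spelling::spell_check_package()  # spell-check the documentation and DESCRIPTION
```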
usethis::use_release_issue()
This function will first ask you to select the release version (major, minor, patch) then create and open a to-do list as an issue in the package GitHub repo.
For a first submission, there are around 22 tasks to complete, split into sections, to follow (more-or-less) in order.
For more details on each, see the Releasing to CRAN chapter of the R Packages book.
CRAN policies state that you must run R CMD check --as-cran on the tarball to be uploaded, with the current version of R-devel.
First make sure the package passes check locally.
Then allow some extra checks.
Then send the package to CRAN’s win-builder to check it on R-devel.
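With devtools, the three steps could be run as below. These match the commands in the release checklist generated by use_release_issue() as far as I know, but the checklist may vary between usethis versions. devtools::check() uses --as-cran by default.

```r
# Run from the root of your package project.
devtools::check()                              # R CMD check --as-cran, locally
devtools::check(remote = TRUE, manual = TRUE)  # adds CRAN incoming checks that need internet, builds the PDF manual
devtools::check_win_devel()                    # uploads to win-builder (R-devel); results are emailed to the maintainer
```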
cran-comments.md
Write submission notes in a cran-comments.md file, e.g.:
## Test environments
* local OS X install (R-release)
* win-builder (R-release, R-devel)
## R CMD check results
0 errors | 0 warnings | 1 note
* This is a new release.
There’s always one note for a new submission.
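usethis can create a starting cran-comments.md from a template and add it to .Rbuildignore. The template's exact contents depend on the usethis version, so edit it to match your own check results.

```r
# Run from the root of your package project.
usethis::use_cran_comments()
```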
The submission process asks you questions, which you should read and answer carefully.
If your package is rejected, do not despair! It happens to everyone, even R-core members.
If the feedback is from the CRAN robot, just fix the problem and resubmit.
If it’s from a human, do not respond to the email and do not argue. Instead, update cran-comments.md and resubmit.
This is a resubmission. Compared to the last submission, I have:
* First change.
* Second change.
* Third change.
--
## Test environments
* local OS X install, R 3.2.2
* win-builder (devel and release)
## R CMD check results
...
For an update to a package already on CRAN, proceed as before. If your package has reverse dependencies, you also need to run R CMD check on them, and notify CRAN if you have deliberately broken them.
Fortunately the revdepcheck package makes this fairly easy.
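Here is a sketch of a typical run. revdepcheck is installed from GitHub because it is not on CRAN. num_workers = 4 is an arbitrary example value. revdep_check() checks each reverse dependency against both the CRAN release and the development version of your package, so new failures stand out. Results are written to the revdep/ folder.

```r
# revdepcheck is not on CRAN, so install it from GitHub
install.packages("remotes")
remotes::install_github("r-lib/revdepcheck")

# Run from the root of your package project.
usethis::use_revdep()                       # set up revdep/ and exclude it from builds
revdepcheck::revdep_check(num_workers = 4)  # check reverse dependencies in parallel
```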
usethis::use_upkeep_issue()
This is a new function in usethis. Like usethis::use_release_issue(), it opens a GitHub issue with an (opinionated) to-do list of tasks that should be ticked off for your package (at least) once a year.
The tidyverse team think of this like ‘spring cleaning’ for packages.
Blog post: Package spring cleaning
Write developer documentation – remember you can add non-vignette articles with usethis::use_article()
Add a code of conduct, e.g. Contributor Covenant
Add a CONTRIBUTING.md to your GitHub repository
Use labels to highlight issues: GitHub promotes some of them, e.g. help wanted and good first issue
Add topics to your GitHub repo so potential contributors can find it
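Two of these tasks have usethis helpers. In the sketch below, the article name, title and contact address are placeholders. use_article() creates an R Markdown file under vignettes/articles/, which pkgdown renders but which is left out of the built package. use_code_of_conduct() adds a Contributor Covenant CODE_OF_CONDUCT.md.

```r
# Run from the root of your package project.
# Placeholder name and title for a developer-facing article
usethis::use_article("developer-notes", title = "Developer notes")

# Contributor Covenant code of conduct; the contact address is a placeholder
usethis::use_code_of_conduct(contact = "maintainer@example.com")
```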
You have written a package!
And reached the end of the R Packages workshop!
Wickham, H. and Bryan, J., R Packages (2nd edn, in progress), https://r-pkgs.org.
R Core Team, Writing R Extensions, https://cran.r-project.org/doc/manuals/r-release/R-exts.html.
rOpenSci, rOpenSci Packages: Development, Maintenance, and Peer Review, https://devguide.ropensci.org/index.html.
rOpenSci, rOpenSci Statistical Software Peer Review (especially Chapter 3: Guide for Authors), https://stats-devguide.ropensci.org/pkgdev.html.
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).