Packaging Data; Publication and Maintenance

Advanced R

Heather Turner and Ella Kaye
Department of Statistics, University of Warwick

June 21, 2023

Overview

  • Packaging data
  • Publication
    • GitHub
    • R-Universe
    • CRAN
  • Promotion
  • Maintenance

Packaging data

Including data

There are 3 types of data we might want to include:

  • Exported data for the user to access: put in /data
  • Internal data for functions to access: put in /R/sysdata.rda
  • Raw data: put in /inst/extdata

Exported data

The data should be saved in /data as an .rda (or .RData) file.

usethis::use_data() will do this for you, as well as a few other necessary steps:

letter_indices <- data.frame(letter = letters, index = seq_along(letters))
usethis::use_data(letter_indices)
✔ Adding 'R' to Depends field in DESCRIPTION
✔ Creating 'data/'
✔ Setting LazyData to 'true' in 'DESCRIPTION'
✔ Saving 'letter_indices' to 'data/letter_indices.rda'
• Document your data (see 'https://r-pkgs.org/data.html')
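Once the package is installed, users can list its datasets with data() and, because use_data() sets LazyData, use them directly as pkg::name. A quick sketch, with base R's datasets package standing in for your own package:

```r
# List the datasets shipped with an installed package; 'datasets' stands in
# for your own package name here
items <- data(package = "datasets")$results[, "Item"]
head(items)

# Lazy-loaded data can be used without calling data() first
head(datasets::mtcars, 3)
```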

Note

For larger datasets, try different values of the compress argument of use_data() ("gzip", "bzip2" or "xz") to find the best compression.
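A minimal sketch for comparing compression methods: it saves the same made-up data frame with each method that save() supports and compares the file sizes. The smallest file suggests a value for compress; tools::checkRdaFiles() can also report on .rda files already in /data.

```r
# Made-up example data with plenty of repetition
df <- data.frame(
  group = rep(c("control", "treatment"), each = 5000),
  value = round(sin(seq_len(10000)), 2)
)

# Save the same object with each compression method and record file sizes
sizes <- vapply(c("gzip", "bzip2", "xz"), function(method) {
  path <- tempfile(fileext = ".rda")
  save(df, file = path, compress = method)
  file.size(path)
}, numeric(1))
sizes
names(which.min(sizes))

# Then, inside your package project (not run here):
# usethis::use_data(df, compress = "xz")
```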

Provenance

Often the data you want to make available to users is created with an R script, either from scratch or from a raw dataset.

It’s a good idea to put the R script and any corresponding raw data in /data-raw.

usethis::use_data_raw("dataname") will set this up:

  • Create /data-raw
  • Add /data-raw/dataname.R for you to add the code needed to create the data
  • Add ^data-raw$ to .Rbuildignore as it does not need to be included in the actual package.

You should add any raw data files (e.g. .csv files) to /data-raw.
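A data-raw script might look like the sketch below. The dataset name letter_sample is hypothetical; in a real package the raw data would be read from a file in /data-raw (e.g. read.csv("data-raw/letter_sample.csv")), but inline text stands in for it here so the sketch runs anywhere.

```r
## data-raw/letter_sample.R: code to prepare the `letter_sample` dataset
## (hypothetical). Normally: raw <- read.csv("data-raw/letter_sample.csv")
raw <- read.csv(text = "letter,index
 C ,3
A,1
 b,2")

# Tidy the raw data: trim whitespace, use lower case, sort by index
letter_sample <- data.frame(
  letter = tolower(trimws(raw$letter)),
  index = as.integer(raw$index)
)
letter_sample <- letter_sample[order(letter_sample$index), ]
rownames(letter_sample) <- NULL

# Save to data/letter_sample.rda; run this from the package root
# (commented out so the sketch can run outside a package)
# usethis::use_data(letter_sample, overwrite = TRUE)
```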

Documenting Data

Datasets in /data are always exported, so they must be documented.

To document a dataset, you need an .R script in /R containing a roxygen block placed above a string giving the name of the dataset.

As with functions, you can choose how to arrange this, e.g. in one combined /R/data.R or in a separate R file for each dataset.

Example: letter_indices

#' Letters of the Roman Alphabet with Indices
#'
#' A dataset of lower-case letters of the Roman alphabet and their 
#' numeric index from a = 1 to z = 26.
#'
#' @format A data frame with 26 rows and 2 variables:
#' \describe{
#'   \item{letter}{The letter as a character string.}
#'   \item{index}{The corresponding numeric index.}
#' }
"letter_indices"

#' @examples can be used here too.

Data source

For collected data, the (original) source should be documented with #' @source.

This should be either a URL, e.g.

#' @source \url{http://www.diamondse.info/}

(alternatively \href{http://www.diamondse.info/}{Diamond Search Engine}), or a reference, e.g.

#' @source Henderson and Velleman (1981), Building multiple
#' regression models interactively. *Biometrics*, **37**, 391–411.

Internal data

Sometimes functions need access to reference data, e.g. constants or look-up tables, that don’t need to be shared with users.

These objects should be saved in a single R/sysdata.rda file.

This can be done with use_data(..., internal = TRUE), e.g.

x <- sample(1000)
usethis::use_data(x, mtcars, internal = TRUE)

The generating code and any raw data can be put in /data-raw.

As the objects are not exported, they don’t need to be documented.
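Inside the package, objects from R/sysdata.rda are available to your functions by name, just like functions defined in /R. A sketch with a hypothetical look-up table, unit_factors, and a hypothetical function, to_metres(); here the table is simply defined in the session, standing in for the copy that would be loaded from R/sysdata.rda.

```r
# Hypothetical look-up table; in a package it would be created in /data-raw
# and saved with usethis::use_data(unit_factors, internal = TRUE)
unit_factors <- c(mm = 0.001, cm = 0.01, m = 1, km = 1000)

# A package function refers to the internal object directly by name
to_metres <- function(x, unit) {
  x * unname(unit_factors[unit])
}

to_metres(c(5, 2), c("km", "cm"))
```

Users can still reach an internal object with mypkg:::unit_factors if they really need to, but it is not part of the package's public interface.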

Raw data

Sometimes you want to include raw data files to use in examples or vignettes.

These files can be in any format and should be added directly to /inst/extdata.

When the package is installed, the contents of /inst are copied to the top level of the installed package, so these files end up in an extdata directory. You can find their path on your system with system.file(), e.g. for the readr package:

system.file("extdata", "mtcars.csv", package = "readr")
[1] "/Users/u2175871/Library/R/arm64/4.3/library/readr/extdata/mtcars.csv"
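Packages often wrap system.file() in a small helper so users can find the example files, as readr does with readr_example(). A sketch of such a helper; the name example_path() and the default package name "mypkg" are hypothetical.

```r
# List the files in a package's inst/extdata directory, or return the full
# path to one of them. Function and default package name are hypothetical.
example_path <- function(file = NULL, package = "mypkg") {
  if (is.null(file)) {
    extdata <- system.file("extdata", package = package)
    # system.file() returns "" if the package or directory is not found
    if (!nzchar(extdata)) return(character(0))
    return(list.files(extdata))
  }
  # mustWork = TRUE gives an error, rather than "", for a missing file
  system.file("extdata", file, package = package, mustWork = TRUE)
}
```

For example, example_path(package = "readr") would list readr's example files, and read.csv(example_path("mtcars.csv", package = "readr")) would read one of them.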

Your turn

  1. Run usethis::use_data_raw("farm_animals").
  2. In the script data-raw/farm_animals.R write some code to create a small data frame with the names of farm animals and the sound they make.
  3. Run all the code (including the already-present call to usethis::use_data()) to create the data and save it in /data.
  4. Add an R/farm_animals.R script and add some roxygen comments to document the dataset.
  5. Run devtools::document() to create the documentation for the farm_animals data. Preview the documentation to check it.
  6. Commit all the changes to your repo.

Publication

GitHub

Your package is already on GitHub

Since your package is already on GitHub, any R user can install it with

remotes::install_github("USER/REPO")

If you want to tag it as a release, make sure there is a NEWS.md file (usethis::use_news_md() can create one), then run

usethis::use_version() # set the release version number

Check that NEWS.md is up to date (use_version() adds a heading for the new version), then run

usethis::use_github_release()

This creates a GitHub release, with the source code bundled as a .zip and a .tar.gz, available from the Releases section of the repo homepage.

R-Universe

With R-Universe, you can create a personal, CRAN-like repository.

You’re in control of what’s published in your R-Universe!

It is a good way to allow users to easily install packages without going through the rigour of the CRAN submission process.

Installing a package from an R-universe

R-universe builds binaries for Windows and macOS, which users can install with install.packages(), e.g.

# Install 'malt' from the 'lrioudurand' universe
install.packages('malt', repos = c(
  lrioudurand = 'https://lrioudurand.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'
))

Alternatively, you can first set options(repos) to enable favourite repositories by default:

options(repos = c(
  lrioudurand = 'https://lrioudurand.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'
))
install.packages("malt")
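To try a repository setting without affecting the rest of your session, keep the previous value returned by options() and restore it afterwards. To make the setting permanent, the options() call can instead go in your .Rprofile (usethis::edit_r_profile() opens it).

```r
# options() returns the previous values, so they can be restored later
old <- options(repos = c(
  lrioudurand = "https://lrioudurand.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
))
getOption("repos")  # repositories now searched by install.packages()

# ... install.packages("malt") etc. ...

# Restore the previous repository setting
options(old)
```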

Create your R-universe

Follow this rOpenSci guide: How to create your personal CRAN-like repository on R-universe.

In a nutshell:

  1. Create a repository called <username>.r-universe.dev on the GitHub account for username, e.g. https://github.com/maelle/maelle.r-universe.dev. The repository must contain a file called packages.json in the standard format, defining at least the package name and git url for the packages you want to include, e.g.

    [
        {
            "package": "malt",
            "url": "https://github.com/lrioudurand/malt"
        }
    ]
  2. Install the r-universe app on the GitHub account that you want to enable. Choose enable for all repositories when asked.
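For a registry with several packages, packages.json can be generated from a data frame rather than typed by hand. A base-R sketch; the second entry (mypkg under a hypothetical USER account) is a placeholder, and since this simple version does not escape special characters, jsonlite::write_json() is a more robust alternative if you have jsonlite installed.

```r
# Packages to include in the registry: name and git URL
# (the second row is a hypothetical placeholder)
registry <- data.frame(
  package = c("malt", "mypkg"),
  url = c("https://github.com/lrioudurand/malt", "https://github.com/USER/mypkg")
)

# Format each row as a JSON object, then wrap them all in a JSON array
entries <- sprintf(
  '    {\n        "package": "%s",\n        "url": "%s"\n    }',
  registry$package, registry$url
)
json <- paste0("[\n", paste(entries, collapse = ",\n"), "\n]")

# In practice, write this to the root of your <username>.r-universe.dev repo
out_file <- file.path(tempdir(), "packages.json")
writeLines(json, out_file)
cat(json, sep = "\n")
```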

What happens next

  • After a few minutes, your source universe will appear on: https://github.com/r-universe/<username>

  • The universe automatically starts building the packages from your registry. Once finished, they will appear on https://<username>.r-universe.dev

  • The universe automatically syncs and builds your package git repos once per hour.

  • If you encounter any issues, the Actions tab of your source universe may show what is going on, for example: https://github.com/r-universe/maelle/actions

CRAN

Why publish on CRAN?

  • Sign of quality

    • Code is ready to be used (not a beta version)
    • Basic standards: documented code, running examples, etc.
    • Works with current versions of R and other packages
    • Commitment from the maintainer
  • Discoverability

  • Ease of installation

  • Bioconductor, rOpenSci: even higher standards, code review

It’s an involved process

  • Read the official Checklist for CRAN Submissions to check requirements beyond the automated checks.

  • Read the community-created Prepare for CRAN checklist.

  • Useful functions for additional checks:

    • goodpractice::gp
    • spelling::spell_check_package

usethis::use_release_issue()

This function will first ask you to select the release version (major, minor, patch) then create and open a to-do list as an issue in the package GitHub repo.

For a first submission, there are around 22 tasks to complete, split into sections, to follow (more-or-less) in order:

  • First release (one-time only)
  • Prepare for release
  • Submit to CRAN
  • Wait for CRAN (things to do after the package has been accepted)

For more details on each, see the Releasing to CRAN chapter of the R Packages book.

Run “as CRAN” checks

CRAN policies state that you must run R CMD check --as-cran on the tarball to be uploaded with the current version of R-devel.

First make sure the package passes check locally:

devtools::check()

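Then run some extra checks (remote = TRUE runs CRAN incoming checks that need internet access; manual = TRUE builds and checks the PDF manual):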

devtools::check(remote = TRUE, manual = TRUE)

Then send the package to CRAN’s win-builder service to check it on R-devel:

devtools::check_win_devel()

Further options: R-hub (multiple platforms, with different compilers) and the Mac builder (with M1):

devtools::check_rhub()
devtools::check_mac_release()

cran-comments.md

Write your submission notes in a cran-comments.md file, which you can create with

usethis::use_cran_comments()

 ## Test environments
 * local OS X install (R-release)
 * win-builder (R-release, R-devel) 

 ## R CMD check results

 0 errors | 0 warnings | 1 note

 * This is a new release.

There’s always one note for a new submission.

Submit to CRAN

devtools::release()

This asks you questions which you should carefully read and answer.

If your submission fails

Do not despair! It happens to everyone, even R-core members.

If it’s from the CRAN robot, just fix the problem and resubmit.

If it’s from a human, do not respond to the email and do not argue. Instead, update cran-comments.md and resubmit.

For resubmission

 This is a resubmission. Compared to the last submission, I
 have:

 * First change.
 * Second change.
 * Third change.

 --

 ## Test environments
 * local OS X install, R 3.2.2
 * win-builder (devel and release)

 ## R CMD check results
 ...

Subsequent submissions to CRAN

Proceed as before. If your package has reverse dependencies (other packages that depend on it), you also need to run R CMD check on them, and notify CRAN if you have deliberately broken any of them.

Fortunately the revdepcheck package makes this fairly easy:

remotes::install_github("r-lib/revdepcheck")
usethis::use_revdep()
library(revdepcheck)
revdep_check()
revdep_report_cran()
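To see which packages would be affected before running the full checks, tools::package_dependencies() can list reverse dependencies. Normally you would pass db = available.packages(), which needs an internet connection; the sketch below uses a tiny, made-up package database so that it runs offline, and all package names in it are hypothetical.

```r
# A made-up package database with the columns that available.packages()
# provides for the "strong" dependency types (Depends, Imports, LinkingTo)
db <- cbind(
  Package   = c("mypkg", "pkgA", "pkgB", "pkgC"),
  Depends   = c("R (>= 4.0)", "R (>= 4.0), mypkg", NA, NA),
  Imports   = c(NA, NA, "mypkg (>= 1.0)", "stats"),
  LinkingTo = NA
)

# Which packages depend on or import mypkg?
revdeps <- tools::package_dependencies("mypkg", db = db, reverse = TRUE)
revdeps

# With real data (needs internet access):
# tools::package_dependencies("mypkg", db = available.packages(), reverse = TRUE)
```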

Promotion

Promoting your package

Talks

  • Meetups: Warwick RUG, Coventry R-Ladies (or your local groups)
  • Conferences (see https://jumpingrivers.github.io/meetingsR/events.html)
    • General: useR!, posit::conf, satRdays
    • Specific: R/Finance, BioC, Psychoco
    • Non-R-specific: Royal Statistical Society (RSS), ???
  • Conferences provide greater exposure, particularly to people working in the relevant field(s).
  • Don’t forget to share your slides! (Conference/personal website, LinkedIn, RPubs, Slideshare)

Paper

  • A paper not only promotes your package but also improves it through peer review
    • The paper can overlap with a vignette
  • Traditional journals:
    • Open Source Software: The R Journal, Journal of Statistical Software
    • Computing: Computational Statistics and Data Analysis, Journal of Computational and Graphical Statistics, SoftwareX
    • Science: Bioinformatics, PLOS ONE, Methods in Ecology and Evolution
  • Alternative journals:
    • F1000Research Bioconductor/R package gateway: publish, then open review
    • Journal of Open Source Software: open code review, short descriptive paper

Maintenance

usethis::use_upkeep_issue()

This is a new function in usethis. Like usethis::use_release_issue(), it opens a GitHub issue with an (opinionated) to-do list of tasks that should be ticked off for your package (at least) once a year.

The tidyverse team think of this like ‘spring cleaning’ for packages.

Blog post: Package spring cleaning

Interacting with users

  • Bug reports/help requests
    • Can show where documentation/tests need improving
    • Help you find out who’s using your package and what for
    • Can give ideas for new features
    • Can lead to collaborations
  • Avoid using email, so that other people can benefit
    • GitHub issues
    • Stack Overflow questions

Interacting with developers

  • Write developer documentation – remember you can add non-vignette articles with usethis::use_article()

  • Add a code of conduct, e.g. Contributor Covenant

    usethis::use_code_of_conduct(contact = "you@example.com")  # placeholder: address for reporting conduct issues
  • Add a CONTRIBUTING.md to your GitHub repository

    • Do you have a style guide?
    • Reminders to run check/tests/add NEWS item to pull requests
  • Use labels to highlight issues; GitHub promotes some labels, e.g. help wanted and good first issue

  • Add topics to your GitHub repo so potential contributors can find it

Consider the longer-term

  • Work on new features and bug fixes for the next release
  • Buddy-up
    • Review each other’s code
    • Co-author each other’s packages
  • Take advantage of events e.g. Hacktoberfest, Closember
  • Start work on your next package!

Congratulations 🎉

You have written a package!

And reached the end of the R Packages workshop!

End matter

References

Wickham, H. and Bryan, J. R Packages (2nd edn, in progress). https://r-pkgs.org

R Core Team. Writing R Extensions. https://cran.r-project.org/doc/manuals/r-release/R-exts.html

rOpenSci Packages: Development, Maintenance, and Peer Review. https://devguide.ropensci.org/index.html

rOpenSci Statistical Software Peer Review (especially Chapter 3: Guide for Authors). https://stats-devguide.ropensci.org/pkgdev.html

License

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).