Advanced R Course
January 23, 2023
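Functions are defined by three components:
their arguments, listed inside ( )
their body, the code inside { }
their environment, where the function was defined
They are created using function()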
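As a minimal sketch of the syntax, the example below creates a function with function(). The name t_statistic is borrowed from the naming advice in this section, but the argument mu and the body are illustrative assumptions, not code from the course.

```r
# Hypothetical example: one-sample t statistic for testing mean(x) == mu
t_statistic <- function(x, mu = 0) {
  # body: standardised distance between the sample mean and mu
  (mean(x) - mu) / (sd(x) / sqrt(length(x)))
}

t_statistic(c(1, 2, 3, 4, 5), mu = 3)

# The components can be inspected directly
names(formals(t_statistic))   # argument names
body(t_statistic)             # the code
environment(t_statistic)      # where the function was defined
```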
As with arguments, function names are important:
use a name that describes what it returns (e.g. t_statistic) or what it does (e.g. remove_na)
try to use one convention for combining words (e.g. snake case t_statistic or camel case tStatistic)
avoid using the same name as other functions
Specified arguments are those named in the function definition, e.g. in rnorm() the arguments are n, mean and sd.
mean and sd have been given default values in the function definition, but n has not, so the function fails if the user does not pass a value to n.
The user can pass objects to these arguments using their names or by supplying unnamed values in the right order
[1] 5.108557 21.023061 2.147342 7.937678 -3.102047
[1] -5.846403 1.878586 3.888573 5.845032 20.039860
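The two ways of passing arguments can be sketched as follows. The specific values (n = 5, mean = 1, sd = 10) and the object names are assumptions chosen for illustration; set.seed() is used only so that the calls can be compared.

```r
# Arguments passed by name
set.seed(1)
by_name <- rnorm(n = 5, mean = 1, sd = 10)

# Named arguments can be given in any order
set.seed(1)
reordered <- rnorm(sd = 10, mean = 1, n = 5)

# Unnamed arguments are matched by position: n, then mean, then sd
set.seed(1)
by_position <- rnorm(5, 1, 10)

# Swapping unnamed values silently changes their meaning (mean = 10, sd = 1)
set.seed(1)
swapped <- rnorm(5, 10, 1)

identical(by_name, reordered)     # TRUE
identical(by_name, by_position)   # TRUE
identical(by_name, swapped)       # FALSE
```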
So naming and order are important! Some guidelines
Arguments are used as objects in the function code.
A new environment is created each time the function is called, separate from the global workspace.
By default, functions return the object created by the last line of code
Alternatively return() can be used to terminate the function and return a given object
Multiple objects can be returned in a list:
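The three ways of returning a value can be sketched as below. All function names here are hypothetical and chosen only for illustration.

```r
# Implicit return: the value of the last evaluated expression
square <- function(x) {
  x^2
}

# Early exit: return() stops the function and hands back the given object
safe_sqrt <- function(x) {
  if (any(x < 0)) {
    return(NA_real_)
  }
  sqrt(x)
}

# Several results returned together in a named list
mean_and_sd <- function(x) {
  list(mean = mean(x), sd = sd(x))
}

square(3)          # 9
safe_sqrt(-1)      # NA
safe_sqrt(4)       # 2
res <- mean_and_sd(c(2, 4, 6))
res$mean           # 4
res$sd             # 2
```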
RStudio has a helper to turn code into a function: with the code selected, choose “Code” > “Extract Function” from the menu.
Exercise files: https://github.com/Warwick-Stats-Resources/Advanced-R-exercises
You can get the files either by creating a new project from version control in RStudio (if set up), or by going to the ‘Code’ button in the repo, then ‘Download ZIP’, then opening Advanced-R-exercises.Rproj.
In the qq_norm chunk of exercises.Rmd there is some code to compute the slope and intercept of the line to add to a quantile-quantile plot, comparing sample quantiles against theoretical quantiles of a N(0, 1) distribution.
Turn this code into a function named qq_norm taking the sample data as an argument and returning the slope and intercept in a list.
Run this chunk to source the function, then run the normal-QQ chunk which uses the qq_norm function to compute parameters for an example plot.
A new environment is created each time the function is called, separate from the global workspace.
If an object is not defined within the function, or passed in as an argument, R looks for it in the parent environment where the function was defined
It is safest (and best practice) to use arguments rather than depend on global variables!
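A small sketch of why depending on global variables is fragile. The names scale_factor, rescale_global and rescale are hypothetical.

```r
scale_factor <- 10   # object in the global workspace

# Depends on a global variable: R finds scale_factor outside the function
rescale_global <- function(x) {
  x * scale_factor
}

# Safer: everything the function needs is passed in as an argument
rescale <- function(x, scale_factor = 10) {
  x * scale_factor
}

before <- rescale_global(2)   # 20
scale_factor <- 100           # an unrelated change elsewhere in the script
after <- rescale_global(2)    # 200: same call, different result
fixed <- rescale(2)           # 20: unaffected by the global change
```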
... (the ellipsis) allows unspecified arguments to be passed to the function.
This device is used by functions that work with arbitrary numbers of objects, e.g.
It can also be used to pass on arguments to another function, e.g.
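Both uses can be sketched as follows. sum() is a base R function that accepts any number of objects through ...; the wrapper centre() is a hypothetical example of passing arguments on.

```r
# sum() takes an arbitrary number of objects through ...
total <- sum(1, 2:3, 4)   # 10

# Hypothetical wrapper: anything supplied via ... is passed on to mean()
centre <- function(x, ...) {
  x - mean(x, ...)
}

centred <- centre(c(1, 2, NA), na.rm = TRUE)   # -0.5  0.5   NA
```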
Arguments passed to ... can be collected into a list for further analysis
means <- function(...){
  dots <- list(...)
  vapply(dots, mean, numeric(1), na.rm = TRUE)
}
x <- 1
y <- 2:3
means(x, y)
[1] 1.0 2.5
Similarly the objects could be concatenated using c()
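A minimal sketch of the concatenation variant, assuming we want a single mean over all supplied values; the name overall_mean is hypothetical.

```r
# Combine everything passed through ... into one vector with c()
overall_mean <- function(...) {
  values <- c(...)
  mean(values, na.rm = TRUE)
}

overall_mean(1, 2:3)   # 2
```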
A side-effect is a change outside the function that occurs when the function is run, e.g.
A function can have many side-effects and a return value, but it is best practice to have a separate function for each task, e.g. creating a plot or a table.
Writing to file is usually best done outside a function.
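One way to keep tasks separate is sketched below, with one function that only returns a value and another that is called only for its side-effect. Both function names are hypothetical.

```r
# Returns a value and has no side-effects
summarise_sample <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# Called only for its side-effect: drawing on the current graphics device
plot_sample <- function(x) {
  hist(x, main = "Sample distribution")
  invisible(NULL)   # nothing useful to return
}

x <- c(2, 4, 6)
stats <- summarise_sample(x)
stats
# plot_sample(x)   # uncomment to draw the histogram
```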
Copy your qq_norm function to the qq chunk and rename it qq.
Add a new argument fun to specify any quantile function (e.g. qt, qf, etc). Give it the default value qnorm.
Inside the function use qfun <- match.fun(fun) to get the quantile function matching fun, then use qfun instead of qnorm to compute q_theory. Use ... to pass on arguments to qfun.
Run the qq chunk and test your function on the t-QQ chunk.
In our own functions (outside of packages), it is possible to use library().
But this loads the entire package, potentially leading to clashes with functions from other packages. It is better to use the import package:
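A minimal sketch of the contrast, assuming the import package is installed. tools::file_ext is used only as a convenient illustration and is not taken from the course material.

```r
# library(tools) would attach every exported function from the package.
# import::from() makes only the named function available.
import::from(tools, file_ext)

file_ext("exercises.Rmd")   # "Rmd"
```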
ggplot2, like dplyr and other tidyverse packages, uses non-standard evaluation, that is, it refers to variable names in a data frame as if they were objects in the current environment.
To emulate this, we need to embrace arguments, i.e. wrap them in {{ }}.
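Embracing can be sketched with dplyr, used here instead of ggplot2 only because the result is easy to inspect; the same {{ }} syntax is accepted inside ggplot2's aes(). The helper col_mean and the use of the built-in mtcars data are assumptions for illustration.

```r
library(dplyr)

# Hypothetical helper: mean of one column, named without quotes by the caller
col_mean <- function(data, var) {
  # {{ var }} passes the caller's column name through to summarise()
  summarise(data, mean = mean({{ var }}, na.rm = TRUE))
}

result <- col_mean(mtcars, mpg)
result
```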
It is a good idea to separate function code from analysis code.
Put related functions together and source as required
The import package enables only necessary, top-level functions to be imported to the global workspace:
In either case, import::from commands can be put outside the function body to make the code easier to read.
To avoid mistakes, you may want to add some basic sanity checks
Often the R messages can be quite obscure
Error in if (max(x) < 1e+07) 0 else x: missing value where TRUE/FALSE needed
More helpful error messages can be implemented using stop()
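A sketch of basic input checks, assuming a hypothetical helper log_positive(). stopifnot() gives a quick check with an automatic message, while stop() lets us word the message for the user.

```r
# Hypothetical helper with input checks before the real work
log_positive <- function(x) {
  # quick check with an automatically generated message
  stopifnot(is.numeric(x))
  # checks with messages written for the user
  if (anyNA(x)) {
    stop("x contains missing values; remove or impute them first")
  }
  if (any(x <= 0)) {
    stop("all values of x must be positive")
  }
  log(x)
}

log_positive(c(1, 10))

# tryCatch() is used here only so that the script keeps running after the error
msg <- tryCatch(log_positive(c(1, NA)), error = function(e) conditionMessage(e))
msg
```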
Warning messages should be given using warning()
safe_log2 <- function(x) {
  if (any(x == 0)) {
    x[x == 0] <- 0.1
    warning("zeros replaced by 0.1")
  }
  log(x, 2)
}
safe_log2(0:1)
[1] -3.321928 0.000000
Other messages can be printed using message().
If a warning is expected, you may wish to suppress it
All warnings will be suppressed however!
Similarly suppressMessages() will suppress messages.
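A short sketch of both suppressors, using a hypothetical function noisy_sqrt() that emits a message on every call and a warning for negative input.

```r
noisy_sqrt <- function(x) {
  message("taking square roots")
  sqrt(x)   # warns that NaNs are produced when x is negative
}

# Warning hidden, message still shown
a <- suppressWarnings(noisy_sqrt(-1))

# Message hidden; there is no warning for valid input
b <- suppressMessages(noisy_sqrt(4))

a   # NaN
b   # 2
```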
The purrr package has various functions to catch issues.
possibly() lets you modify a function to return a specified value when there is an error
Error in log("a"): non-numeric argument to mathematical function
[1] NA
safely() works in a similar way but returns a list with elements "result" and "error", so you can record the error message(s).
quietly() lets you modify a function to return printed output, warnings and messages along with the result.
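The three wrappers can be sketched as follows, assuming purrr is installed. Wrapping log() follows the error shown above; the object names are hypothetical.

```r
library(purrr)

# possibly(): return a fallback value instead of raising an error
poss_log <- possibly(log, otherwise = NA_real_)
p <- poss_log("a")                  # NA

# safely(): capture the result and the error separately
safe_log <- safely(log)
s <- safe_log("a")
s$result                            # NULL, because the call failed
conditionMessage(s$error)           # the recorded error message

# quietly(): capture output, warnings and messages with the result
quiet_log <- quietly(log)
q <- quiet_log(-1)
q$result                            # NaN
q$warnings                          # the warning text, as a character vector
```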
traceback()
When an unexpected error occurs, there are several ways to track down the source of the error, e.g. traceback()
Error in f2(x): object 'qqqq' not found
2: f2(2) at #1
1: f1(10)
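A sketch of code that would produce a traceback like the one above. The definitions of f1() and f2() are reconstructed from the printed output and are an assumption, not the original course code.

```r
# Assumed definitions, inferred from the traceback and debug output
f1 <- function(x) {
  f2(2)
}
f2 <- function(x) {
  x + qqqq   # qqqq is never defined, so this line fails
}

# Interactively, one would run:
#   f1(10)
#   traceback()
# Here the error is caught so that the script keeps running.
err <- tryCatch(f1(10), error = function(e) e)
conditionMessage(err)
```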
traceback() in RStudio
In RStudio, if Debug > On Error > Error Inspector is checked and the traceback has at least 3 calls, the option to show traceback is presented
debugonce()
debugonce() flags a function for debugging the next time it is called
debugging in: f2(2)
debug at #1: {
x + qqqq
}
Browse[2]> ls()
[1] "x"
Browse[2]> Q
When in debug mode type n or ↵ to step to the next line and c to continue to the end of a loop or the end of the function.
Stepping through a function line by line can be tedious. In RStudio we can set custom breakpoints in the source pane
Set breakpoint in RStudio
Source the code
Note: n is automatically printed, so the first prompt is at the breakpoint.
The Rerun with Debug option will rerun the command that created the error and enter debug mode where the error occurred.
Good points:
Bad points:
recover() can be used to select an earlier entry point.
Alternatively use options(error = recover), run code to debug, then set options(error = NULL).
Open debug_practice.R and source the function f().
Try to run f(10) - there’s an error! Use traceback() to see which function call generated the error, then fix the problem.
Run f(10) again - there is another error! Can you fix this directly given the error message?
Try running f(1) - is the result what you expected? Use debugonce() to set debugging on f() and re-run f(1). Step through the function, printing each object as it is created to see what is happening.
Can you think how to improve the function? See if you can modify the function to give a sensible result for any integer.
Material (with minor adaptations) from Heather Turner:
https://hturner.github.io/IISA2022/01_developing_r_functions.html
(with permission)
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
Comments help to record what a function does
The docstring package enables roxygen comments to be turned into a help file
For fuller documentation, see the docstring vignette.
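A minimal sketch of roxygen comments placed at the start of a function body. The function name log_2 is taken from the test output later in this section; its body and the wording of the documentation are assumptions.

```r
log_2 <- function(x) {
  #' Base-2 logarithm
  #'
  #' @param x A numeric vector of positive values.
  #' @return A numeric vector the same length as x.
  log(x, 2)
}

log_2(8)

# With the docstring package installed, the help page could then be viewed
# interactively:
#   library(docstring)
#   docstring(log_2)
```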
When developing a function, we will want to validate its output.
A simple approach is to try different inputs
Doing this each time we change the function becomes tedious to check and error-prone as we miss important tests.
The testthat package allows us to create a test suite:
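A small test suite might look like the sketch below, assuming testthat is installed. The function log_2 is named in the test output in this section, but its definition and these particular tests are illustrative assumptions.

```r
library(testthat)

# Assumed definition of the function under test
log_2 <- function(x) log(x, 2)

test_that("log_2 returns the expected values", {
  expect_equal(log_2(8), 3)
  expect_equal(log_2(c(1, 2, 4)), c(0, 1, 2))
})

test_that("log_2 warns for negative input", {
  expect_warning(log_2(-1))
})
```

Each test_that() call groups related expectations under a description, so a failure report says which behaviour broke.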
If we save the tests in a file, e.g. tests.R, we can use test_file() to run and check all tests:
√ | OK F W S | Context
x | 2 1 | log_2 works correctly
--------------------------------------------------------------------------------
tests.R:9: failure: negative values give error
`log_2(2^-1)` did not throw an error.
--------------------------------------------------------------------------------
== Results =====================================================================
OK: 2
Failed: 1
Warnings: 0
Skipped: 0
Copy the qq function to a new R script and save as functions.R. Add roxygen comments at the start of the function body to define a title and parameter documentation.
Run the documentation chunk of exercises.Rmd to view your documentation.
Open the tests.R script. Using expect_equal add some tests for the following
Use the tolerance argument in expect_equal to set a tolerance of 0.01.
Run the tests chunk of exercises.Rmd to run your tests with test_file. Try changing the expected tolerance to get a test to fail.