Advanced R
June 20, 2023
Sometimes you reach the limits of R:
In this case it can make sense to code parts in C++.
There are some typical scenarios where C++ is likely to be a good idea
To use C++, you need a working C++ compiler.
On MacOS/Windows there is some setup to do, but it will also set you up to
On Linux, you will usually have a C++ compiler installed, but you might as well get set up to develop R packages too.
Debian/Ubuntu:
Fedora/RedHat: should be set up already.
Option 1
Then, in the terminal:
Option 2
Rtools.exe
, keeping the default settings.Consider an R function add_r()
to add two numbers
Here’s how we might write an equivalent add_cpp()
function in C++
return
To use add_cpp()
in R we need to compile the C++ code and construct an R function that connects to the compiled C++ function.
The R package Rcpp takes care of these steps for us.
One way is to use the cppFunction()
, e.g.
After defining add_cpp()
via cppFunction()
, add_cpp()
is available to use as a R function
It is better to define functions in C++ files (extension .cpp
). These files will be recognised by RStudio and other IDEs, with the usual benefits.
The C++ file should have these lines at the top:
Rcpp::
.Above each function we want to use in R, add // [[Rcpp::export]]
The following is in the file add_cpp2.cpp
sourceCpp()
Now we can use sourceCpp()
to make the C++ functions available in R
There are a number of benefits to writing C++ code in separate .cpp
files, compared to cppFunction()
{
}
must be terminated by a ;
.=
for assignment (<-
is not valid).+
, -
, *
, /
).==
, !=
, >
, etc)//
./*
*/
delimitersThe basic C++ data types are scalars. Rcpp provides vector versions
R | C++ (scalar) | Rcpp (vector) |
---|---|---|
numeric | double | NumericVector |
integer | int | IntegerVector |
character | char | CharacterVector |
logical | bool | LogicalVector |
Rcpp also provides String
as an alternative to char
A C++ for
loop has the form
for (initialisation; condition; increment)
i
at zeroi
is less than n
i
by 1 after each iteration
++i
is equivalent to i = i + 1
total += x[i]
is equivalent to total = total + x[i]
for
loop (vector input, scalar output).size()
method to find the length of a vectorwhile
loop (vector input, scalar output)break
to break from a while or for loopcontinue
to skip to the next iteration (vs next
in R)The following computes Euclidean distances
\[d_i = \sqrt{(x - y_i)^2}\]
dist(n)
is used to create a numeric vector named dist
of length n
.v(n)
would create a vector named v
.pow
is a standard C++ function for computing a value raised to a power.
Both pow
and sqrt
are functions from the <cmath>
library, see e.g. w3schools C++ math.
To use <cmath>
functions in C++ code, we would normally need to include the <cmath>
header in our .cpp
file. However, Rcpp defines its own version of these functions, so we can use them with just the Rcpp header.
From the menu bar:
//[Rcpp::export]
You can also create a C++ file from the new file drop-down in the Files pane, but this will be blank.
sourceCpp()
to source in your function.bench::mark()
to compare wmean_r()
, wmean_cpp()
and the stats function weighted.mean()
.C++ data types do not handle NA
s in input well
int
: use a length 1 IntegerVector
insteaddouble
: NA
s okay (converted to NAN
)char
: use String
insteadbool
: NA
s converted to true
; use int
insteadRcpp vectors handle NA
s in the corresponding type
Rcpp (vector) | NA |
---|---|
NumericVector |
NA_REAL |
IntegerVector |
NA_INTEGER |
CharacterVector |
NA_STRING |
LogicalVector |
NA_LOGICAL |
Each vector type has a corresponding matrix equivalent: NumericMatrix
, IntegerMatrix
, CharacterMatrix
and LogicalMatrix
.
Create a matrix called named m1
()
, e.g. m1(3, 2)
for the value in row 3, column 2.m1(0, 0)
..nrow
and .ncol
methods to get the number of rows and columnsAs in R, Rcpp provides d/p/q/r functions for the density, distribution function, quantile function and random generation. - Functions in the Rcpp::
namespace return a vector. - Functions in the R::
namespace (also provided by the Rcpp R package) return a scalar
For example we can sample a single value from a \(N(0, 1)\) distribution by extracting the first element from a vector
Or use the R::rgamma()
function to sample a single value directly
In a new C++ file, convert the following Gibbs sampler to C++
Create a wrapper function to set the seed as follows:
We can then call gibbs_r()
and gibbs_c()
inside this, e.g.
Benchmark your gibbs_r()
and gibbs_c()
functions with N = 100 and thin = 10, using your wrapper function to set the seed.
Rcpp provides some “syntactic sugar” to allow us to write C++ code that is more like R code.
One example is operating on rows or columns of matrices. So far we have seen how to update individual elements of a NumericMatrix
. Rcpp lets us extract an update whole rows/columns, e.g.
A whole column would be extracted with mat(_, j)
.
The vectorized random generation functions are another example of Rcpp sugar.
Rcpp provide many more vectorized functions, for example:
+
, -
, *
, \
)==
, !
, =<
)sqrt
, pow
, …)mean
, median
, )In addition, Rcpp provides many R-like functions, such as which_max
or rowSums
, see Unofficial API documentation for a full list.
Recall our distance function from earlier:
With Rcpp vectorization, we can simplify this to:
This example combines row-indexing and a vectorized function, max()
.
The following R function can be used to simulate the value of \(\pi\):
Convert this to C++, taking advantage of the vectorized Rcpp functions.
Similar scope to this module: Gillespie, C and Lovelace, R, Efficient R programming, Rcpp section, https://csgillespie.github.io/efficientR/performance.html#rcpp
Going a bit further: Wickham, H, Advanced R (2nd edn), Rewriting R code in C++ chapter, https://adv-r.hadley.nz/rcpp.html
Not very polished, but broader coverage of Rcpp functionality: Tsuda, M.E., Rcpp for everyone, https://teuder.github.io/rcpp4everyone_en/300_Rmath.html
The Rcpp vignettes, accessed via browseVignettes("Rcpp")
or CRAN
The unofficial Rcpp API documentation
Case studies (optimising by improving R code and/or using C++)
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).