C++ and Rcpp

Advanced R

Heather Turner and Ella Kaye

Department of Statistics, University of Warwick

June 20, 2023

Overview

When to use C++
Getting set up
C++ basics and Rcpp
Rcpp sugar

When to use C++

Limits of R

Sometimes you reach the limits of R:

Your code is still slow despite optimizing the computational approach and the R implementation
You could speed up the R code, but it results in very obscure, convoluted code

In this case it can make sense to code parts in C++.

Typical scenarios

There are some typical scenarios where C++ is likely to be a good idea

Loops that can’t be vectorized because iterations depend on previous results
Recursive functions, or problems which involve calling functions millions of times.
Problems that require advanced data structures and algorithms that R doesn’t provide.

Getting set up

Set up to use C++

To use C++, you need a working C++ compiler.

On MacOS/Windows there is some setup to do, but it will also set you up to

Develop packages in R
Install packages from GitHub that include C/C++ code

On Linux, you will usually have a C++ compiler installed, but you might as well get set up to develop R packages too.

Linux

Debian/Ubuntu:

apt-get update
apt-get install r-base-dev

Fedora/RedHat: should be set up already.

MacOS

Option 1

Option 2

Install the current release of full Xcode from the Mac App Store
Within XCode go to Preferences -> Downloads and install the Command Line Tools
More convenient but installs a lot you don’t need

Windows

Download the Rtools installer that matches your version of R from https://cran.r-project.org/bin/windows/Rtools/
Run the installer, Rtools.exe, keeping the default settings.

C++ basics and Rcpp

A first C++ function

Consider an R function add_r() to add two numbers

add_r <- function(x, y) x + y

Here’s how we might write an equivalent add_cpp() function in C++

double add_cpp(double x, double y) { 
  double value = x + y;
  return value;
}

The type for the return value is declared first
The type of each argument must be declared
The type of intermediate objects must be declared
Return statements must use return

Rcpp

To use add_cpp() in R we need to compile the C++ code and construct an R function that connects to the compiled C++ function.

The R package Rcpp takes care of these steps for us.

One way is to use the cppFunction(), e.g.

library(Rcpp)
cppFunction('
  double add_cpp(double x, double y) {
    double value = x + y;
    return value;
  }
')

Using the C++ function

After defining add_cpp() via cppFunction(), add_cpp() is available to use as a R function

add_cpp

function (x, y) 
.Call(<pointer: 0x101150b40>, x, y)

add_cpp(3, 5)

[1] 8

Stand-alone C++ files

It is better to define functions in C++ files (extension .cpp). These files will be recognised by RStudio and other IDEs, with the usual benefits.

The C++ file should have these lines at the top:

#include <Rcpp.h>
using namespace Rcpp;

The compiler will locate the Rcpp header file with functions and class definitions supplied by Rcpp and include the contents.
Adding the namespace means that we can use Rcpp functions in the C++ code without prefixing the function names by Rcpp::.

Above each function we want to use in R, add // [[Rcpp::export]]

Example C++ file

The following is in the file add_cpp2.cpp

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double add_cpp2(double x, double y) {
  double value = x + y;
  return value;
}

`sourceCpp()`

Now we can use sourceCpp() to make the C++ functions available in R

path <- here::here("C++", "add_cpp2.cpp")
sourceCpp(path)
add_cpp2(5, 9)

[1] 14

Benefits of separate C++ files

There are a number of benefits to writing C++ code in separate .cpp files, compared to cppFunction()

syntax highlighting
avoid mistakes switching from R to C code
line numbers for compilation errors
highlighting errors (e.g. missing “;”)

C++ Basics

Every statement within { } must be terminated by a ;.
Use = for assignment (<- is not valid).
Addition, subtraction, multiplication and division use the same operators as R (+, -, *, /).
Comparison operators are the same as R (==, !=, >, etc)
One-line comments start with //.
Multi-line comments use /* */ delimiters

/*
Example
multi-line comment
*/

Data types

The basic C++ data types are scalars. Rcpp provides vector versions

R	C++ (scalar)	Rcpp (vector)
numeric	double	NumericVector
integer	int	IntegerVector
character	char	CharacterVector
logical	bool	LogicalVector

Rcpp also provides String as an alternative to char

Example: no inputs, scalar output

int one() {
  return 1;
}

Example: if/else (scalar input, scalar output)

int signC(int x) {
  if (x > 0) {
    return 1;
  } else if (x == 0) {
    return 0;
  } else {
    return -1;
  }
}

For loop syntax

A C++ for loop has the form

for (int i = 0; i < n; ++i) {
  total += x[i];
}

Syntax: for (initialisation; condition; increment)
- Initialize integer i at zero
- Continue as long as i is less than n
- Increment i by 1 after each iteration
  - ++i is equivalent to i = i + 1
total += x[i] is equivalent to total = total + x[i]
Vector indices start at zero

Example: `for` loop (vector input, scalar output)

double sumC(NumericVector x) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}

Use .size() method to find the length of a vector

Example: `while` loop (vector input, scalar output)

double sumC(NumericVector x) {
  int n = x.size();
  double total = 0;
  int i = 0;
  while (i < n) {
    total += x[i];
    ++i;
  }
  return total;
}

Use break to break from a while or for loop
Use continue to skip to the next iteration (vs next in R)

Example: vector output

The following computes Euclidean distances

\[d_i = \sqrt{(x - y_i)^2}\]

NumericVector distC(double x, NumericVector y) {
  int n = y.size();
  NumericVector dist(n);
  for(int i = 0; i < n; ++i) {
    dist[i] = sqrt(pow(ys[i] - x, 2.0));
  }
  return out;
}

dist(n) is used to create a numeric vector named dist of length n.
v(n) would create a vector named v.

C++ Functions

pow is a standard C++ function for computing a value raised to a power.

Both pow and sqrt are functions from the <cmath> library, see e.g. w3schools C++ math.

To use <cmath> functions in C++ code, we would normally need to include the <cmath> header in our .cpp file. However, Rcpp defines its own version of these functions, so we can use them with just the Rcpp header.

Creating a C++ file in RStudio

From the menu bar:

Go to File > New File > C++ file
This template already incleas the headers required for Rcpp
Delete the extra content, apart from the comment //[Rcpp::export]

You can also create a C++ file from the new file drop-down in the Files pane, but this will be blank.

Your turn

Create a new C++ file (recommend using the RStudio template)
Convert the following R function that computes a weighted mean to C++

wmean_r <- function(x, w){
  n <- length(x)
  total <- total_w <- 0
  for (i in 1:n){
    total <- total + x[i] * w[i]
    total_w <- total_w + w[i]
  }  
  total/total_w
}

Use sourceCpp() to source in your function.
Use bench::mark() to compare wmean_r(), wmean_cpp() and the stats function weighted.mean().

Missing values in C++ data types

C++ data types do not handle NAs in input well

int: use a length 1 IntegerVector instead
double: NAs okay (converted to NAN)
char: use String instead
bool: NAs converted to true; use int instead

Missing values in Rcpp vectors

Rcpp vectors handle NAs in the corresponding type

Rcpp (vector)	NA
`NumericVector`	`NA_REAL`
`IntegerVector`	`NA_INTEGER`
`CharacterVector`	`NA_STRING`
`LogicalVector`	`NA_LOGICAL`

Matrices

Each vector type has a corresponding matrix equivalent: NumericMatrix, IntegerMatrix, CharacterMatrix and LogicalMatrix.

Create a matrix called named m1

NumericMatrix m1(10, 5);

Subset with (), e.g. m1(3, 2) for the value in row 3, column 2.
The first element is m1(0, 0).
Use .nrow and .ncol methods to get the number of rows and columns
Assign matrix elements as follows

m1(0, 0) = 1;

Example: row sums (matrix input, vector output)

NumericVector rowSumsC(NumericMatrix x) {
  int nrow = x.nrow(), ncol = x.ncol();
  NumericVector out(nrow);
  
  for (int i = 0; i < nrow; i++) {
    double total = 0;
    for (int j = 0; j < ncol; j++) {
      total += x(i, j);
    }
    out[i] = total;
  }
  return out;
}

Statistical distributions

As in R, Rcpp provides d/p/q/r functions for the density, distribution function, quantile function and random generation. - Functions in the Rcpp:: namespace return a vector. - Functions in the R:: namespace (also provided by the Rcpp R package) return a scalar

For example we can sample a single value from a \(N(0, 1)\) distribution by extracting the first element from a vector

rgamma(1, 3, 1 / (y * y + 4))[0];

Or use the R::rgamma() function to sample a single value directly

R::rgamma(3, 1 / (y * y + 4))

Your turn (part 1)

In a new C++ file, convert the following Gibbs sampler to C++

gibbs_r <- function(N, thin) {
  mat <- matrix(nrow = N, ncol = 2)
  x <- y <- 0

  for (i in 1:N) {
    for (j in 1:thin) {
      x <- rgamma(1, 3, y * y + 4)
      y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
    }
    mat[i, ] <- c(x, y)
  }
  mat
}

Your turn (part 2)

Create a wrapper function to set the seed as follows:

gibbs_with_seed <- function(expr){
  set.seed(1)
  eval(expr)
}

We can then call gibbs_r() and gibbs_c() inside this, e.g.

gibbs_with_seed(gibbs_r(N = 100, thin = 10))

Benchmark your gibbs_r() and gibbs_c() functions with N = 100 and thin = 10, using your wrapper function to set the seed.

Rcpp sugar

Rcpp provides some “syntactic sugar” to allow us to write C++ code that is more like R code.

One example is operating on rows or columns of matrices. So far we have seen how to update individual elements of a NumericMatrix. Rcpp lets us extract an update whole rows/columns, e.g.

mat(i, _) = NumericVector::create(x,y);

A whole column would be extracted with mat(_, j).

Vectorized functions

The vectorized random generation functions are another example of Rcpp sugar.

Rcpp provide many more vectorized functions, for example:

arithmetic operators (+, -, *, \)
logical operators (==, !, =<)
arithmetic functions (sqrt, pow, …)
statistical summaries (mean, median, )

In addition, Rcpp provides many R-like functions, such as which_max or rowSums, see Unofficial API documentation for a full list.

Rcpp sugar: vectorized functions

Recall our distance function from earlier:

NumericVector distC(double x, NumericVector y) {
  int n = y.size();
  NumericVector out(n);
  for(int i = 0; i < n; ++i) {
    out[i] = sqrt(pow(ys[i] - x, 2.0));
  }
  return out;
}

With Rcpp vectorization, we can simplify this to:

NumericVector dist_sugar(double x, NumericVector y) {
  return sqrt(pow(x - y, 2));
}

Example: row maximums

This example combines row-indexing and a vectorized function, max().

NumericVector row_max(NumericMatrix mat) {
  int nrow = mat.nrow();
  NumericVector max(nrow);
  for (int i = 0; i < nrow; i++)
    max[i] = max( m(i,_) );
  return max;
}

Your turn

The following R function can be used to simulate the value of \(\pi\):

approx_pi_r <- function(N) {
    x <- runif(N)
    y <- runif(N)
    d <- sqrt(x^2 + y^2)
    return(4 * sum(d < 1.0) / N)
}

Convert this to C++, taking advantage of the vectorized Rcpp functions.

End matter

References

Similar scope to this module: Gillespie, C and Lovelace, R, Efficient R programming, Rcpp section, https://csgillespie.github.io/efficientR/performance.html#rcpp
Going a bit further: Wickham, H, Advanced R (2nd edn), Rewriting R code in C++ chapter, https://adv-r.hadley.nz/rcpp.html
Not very polished, but broader coverage of Rcpp functionality: Tsuda, M.E., Rcpp for everyone, https://teuder.github.io/rcpp4everyone_en/300_Rmath.html
The Rcpp vignettes, accessed via browseVignettes("Rcpp") or CRAN
The unofficial Rcpp API documentation
Case studies (optimising by improving R code and/or using C++)
- https://robotwealth.com/optimising-the-rsims-package-for-fast-backtesting-in-r/
- https://gallery.rcpp.org/articles/bayesian-time-series-changepoint/
- The Rcpp Gallery has all sorts of examples, many illustrating advanced features of Rcpp).

License

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

C++ and Rcpp

Overview

When to use C++

Limits of R

Typical scenarios

Getting set up

Set up to use C++

Linux

MacOS

Windows

C++ basics and Rcpp

A first C++ function

Rcpp

Using the C++ function

Stand-alone C++ files

Example C++ file

sourceCpp()

Benefits of separate C++ files

C++ Basics

Data types

Example: no inputs, scalar output

Example: if/else (scalar input, scalar output)

For loop syntax

Example: for loop (vector input, scalar output)

Example: while loop (vector input, scalar output)

Example: vector output

C++ Functions

Creating a C++ file in RStudio

Your turn

Missing values in C++ data types

Missing values in Rcpp vectors

Matrices

Example: row sums (matrix input, vector output)

Statistical distributions

Your turn (part 1)

Your turn (part 2)

Rcpp sugar

Rcpp sugar

Vectorized functions

Rcpp sugar: vectorized functions

Example: row maximums

Your turn

End matter

References

License

`sourceCpp()`

Example: `for` loop (vector input, scalar output)

Example: `while` loop (vector input, scalar output)