Advanced R
June 20, 2023
To make our code more efficient, we first need to identify the bottlenecks, in terms of time and/or memory usage.
Profiling stops the execution of code every few milliseconds and records the call stack: the function currently being executed, the function that called it, and so on.
We will use the profvis package to visualise profiling results.
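As a rough illustration of this sampling, base R's Rprof() (the profiler that profvis builds on) can be run directly. The busy_wait() helper below is hypothetical and exists only to give the sampler something to observe; the 10 ms interval is an arbitrary choice for the sketch.

```r
# Hypothetical helper: keep the CPU busy for roughly `seconds` seconds
busy_wait <- function(seconds) {
  end <- Sys.time() + seconds
  while (Sys.time() < end) {}
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # sample the call stack every 10 ms
busy_wait(0.3)
Rprof(NULL)                        # stop profiling

# Summarise the time attributed to each function
prof_summary <- summaryRprof(prof_file)
prof_summary$by.total
```

The summary is a plain table; profvis presents the same kind of data interactively.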
The following code is saved in profiling-example.R and uses profvis::pause() to wait 0.1s inside each function.
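The contents of the file are not reproduced here. A minimal sketch of what such a file might contain, modelled on the example in Advanced R, is given below; the function names f, g and h and their nesting are assumptions, so timings from this sketch need not match those quoted on these slides.

```r
# Hypothetical contents of profiling-example.R
f <- function() {
  profvis::pause(0.1)  # wait 0.1s; unlike Sys.sleep(), this shows up in profiles
  g()
  h()
}

g <- function() {
  profvis::pause(0.1)
  h()
}

h <- function() {
  profvis::pause(0.1)
}
```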
Source the code to be profiled, then pass the function call to be profiled to profvis().
An interactive HTML document will open with the results.
In RStudio this will open in the source pane; click the “show in new window” button to open the document in a new window.
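The steps above might look like the following sketch. To keep it runnable on its own, it writes a stand-in file to a temporary directory; the file name profiling-sketch.R and the function slow_step() are hypothetical and are not the example file used on these slides.

```r
library(profvis)

# Hypothetical stand-in for a file of code to be profiled
sketch_file <- file.path(tempdir(), "profiling-sketch.R")
writeLines(c(
  "slow_step <- function() {",
  "  profvis::pause(0.1)",
  "}"
), sketch_file)

# keep.source = TRUE retains source references, so that profvis can
# link bars in the flame graph to lines in the file
source(sketch_file, keep.source = TRUE)

# Pass the function call to be profiled to profvis()
p <- profvis(slow_step())

# Printing the result opens the interactive HTML document
if (interactive()) print(p)
```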
In the flame graph the yellow bars correspond to lines in the source file shown above the graph. The plot is interactive.
In the overall time of 250ms we see:
The cmp function is called as R tries to compile new functions so that it can call the compiled version in subsequent calls.
No objects are created or deleted: no memory changes.
The Data tab shows a table with the memory and time usage for each function call. The nested calls can be expanded/collapsed to show/hide the corresponding lines.
To illustrate memory profiling we can consider a loop that concatenates values.
As it is a small code snippet, we can pass it to profvis() directly.
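The snippet itself is not shown here. A sketch following the memory profiling example in Advanced R is given below; the loop length of 1e4 is an assumption, and the memory figures reported will vary between runs and machines.

```r
library(profvis)

p <- profvis({
  x <- integer()
  for (i in 1:1e4) {
    x <- c(x, i)  # each call to c() allocates a new, longer vector
  }
})

# Printing the result opens the interactive HTML document
if (interactive()) print(p)
```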
As expected, the majority of the time is spent within c(), but we also see a lot of time spent in <GC>, the garbage collector.
In the memory column next to the corresponding line in the source code, we see a bar to the left labelled -123.0 and a bar to the right labelled 137.2. This means that 137 MB of memory was allocated and 123 MB of memory was released.
Each call to c() causes a new copy of x to be created.
Memory profiling can help to identify short-lived objects that might be avoided by changes to the code.
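One common change of this kind is to allocate the result once rather than growing it. The sketch below contrasts the two approaches; the function names grow() and prealloc() are hypothetical and used only for illustration.

```r
n <- 1e4

# Growing: c() allocates a new, longer vector on every iteration
grow <- function(n) {
  x <- integer()
  for (i in seq_len(n)) {
    x <- c(x, i)
  }
  x
}

# Preallocating: allocate the full vector once, then assign into it
prealloc <- function(n) {
  x <- integer(n)
  for (i in seq_len(n)) {
    x[i] <- i
  }
  x
}

# Both approaches give the same result
stopifnot(identical(grow(n), prealloc(n)))
```

Profiling each function with profvis() should show far fewer short-lived allocations for the preallocated version.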
In the game of Monopoly, players roll two dice to move round the board. Players buy assets on which they can charge rent or taxes and aim to make the most money.
The squares on the board represent
The efficient package contains the simulate_monopoly() function to simulate game play; we’ll use this to practice profiling.
Use profvis() to profile simulate_monopoly(10000). Explore the output. Which parts of the code are slow?
Most of the time is spent in the function move_square(). Use View(move_square) to view the source code. Copy the code to a new .R file and rename the function move_square2. Edit move_square2() to speed up the slow parts of the code. (Go to the next slide to test the updates.)
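The kind of change that often helps can be sketched on a stand-alone example. The snippet below is not taken from move_square(); it assumes, purely for illustration, that dice rolls are held in a data frame and summed row-wise with apply(), and compares that with keeping them in a matrix and using rowSums(), which is typically faster.

```r
set.seed(1)

# Three rolls of two dice, stored as a matrix
rolls <- matrix(sample(1:6, 6, replace = TRUE), ncol = 2,
                dimnames = list(NULL, c("d1", "d2")))

# Slower pattern: data frame plus apply() over rows
df <- as.data.frame(rolls)
total_df <- apply(df, 1, sum)

# Faster pattern: keep the rolls in a matrix and use rowSums()
total_mat <- rowSums(rolls)

# Both give the same totals
stopifnot(all(total_df == total_mat))
```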
Create a wrapper to run a specified move square function n times with different seeds:
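A minimal sketch of such a wrapper is given below. It assumes that the move square function takes the current square as its only argument; the current argument, its default of 1, and the toy_move() stand-in are hypothetical and only there to make the sketch runnable on its own.

```r
# Run `fun` n times, setting a different seed before each call.
# Assumption: `fun` takes the current square as its only argument.
run <- function(n, fun, current = 1) {
  lapply(seq_len(n), function(seed) {
    set.seed(seed)
    fun(current)
  })
}

# Hypothetical stand-in for move_square(), used only to show the call
toy_move <- function(current) {
  roll <- sum(sample(1:6, 2, replace = TRUE))
  (current + roll - 1) %% 40 + 1
}

res <- run(5, toy_move)
```

Returning the results is a deliberate choice: bench::mark() checks by default that all expressions return equal values, so it can confirm that a rewritten function behaves the same as the original under the same seeds.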
Run bench::mark(run(n, move_square), run(n, move_square2)) with n = 1000 to test your changes.
Finally, compare profvis(run(n, move_square)) with profvis(run(n, move_square2)).
In the next session, we’ll cover using C++ via Rcpp to rewrite R code that profiling has identified as a bottleneck.
Wickham, H, Advanced R (2nd edn), Improving performance section
Gillespie, C and Lovelace, R, Efficient R programming
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).