Simulation and Optimisation

R Foundations 2024

Ella Kaye, Department of Statistics

November 7, 2024

Overview

Numerical precision
Random numbers
Simulation
Optimisation

This material is largely reproduced from Ruth Ripley’s 2013 APTS course

Numerical precision

How R stores numbers

Some basic understanding is needed of how R stores and does arithmetic on numbers to avoid being surprised by the results it gives.

From the R FAQ:

The only numbers that can be represented exactly in R’s numeric type are integers and fractions whose denominator is a power of 2. Other numbers have to be rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then.

Example

a <- sqrt(2)
a * a == 2

[1] FALSE

a * a - 2

[1] 4.440892e-16

all.equal(a * a, 2)

[1] TRUE

The function all.equal() compares two objects using a numeric tolerance of .Machine$double.eps^0.5.¹ If you want much greater accuracy than this you will need to consider error propagation carefully.

Numerical consideration resources

For more information, see e.g.

David Goldberg (1991), “What Every Computer Scientist Should Know About Floating-Point Arithmetic”, ACM Computing Surveys, 23/1, 5–48, also available via https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html.
Ruth Ripley’s numerical considerations notes for APTS 2013.

Simulation

Uses of simulation

Statistical models are often mathematically intractable
Generate multiple samples for a model by simulation
Use these samples to investigate the behaviour of the model
Need realisations of random variables with various different distributions, our random numbers
Details of how to use the random numbers to create samples will not be considered here

Random number generation

Random numbers calculated on a computer are not random.
They are pseudo-random, following a predicted sequence, but in the short-term (i.e. anything but the very long-term) will appear to be random.
This is useful, as two sets of random numbers of the same size from the same generator using the same initial value will be exactly the same. This is crucial for reproducibility.
R provides functions for generating random samples from standard distibutions.

Random seeds in R

Each time a random number is required, R will use and update a variable called .Random.seed which is in your workspace.
At the first use, if .Random.seed does not exist, one will be created, with a value generated from the time.
The function set.seed(n) will set .Random.seed to a value derived from the argument n.
Use set.seed() to repeat the same sequence of random numbers.

Random number generators in R

There are various different generators available. R will use the one in use at the start of your session unless you alter it, even if you delete .Random.seed. For safety, you can specify the kind on calls to set.seed. Use set.seed(n, kind="default") to ensure you are using R’s default.
It is possible to save and restore .Random.seed within your functions, but take care with scope.
Note that random number generation changed in R version 4.0.0, so you might get different outputs even with the same seed when using versions before and after that release.

`set.seed()` example

sample(10)

 [1]  8  2  6  3  4  1  9  5  7 10

set.seed(1)
sample(10)

 [1]  9  4  7  1  2  5  3 10  6  8

sample(10)

 [1]  3  1  5  8  2  6 10  9  4  7

set.seed(1)
sample(10)

 [1]  9  4  7  1  2  5  3 10  6  8

`sample()`

The sample() function re-samples from a data vector, with or without replacement.

sample(x, size, replace = FALSE, prob = NULL)

`x`	positive integer or a vector
`size`	non-negative integer, number of items to choose
`replace`	logical - should sampling be with replacement
`prob`	a vector of probability weights for obtaining the elements of the vector being sampled.

`sample()` examples

sample(n)
sample(x)
sample(x, replace = TRUE)
sample(x, size = n)
sample(x, n, replace = TRUE)
sample(x, n, replace = TRUE, prob = probs)

A random permutation of 1, …, n

set.seed(123)
sample(10)

 [1]  3 10  2  8  6  9  1  7  5  4

Distributions in R

Many statistical distributions are built into R. Each has a root name, e.g. norm for the normal distribution.

There are four functions for each distribution, with different letters prefixing the root name:

p for probability, the cumulative distribution function
q for quantile, the inverse c.d.f.
d for density, the density function, p.d.f.
r for random, a random sample from the distribution.

So, for the normal distribution, we have the functions pnorm, qnorm, dnorm, rnorm.

Distributions and parameterisations

Distribution	Base name	Parameters
beta	`beta`	`shape1`, `shape2`
binomial	`binom`	`size`, `prob`
Cauchy	`cauchy`	`location`, `scale`
chi-squared	`chisq`	`df`
exponential	`exp`	`rate`
F	`f`	`df1`, `df2`
gamma	`gamma`	`shape`, `rate`
geometric	`geom`	`p`
hypergeometric	`hyper`	`m`, `n`, `k`
log-normal	`lnorm`	`meanlog`, `sdlog`
logistic	`logis`	`location`, `scale`
negative binomial	`nbinom`	`size`, `prob`
normal	`norm`	`mean`, `sd`
Poisson	`pois`	`lambda`
Student t	`t`	`df`
uniform	`unif`	`min`, `max`
Weibull	`weibull`	`shape`, `scale`

Generating random samples from a distibution

To obtain a sample of size n, use the r function for the distribution with the first argument n, and subsequent arguments the parameters for that distribution. The parameters often have default values. E.g.

`runif(n, min=0, max=1)`	Uniform
`rnorm(n, mean=0, sd=1)`	Normal
`rexp(n, rate=1)`	Exponential with mean 1/rate
`rt(n, df)`	t with df degrees of freedom
`rbinom(n, size, prob)`	Binomial, success in `size` trials with probability of success `prob`

Table from https://www.johndcook.com/blog/distributions_r_splus/

Your turn!

Sometimes sample() can lead to nasty surprises.

Consider the following code. Which line(s) might cause a problem?

x <- 1:10
sample(x[x > 7])
sample(x[x > 8])
sample(x[x > 9])
sample(x[x > 10])

Run the code to see what happens.

Warning

You need to be careful when using sample() programmatically (i.e. in your function or simulation).

See ?sample for a safer version and check you understand how the proposed resample() function in the examples works.

Optimisation

The optimisation problem

Given a function $f(x)$, what value of $x$ makes $f(x)$ as small or large as possible?
In a statistical context, $x$ will usually be the parameters of a model, and $f(x)$ either the model likelihood to be maximised or some measure of discrepancy between data and predictions to be minimised
The optimal set of parameters will give the best fit
Only need to consider small as $-f(x)$ is large when $f(x)$ is small.
We consider here general-purpose optimisers

Local and global minima

The (negative of the) likelihood for the General Linear Model (and that for many other linear models) is well-behaved: it has a single, global minimum.
For more complicated models there may be many local minima.
Finding a global minimum is difficult, and not always important. Only if local minima are widely separated in parameter space are they likely to invalidate our conclusions.
We will concentrate on methods of finding local minima. Check for different local minima by altering the initial values, algorithm used, or other parameters of the fitting process.

Univariate optimisation

optimize (or optimise) finds a (possibly local) mimimum of a function in a specified interval with respect to its first argument.

Function to be minimised is the first argument to optimize
Can pre-specify the function or include it in the command.

Univariate optimisation: example

f <- function(x, a) {
  (x - a)^2
}

xmin <- optimize(f, interval = c(0, 1), a = 1/3)

# or

xmin <- optimize(function(x, a) {(x - a)^2}, 
                 interval = c(0, 1), a = 1/3)

xmin

$minimum
[1] 0.3333333

$objective
[1] 0

Note how the (fixed) parameter a is passed into f.

Other `optimize()` arguments

An interval within which to search must be specified, either by interval or by upper and lower
To maximise set maximum = TRUE
Accuracy can be set using the tol argument
Note the order of arguments: optimize(f, interval, ..., lower, upper, maximum, tol)
The ... can be named or unnamed and will be passed to f
Arguments after the ... must be specified by names.
optimize returns a list with two items:
- minimum: the value of x at which f(x) is minimised
- objective: the value of f(x) at x = minimum

General optimisation

In more than one dimension the problem is harder.
R has several different functions: most flexible is optim() which includes several different algorithms.
Algorithm of choice depends on how easy it is to calculate derivatives for the function. Usually better to supply a function to calculate derivatives, but may be unnecessary extra work.
Ensure the problem is scaled so that unit change in any parameter gives approximately unit change in objective.

The default method
Basic idea: for a function with n parameters, choose a polygon with n+1 vertices. At each step, alter vertex with minimum f(x) to improve the objective function, by reflection, expansion or contraction
Does not use derivative information
Useful for non-differentiable functions
May be rather slow

A quasi-Newton method: builds up approximation to Hessian matrix from gradients at start and finish of steps
Uses the approximation to choose new search direction
Performs line search in this direction
Update term for the Hessian approximation is due to Broyden, Fletcher, Goldfarb and Shanno (proposed separately by all four in 1970)
Uses derivative information, calculated either from a user-supplied function or by finite differences
If dimension is large, the matrix stored may be very large

A conjugate gradient method: chooses successive search directions that are analogous to axes of an ellipse
Does not store a Hessian matrix
Three different formulae for the search directions are implemented: Fletcher-Reeves, Polak-Ribiere or Beale-Sorenson
Less robust than BFGS method
Uses derivative information, calculated either from a user-supplied function or by finite differences

A limited memory version of BFGS
Does not store a Hessian matrix, only a limited number of update steps for it
Uses derivative information, calculated either from a user-supplied function or by finite differences
Can restrict the solution to lie within a “box”, the only method of optim() that can do this

A variant of simulated annealing A stochastic algorithm
Accepts changes which increase the objective with positive probability (when minimising!)
Does not use derivative information
Can be very slow to converge but may find a ‘good’ solution quickly

An interface to optimize()
Only for use with one dimensional problems
Included for use inside other functions where only method can be specified, not the function to be used.

How to use `optim()`

optim(par, fn, gr=NULL, ..., method=c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower=-Inf, upper=Inf, control=list(), hessian=FALSE)

`par`	starting values of the parameters
`fn`	function to be optimised (supply as for `optimize`)
`gr`	function to calculate the derivative, only relevant for methods “BFGS”, “CG” or “L-BFGS-B”
`…`	other parameters for (both) `fn` and `gr`
`method`	algorithm to use
`lower`, `upper`	vector of limits for parameters (only allowed it `method="L-BFGS-B"`
`control`	control options (see next slide)
`hessian`	logical: whether to return a hessian estimate calculated by finite differences

`optim()`: control options

There are very many. The most important are:

`trace`	A positive integer: higher values give more information
`fnscale`	An overall scaling: if negative, maximisation will be performed
`parscale`	A vector of scalings for the parameters
`maxit`	Maximum number of iterations to be performed. May be used to terminate unsuccessful attempts. Also used to perform one or two steps if convergence is unimportant
`type`	Used to select formula for “CG” method

`optim()`: components of return value

`par`	best set of parameters
`value`	value of `fn` corresponding to `par`
`counts`	number of calls to `fn` and `gr`: excludes calls for purposes of approximating derivatives or Hessian
`convergence`	error code: 0=success, 1=`maxit` reached, 10=degeneracy of Nelder-Mead simplex, 51=warning from “L-BFGS-B”, 52=error from “L-BFGS-B”
`message`	further information, if any
`hessian`	if requested, a symmetric matrix estimate of the Hessian at the solution

Your turn! Q1

fw <- function (x) {
    10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80
}

Use optim() to find an approximate global minimum
Use optim() again to improve locally (aiming for an objective of 67.46773).

Your turn! Q2

Take a look at the introduction to the Wikipedia page for Rosenbrock’s banana function
Copy this code for the fn and gr

fn <- function(x) {   
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

gr <- function(x) { 
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}

Using a start value of c(0, 0), experiment with the optim() function to see which methods converge

End matter

Resources

This material is reproduced in large part directly from the APTS 2013/14 resources by Ruth Ripley:

https://portal.stats.ox.ac.uk/userdata/ruth/APTS2013/Rcourse5.pdf

License

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Simulation and Optimisation

Overview

Numerical precision

How R stores numbers

Example

Numerical consideration resources

Simulation

Uses of simulation

Random number generation

Random seeds in R

Random number generators in R

set.seed() example

sample()

sample() examples

Distributions in R

Distributions and parameterisations

Generating random samples from a distibution

Your turn!

Optimisation

The optimisation problem

Local and global minima

Univariate optimisation

Univariate optimisation: example

Other optimize() arguments

General optimisation

optim() methods

How to use optim()

optim(): control options

optim(): components of return value

Your turn! Q1

Your turn! Q2

End matter

Resources

License

`set.seed()` example

`sample()`

`sample()` examples

Other `optimize()` arguments

`optim()` methods

How to use `optim()`

`optim()`: control options

`optim()`: components of return value