Simulation and Optimisation

R Foundations Course

Ella Kaye | Department of Statistics | University of Warwick

November 21, 2023

Overview

  • Numerical precision

  • Random numbers

  • Simulation

  • Optimisation

This material is largely reproduced from Ruth Ripley’s 2013 APTS course

Numerical precision

How R stores numbers

To avoid being surprised by the results R gives, some basic understanding is needed of how R stores numbers and does arithmetic on them.

From the R FAQ:

The only numbers that can be represented exactly in R’s numeric type are integers and fractions whose denominator is a power of 2. Other numbers have to be rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then.

Example

a <- sqrt(2)
a * a == 2
[1] FALSE
a * a - 2
[1] 4.440892e-16
all.equal(a * a, 2)
[1] TRUE

The function all.equal() compares two objects using a numeric tolerance of .Machine$double.eps^0.5. If you want much greater accuracy than this you will need to consider error propagation carefully.
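Note that when the difference exceeds the tolerance, all.equal() returns a character string describing the difference rather than FALSE, so wrap it in isTRUE() when using it in a condition. A small illustration:

all.equal(1, 1.01)
[1] "Mean relative difference: 0.01"
isTRUE(all.equal(1, 1.01))
[1] FALSE
all.equal(1, 1.01, tolerance = 0.1)
[1] TRUE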

Numerical consideration resources

For more information, see e.g. R FAQ 7.31, “Why doesn’t R think these numbers are equal?”.

Simulation

Uses of simulation

  • Statistical models are often mathematically intractable

  • Generate multiple samples for a model by simulation

  • Use these samples to investigate the behaviour of the model

  • Need realisations of random variables from various distributions: our “random numbers”

  • Details of how to use the random numbers to create samples will not be considered here

Random number generation

  • Random numbers calculated on a computer are not random.

  • They are pseudo-random: they follow a deterministic, predictable sequence, but over anything other than the very long term they will appear to be random.

  • This is useful, as two sets of random numbers of the same size from the same generator using the same initial value will be exactly the same. This is crucial for reproducibility.

  • R provides functions for generating random samples from standard distributions.

Random seeds in R

  • Each time a random number is required, R will use and update a variable called .Random.seed which is in your workspace.
  • At the first use, if .Random.seed does not exist, one will be created, with a value generated from the time.
  • The function set.seed(n) will set .Random.seed to a value derived from the argument n.
  • Use set.seed() to repeat the same sequence of random numbers.

Random number generators in R

  • There are several different generators available. R will use the one in use at the start of your session unless you alter it, even if you delete .Random.seed. For safety, you can specify the kind in calls to set.seed(): use set.seed(n, kind = "default") to ensure you are using R’s default.
  • It is possible to save and restore .Random.seed within your functions, but take care with scope (see the sketch after this list).
  • Note that the default sampling algorithm changed in R version 3.6.0 (sample.kind = "Rejection" replaced the older "Rounding" method), so you might get different output even with the same seed when using versions before and after that release.
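A minimal top-level sketch of saving and restoring the seed (the scoping caveat: .Random.seed lives in the global environment, so inside a function it must be read from and assigned back into globalenv()):

set.seed(42)
old_seed <- .Random.seed  # save the current state
x1 <- runif(2)
.Random.seed <- old_seed  # restore it
x2 <- runif(2)            # replays the same draws
identical(x1, x2)
[1] TRUE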

set.seed() example

sample(10)
 [1]  6  1  9  7  4  8  2  3 10  5
set.seed(1)
sample(10)
 [1]  9  4  7  1  2  5  3 10  6  8
sample(10)
 [1]  3  1  5  8  2  6 10  9  4  7
set.seed(1)
sample(10)
 [1]  9  4  7  1  2  5  3 10  6  8

sample()

The sample() function re-samples from a data vector, with or without replacement.

sample(x, size, replace = FALSE, prob = NULL)

x a vector of one or more elements from which to choose, or a positive integer
size non-negative integer: the number of items to choose
replace logical: should sampling be with replacement?
prob a vector of probability weights for obtaining the elements of the vector being sampled

sample() examples

A random permutation of 1, …, n

set.seed(123)
sample(10)
 [1]  3 10  2  8  6  9  1  7  5  4

See also sample.int():

set.seed(123)
sample.int(10)
 [1]  3 10  2  8  6  9  1  7  5  4

A random permutation of x for length(x) > 1

set.seed(10)
alph5 <- LETTERS[1:5]
sample(alph5)
[1] "C" "A" "B" "E" "D"

A bootstrap sample

set.seed(10)
sample(alph5, replace = TRUE)
[1] "C" "A" "B" "D" "C"

Sample n items from x without replacement

set.seed(10)
sample(alph5, 3)
[1] "C" "A" "B"

Sample n items from x with replacement

set.seed(1)
sample(alph5, 3, replace = TRUE)
[1] "A" "D" "A"

Probability sample of n items from x

x <- 1:4
probs <- c(1/2, 1/3, 1/12, 1/12)

set.seed(1)
samp <- sample(x, 100, replace = TRUE, prob = probs)
table(samp)
samp
 1  2  3  4 
52 34 10  4 
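The observed proportions are close to prob, as expected:

prop.table(table(samp))
samp
   1    2    3    4 
0.52 0.34 0.10 0.04 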

Distributions in R

Many statistical distributions are built into R. Each has a root name, e.g. norm for the normal distribution.

There are four functions for each distribution, with different letters prefixing the root name:

  • p for probability, the cumulative distribution function
  • q for quantile, the inverse c.d.f.
  • d for density, the density function, p.d.f.
  • r for random, a random sample from the distribution.

So, for the normal distribution, we have the functions pnorm, qnorm, dnorm, rnorm.
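For example, for the standard normal:

dnorm(0)      # density at 0, i.e. 1/sqrt(2 * pi)
[1] 0.3989423
pnorm(1.96)   # P(Z <= 1.96)
[1] 0.9750021
qnorm(0.975)  # the inverse of pnorm
[1] 1.959964
set.seed(1)
rnorm(3)      # three random draws
[1] -0.6264538  0.1836433 -0.8356286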

Distributions and parameterisations

Distribution Base name Parameters
beta beta shape1, shape2
binomial binom size, prob
Cauchy cauchy location, scale
chi-squared chisq df
exponential exp rate
F f df1, df2
gamma gamma shape, rate
geometric geom p
hypergeometric hyper m, n, k
log-normal lnorm meanlog, sdlog
logistic logis location, scale
negative binomial nbinom size, prob
normal norm mean, sd
Poisson pois lambda
Student t t df
uniform unif min, max
Weibull weibull shape, scale

Generating random samples from a distribution

To obtain a sample of size n, use the r function for the distribution with the first argument n, and subsequent arguments the parameters for that distribution. The parameters often have default values. E.g.

runif(n, min=0, max=1) Uniform
rnorm(n, mean=0, sd=1) Normal
rexp(n, rate=1) Exponential with mean 1/rate
rt(n, df) t with df degrees of freedom
rbinom(n, size, prob) Binomial, success in size trials with probability of success prob

Table from https://www.johndcook.com/blog/distributions_r_splus/
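As a quick sanity check of a parameterisation, compare a large-sample mean against the theoretical value:

set.seed(1)
mean(rexp(10000, rate = 2))  # close to the theoretical mean 1/rate = 0.5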

Your turn!

Sometimes sample() can lead to nasty surprises.

Consider the following code. Which line(s) might cause a problem?

x <- 1:10
sample(x[x > 7])
sample(x[x > 8])
sample(x[x > 9])
sample(x[x > 10])

Run the code to see what happens.

Warning

You need to be careful when using sample() programmatically (i.e. in your function or simulation).

See ?sample for a safer version and check you understand how the proposed resample() function in the examples works.
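For reference, the resample() given in the examples of ?sample is essentially this one-liner; by always indexing into x, it avoids the convenience feature whereby sample(x) with x a single positive number means sample(1:x):

resample <- function(x, ...) x[sample.int(length(x), ...)]

x <- 1:10
resample(x[x > 9])   # a length-one vector is still treated as a vector
[1] 10
resample(x[x > 10])  # a zero-length input gives a zero-length output
integer(0)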

Optimisation

The optimisation problem

  • Given a function \(f(x)\), what value of \(x\) makes \(f(x)\) as small or large as possible?
  • In a statistical context, \(x\) will usually be the parameters of a model, and \(f(x)\) either the model likelihood to be maximised or some measure of discrepancy between data and predictions to be minimised
  • The optimal set of parameters will give the best fit
  • We only need to consider minimisation, as maximising \(f(x)\) is equivalent to minimising \(-f(x)\).
  • We consider here general-purpose optimisers

Local and global minima

  • The (negative of the) likelihood for the General Linear Model (and that for many other linear models) is well-behaved: it has a single, global minimum.
  • For more complicated models there may be many local minima.
  • Finding a global minimum is difficult, and not always important. Only if local minima are widely separated in parameter space are they likely to invalidate our conclusions.
  • We will concentrate on methods of finding local minima. Check for different local minima by altering the initial values, algorithm used, or other parameters of the fitting process.

Univariate optimisation

optimize (or optimise) finds a (possibly local) minimum of a function in a specified interval with respect to its first argument.

  • Function to be minimised is the first argument to optimize
  • Can pre-specify the function or include it in the command.

Univariate optimisation: example

f <- function(x, a) {
  (x - a)^2
}

xmin <- optimize(f, interval = c(0, 1), a = 1/3)

# or

xmin <- optimize(function(x, a) {(x - a)^2}, 
                 interval = c(0, 1), a = 1/3)

xmin
$minimum
[1] 0.3333333

$objective
[1] 0

Note how the (fixed) parameter a is passed into f.

Other optimize() arguments

  • An interval within which to search must be specified, either by interval or by upper and lower

  • To maximise, set maximum = TRUE (see the example after this list)

  • Accuracy can be set using the tol argument

  • Note the order of arguments: optimize(f, interval, ..., lower, upper, maximum, tol)

  • The ... can be named or unnamed and will be passed to f

  • Arguments after the ... must be specified by name.

  • optimize returns a list with two items:

    • minimum: the value of x at which f(x) is minimised
    • objective: the value of f(x) at x = minimum
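For example, maximising the standard normal density (when maximum = TRUE, the components returned are named maximum and objective instead):

res <- optimize(dnorm, interval = c(-5, 5), maximum = TRUE)
res$maximum    # close to 0
res$objective  # close to dnorm(0) = 0.3989423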

General optimisation

  • In more than one dimension the problem is harder.
  • R has several functions for this; the most flexible is optim(), which implements several different algorithms.
  • Algorithm of choice depends on how easy it is to calculate derivatives for the function. Usually better to supply a function to calculate derivatives, but may be unnecessary extra work.
  • Ensure the problem is scaled so that unit change in any parameter gives approximately unit change in objective.

optim() methods

Nelder-Mead

  • The default method
  • Basic idea: for a function of n parameters, maintain a simplex of n+1 vertices. At each step, replace the worst vertex (the one with the largest value of f(x) when minimising) by reflection, expansion or contraction
  • Does not use derivative information
  • Useful for non-differentiable functions
  • May be rather slow

BFGS

  • A quasi-Newton method: builds up an approximation to the Hessian matrix from gradients at the start and finish of steps
  • Uses the approximation to choose a new search direction
  • Performs a line search in this direction
  • The update term for the Hessian approximation is due to Broyden, Fletcher, Goldfarb and Shanno (proposed separately by all four in 1970)
  • Uses derivative information, calculated either from a user-supplied function or by finite differences
  • If the dimension is large, the matrix stored may be very large

CG

  • A conjugate gradient method: chooses successive search directions that are analogous to the axes of an ellipse
  • Does not store a Hessian matrix
  • Three formulae for the search directions are implemented: Fletcher-Reeves, Polak-Ribiere or Beale-Sorenson
  • Less robust than the BFGS method
  • Uses derivative information, calculated either from a user-supplied function or by finite differences

L-BFGS-B

  • A limited-memory version of BFGS
  • Does not store a Hessian matrix, only a limited number of update steps for it
  • Uses derivative information, calculated either from a user-supplied function or by finite differences
  • Can restrict the solution to lie within a “box”: the only method of optim() that can do this

SANN

  • A variant of simulated annealing: a stochastic algorithm
  • Accepts changes which increase the objective with positive probability (when minimising!)
  • Does not use derivative information
  • Can be very slow to converge, but may find a ‘good’ solution quickly

Brent

  • An interface to optimize()
  • Only for use with one-dimensional problems
  • Included for use inside other functions where only the method can be specified, not the function to be called

How to use optim()

optim(par, fn, gr = NULL, ...,
      method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"),
      lower = -Inf, upper = Inf, control = list(), hessian = FALSE)

par starting values of the parameters
fn function to be optimised (supply as for optimize)
gr function to calculate the derivative, only relevant for methods “BFGS”, “CG” or “L-BFGS-B”
... other parameters passed to (both) fn and gr
method algorithm to use
lower, upper vectors of limits for the parameters (only allowed if method = "L-BFGS-B")
control control options (see next slide)
hessian logical: whether to return a Hessian estimate calculated by finite differences
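As a minimal sketch, here is optim() minimising a toy quadratic objective (chosen purely for illustration):

fq <- function(p) sum((p - c(1, 2))^2)  # minimised at p = c(1, 2)
res <- optim(par = c(0, 0), fn = fq)    # Nelder-Mead by default
res$par          # close to c(1, 2)
res$convergence  # 0 indicates success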

optim(): control options

There are very many. The most important are:

trace A positive integer: higher values give more information
fnscale An overall scaling: if negative, maximisation will be performed
parscale A vector of scalings for the parameters
maxit Maximum number of iterations to be performed. May be used to terminate unsuccessful attempts. Also used to perform one or two steps if convergence is unimportant
type Used to select formula for “CG” method
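For instance, setting fnscale = -1 makes optim() maximise rather than minimise:

# maximise the standard normal density; "BFGS" handles a single parameter
optim(par = 1, fn = dnorm, method = "BFGS",
      control = list(fnscale = -1))$par  # close to 0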

optim(): components of return value

par best set of parameters
value value of fn corresponding to par
counts number of calls to fn and gr: excludes calls for purposes of approximating derivatives or Hessian
convergence error code: 0=success, 1=maxit reached, 10=degeneracy of Nelder-Mead simplex, 51=warning from “L-BFGS-B”, 52=error from “L-BFGS-B”
message further information, if any
hessian if requested, a symmetric matrix estimate of the Hessian at the solution

Your turn! Q1

fw <- function(x) {
  10 * sin(0.3 * x) * sin(1.3 * x^2) + 0.00001 * x^4 + 0.2 * x + 80
}
  • Use optim() to find an approximate global minimum

  • Use optim() again to improve locally (aiming for an objective of 67.46773).

Your turn! Q2

fn <- function(x) {  # Rosenbrock's banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

gr <- function(x) {  # its analytic gradient
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}
  • Using a start value of c(0, 0), experiment with the optim() function to see which methods converge

End matter

Resources

This material is reproduced in large part directly from the APTS 2013/14 resources by Ruth Ripley.

License

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).