Classes and Methods

Advanced R

Heather Turner and Ella Kaye
Department of Statistics, University of Warwick

June 19, 2023

Overview

  • Object-oriented programming
  • S3
  • S4

Object-oriented programming

Object-oriented programming (OOP)

Generic functions provide a unified interface to methods for objects of a particular class, e.g.

library(palmerpenguins)
summary(penguins$species)
   Adelie Chinstrap    Gentoo 
      152        68       124 
summary(penguins$flipper_length_mm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    172     190     197     201     213     231       2 

Here, we use the same function, summary(), on objects of classes factor and integer and get different output for each.

Motivation for OOP

summary() could contain several if-else statements, but

  • the code would become hard to follow
  • only the function authors (R Core) could add new implementations

What does OOP offer?

  • separates the code for different data types
  • avoids duplicating code by method inheritance from parent class(es) to child class (subclass)
  • makes it possible for external developers to add methods for new types of object
    • this can be particularly useful when writing R packages

OOP Systems in R

There are 3 main OOP systems in use:

  • S3
    • Used in base R and most recommended/CRAN packages
    • Use unless you have good reason not to
  • S4
    • Used on Bioconductor
    • Allow more complex relationships between classes and methods
  • R6
    • More similar to OOP in other languages
    • May prefer if S3 insufficient and not aiming for Bioconductor

A new OOP system, S7, is in development as a successor to S3 and S4.

S3

S3 objects

An S3 object has a "class" attribute:

attr(penguins$species, "class")
[1] "factor"
unique(penguins$species)
[1] Adelie    Gentoo    Chinstrap
Levels: Adelie Chinstrap Gentoo

S3 objects: the underlying object

With unclass() we obtain the underlying object, here an integer vector

species_no <- unclass(penguins$species)
class(species_no)
[1] "integer"
unique(species_no)
[1] 1 3 2
attributes(species_no)
$levels
[1] "Adelie"    "Chinstrap" "Gentoo"   

Creating an S3 object with stucture()

You can use structure() to define an S3 object with a class attribute:

dp <- 2
structure(list(pi = trunc(10^dp * pi)/10^dp, dp = dp),
          class = "pi_trunc")
$pi
[1] 3.14

$dp
[1] 2

attr(,"class")
[1] "pi_trunc"

Potentially further attributes can be added at the same time, but typically we would use a list to return all the required values.

Creating an S3 object with class()

Alternatively, we can add a class attribute using the class() helper function:

pi2dp <- list(pi = trunc(10^dp * pi)/10^dp, dp = dp)
class(pi2dp) <- "pi_trunc"
pi2dp
$pi
[1] 3.14

$dp
[1] 2

attr(,"class")
[1] "pi_trunc"

S3 generic functions

S3 generic functions are simple wrappers to UseMethod()

print
function (x, ...) 
UseMethod("print")
<bytecode: 0x145b6ce68>
<environment: namespace:base>

useMethod()

The UseMethod() function takes care of method dispatch: selecting the S3 method according to the class of the object passed as the first argument.

class(penguins$species[1:3])
[1] "factor"
print(penguins$species[1:3])
[1] Adelie Adelie Adelie
Levels: Adelie Chinstrap Gentoo

Here print() dispatches to the method print.factor().

Inheritance

The class of an S3 object can be a vector of classes

fit <- glm(y ~ x, data = data.frame(y = 1:3, x = 4:6))
class(fit)
[1] "glm" "lm" 

We say fit is a "glm" object that inherits from class "lm".

The inherits() function can be used to test if an object inherits from a given class

inherits(fit, "glm")
[1] TRUE

Method dispatch

An S3 object can have more than one class e.g.

class(penguins)
[1] "tbl_df"     "tbl"        "data.frame"

UseMethod() works along the vector of classes (from the first class to the last), looks for a method for each class and dispatches to the first method it finds.

If no methods are defined for any of class, the default is used , e.g. print.default().

If there is no default, an error is thrown.

S3 Methods

See the methods for a given S3 class

# nls is nonlinear least squares
methods(class = "nls")
 [1] anova       coef        confint     deviance    df.residual fitted     
 [7] formula     logLik      nobs        predict     print       profile    
[13] residuals   summary     vcov        weights    
see '?methods' for accessing help and source code

See the methods for a given generic function

methods("coef")
[1] coef.aov*     coef.Arima*   coef.default* coef.listof*  coef.maov*   
[6] coef.nls*    
see '?methods' for accessing help and source code

Asterisked methods are not exported.

View S3 methods

S3 methods need not be in the same package as the generic.

Find an unexported method with getS3method()

getS3method("coef", "default")
function (object, complete = TRUE, ...) 
{
    cf <- object$coefficients
    if (complete) 
        cf
    else cf[!is.na(cf)]
}
<bytecode: 0x1157fcfc8>
<environment: namespace:stats>

In code, call the generic, rather than calling the method directly.

Writing S3 Methods

The arguments of a new method should be a superset of the arguments of the generic

args(print)
function (x, ...) 
NULL

New methods have the name format generic.class:

print.pi_trunc <- function(x, abbreviate = TRUE, ...){
  dp_text <- ifelse(abbreviate, "d.p.", "decimal places")
  cat("pi: ", x$pi, " (", x$dp, " ", dp_text, ")", sep = "")
}
print(pi2dp)
pi: 3.14 (2 d.p.)

NextMethod()

We can explicitly call the next method that would be called by UseMethod() to reuse code whilst customising as required

body(t.data.frame)
{
    x <- as.matrix(x)
    NextMethod("t")
}
body(rep.factor)
{
    y <- NextMethod()
    structure(y, class = class(x), levels = levels(x))
}

It is possible to call NextMethod() with arguments but it is safer to recall the generic with new arguments in this case.

Implicit classes

is.object() can be used to find out if an object has a class (S3/S4/R6)

is.object(factor(1:3))
[1] TRUE
is.object(1:3)
[1] FALSE

An object that does not have an explicit class has an implicit class that will be used for S3 method dispatch. The implicit class can be found with .class2()

M <- matrix(1:12, nrow = 4)
attr(M, "class")
NULL
.class2(M)
[1] "matrix"  "array"   "integer" "numeric"

Attributes

We can take advantage of existing S3 methods by returning an object of a existing S3 class or an implicit class, using attributes to add custom information

x <- matrix(c(1:5, 2*(1:5)), ncol = 2)
center_x <- scale(x, scale = FALSE)
class(center_x)
[1] "matrix" "array" 
summary(center_x)
       V1           V2    
 Min.   :-2   Min.   :-4  
 1st Qu.:-1   1st Qu.:-2  
 Median : 0   Median : 0  
 Mean   : 0   Mean   : 0  
 3rd Qu.: 1   3rd Qu.: 2  
 Max.   : 2   Max.   : 4  
attr(center_x, "scaled:center")
[1] 3 6

This can avoid the need to define new classes and methods, in simple cases.

Your turn (part 1)

  1. Create a function to fit an ordinary least squares model given a response y and an explanatory variable x, that returns an object of a new class "ols", that inherits from "lm".
  2. Define a print method for your function that it works as follows:
set.seed(1)
res <- ols(x = 1:3, y = rnorm(3))
res
Intercept:  -0.217 
Slope:  -0.1046 

Note: I have set options(digits = 4) to limit the number of digits printed by default throughout this presentation (default is 7).

Your turn (part 2)

  1. Write a summary method for your ols class that uses NextMethod() to compute the usual lm summary, but return an object of class "summary.ols".
  2. Write a print method for the "summary.ols" which works as follows:
summary(res)
Coefficients: 
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2170     1.1408 -0.1902   0.8804
x            -0.1046     0.5281 -0.1980   0.8755

Residual standard error:  0.7468 
Multiple R-squared:  0.03774 

S4

Overview of S4

S4 methods

  • use specialised functions for creating classes, generics and methods
  • allow multiple inheritance: a class can have multiple parents
  • allow multiple dispatch: method selection based on the class of multiple objects

The methods package provides the functions required to use S4 classes and methods, so always load this package when using S4.

S4 Classes

An S4 class can be defined with setClass(), with at least two arguments

  • A name, by convention in UpperCamelCase.
  • A named character vector specifying the names and classes of the slots (fields). Using the pseudo class ANY allows a slot to accept any type of object.
setClass("Person", 
  slots = c(
    name = "character", 
    date_of_birth = "Date",
    date_of_death = "Date",
    age_at_death = "numeric"
  )
)

Creating a new instance

A new instance of the S4 object can be created using new()

florence <- new("Person", 
                name = "Florence Nightingale",
                date_of_birth = as.Date("1820-05-12"),
                date_of_death = as.Date("1910-08-13"),
                age_at_death = 90)
str(florence)
Formal class 'Person' [package ".GlobalEnv"] with 4 slots
  ..@ name         : chr "Florence Nightingale"
  ..@ date_of_birth: Date[1:1], format: "1820-05-12"
  ..@ date_of_death: Date[1:1], format: "1910-08-13"
  ..@ age_at_death : num 90

Note that the second onwards argument names in new are the names in the vector passed to slots() when defining the class.

Examining S4 objects

Find the type of S4 class

is.object(florence)
[1] TRUE
isS4(florence)
[1] TRUE
.class2(florence)
[1] "Person"

Extract the value of a slot (use @)

getSlots("Person")
         name date_of_birth date_of_death  age_at_death 
  "character"        "Date"        "Date"     "numeric" 
florence@age_at_death # or slot(florence, "age_at_death")
[1] 90

Prototype

The prototype argument can be used to specify default values, enabling partial specification

setClass("Person", 
  slots = c(
    name = "character", 
    date_of_birth = "Date",
    date_of_death = "Date",
    age_at_death = "numeric"
  ),
  prototype = list(
    name = NA_character_, 
    date_of_birth = as.Date(NA),
    date_of_death = as.Date(NA),
    age_at_death = NA_real_
  )
)

Be sure to use list() not c() for prototype – easy mistake to make!

initialize()

An initialize() method can be used for more control over initialization

setMethod("initialize", "Person", 
  function(.Object, ...) {
    # initialize with default method 
    # (named arguments override defaults)
    .Object <- callNextMethod(.Object, ...)
    
    # compute age at death if not specified
    year <- function(x) as.numeric(format(x, "%Y"))
    m_day <- function(x) as.numeric(format(x, "%m%d"))
    
    if (is.na(.Object@age_at_death)){
      n_year <- year(.Object@date_of_death) - year(.Object@date_of_birth)
      birthday <- m_day(.Object@date_of_death) >= m_day(.Object@date_of_birth)
      .Object@age_at_death <- n_year - !birthday
    }
    .Object
  })

Creating a new S4 obect from an old one

florence <- new("Person", 
                name = "Florence Nightingale",
                date_of_birth = as.Date("1820-05-12"))
str(florence)
Formal class 'Person' [package ".GlobalEnv"] with 4 slots
  ..@ name         : chr "Florence Nightingale"
  ..@ date_of_birth: Date[1:1], format: "1820-05-12"
  ..@ date_of_death: Date[1:1], format: NA
  ..@ age_at_death : num NA
florence <- initialize(florence, 
                       date_of_death = as.Date("1910-08-13")) 
str(florence)
Formal class 'Person' [package ".GlobalEnv"] with 4 slots
  ..@ name         : chr "Florence Nightingale"
  ..@ date_of_birth: Date[1:1], format: "1820-05-12"
  ..@ date_of_death: Date[1:1], format: "1910-08-13"
  ..@ age_at_death : num 90

Inheritance

The contains argument to setClass() specifies a class or classes to inherit slots and behaviour from

setClass("BanknoteCharacter", 
  contains = "Person", 
  slots = c(
    denomination = "numeric",
    first_issue = "Date",
    last_legal = "Date"
  ),
  prototype = list(
    denomination = NA_real_,
    first_issue = as.Date(NA),
    last_legal = as.Date(NA)
  )
)

New instance of subclass

Creating a new instance of the subclass will fill in the slots of the superclass

seriesD_10GBP <- new("BanknoteCharacter", 
                     name = "Florence Nightingale", 
                     date_of_birth = as.Date("1820-05-12"), 
                     date_of_death = as.Date("1910-08-12"))
str(seriesD_10GBP)
Formal class 'BanknoteCharacter' [package ".GlobalEnv"] with 7 slots
  ..@ denomination : num NA
  ..@ first_issue  : Date[1:1], format: NA
  ..@ last_legal   : Date[1:1], format: NA
  ..@ name         : chr "Florence Nightingale"
  ..@ date_of_birth: Date[1:1], format: "1820-05-12"
  ..@ date_of_death: Date[1:1], format: "1910-08-12"
  ..@ age_at_death : num 90

Show defined S4 Class

Use showClass() to show (print) an S4 Class

showClass("Person")
Class "Person" [in ".GlobalEnv"]

Slots:
                                                              
Name:           name date_of_birth date_of_death  age_at_death
Class:     character          Date          Date       numeric

Known Subclasses: "BanknoteCharacter"

Helper function

If a user is to create these objects, define a helper function named by the class

Person <- function(name = NA, date_of_birth = NA, date_of_death = NA) {
  new("Person", 
      name = as.character(name), 
      date_of_birth = as.Date(date_of_birth), 
      date_of_death = as.Date(date_of_death))
}
ada <- Person("Ada Lovelace", "1815-12-10", "1852-11-27")
str(ada)
Formal class 'Person' [package ".GlobalEnv"] with 4 slots
  ..@ name         : chr "Ada Lovelace"
  ..@ date_of_birth: Date[1:1], format: "1815-12-10"
  ..@ date_of_death: Date[1:1], format: "1852-11-27"
  ..@ age_at_death : num 36

Validator function

Use setValidity() to check constraints beyond data type, e.g. that all slots have the same length

setValidity("Person", function(object) {
  len <- vapply(slotNames("Person"), 
                function(x) length(slot(object, x)),
                numeric(1))
  if (!all(len == len[1])){
    "slots should all be the same length"
  } else {
    TRUE
  }
})
Person(name = c("Ada Lovelace", "Grace Hopper"), 
       date_of_birth = "1815-12-10")
Error in validObject(.Object): invalid class "Person" object: slots should all be the same length

S4 generic functions

S4 generic functions are (usually) a wrapper to standardGeneric(), e.g.

getGeneric("kronecker")
standardGeneric for "kronecker" defined from package "base"

function (X, Y, FUN = "*", make.dimnames = FALSE, ...) 
standardGeneric("kronecker")
<bytecode: 0x125479598>
<environment: 0x12546c7e0>
Methods may be defined for arguments: X, Y, FUN, make.dimnames
Use  showMethods(kronecker)  for currently available ones.

By default, all arguments apart from ... are used for method dispatch.

Writing S4 generic functions

Use setGeneric to define a new generic, with the optional signature argument to specify the arguments to use for method dispatch

setGeneric("myGeneric", 
  function(x, ..., verbose = TRUE) standardGeneric("myGeneric"),
  signature = "x"
)
[1] "myGeneric"

Do not use {} in the function definition here.

S4 generics use lowerCamelCase names by convention.

S4 Methods

S4 methods for a generic function are defined with setMethod(), which takes three main arguments

  • The name of the generic function.
  • The signature specifying the classes for one or more of the arguments used for method dispatch
  • The method, defined as a function with a superset of the arguments of the generic

Writing S4 Methods

args(getGeneric("show"))
function (object) 
NULL
setMethod("show", "Person", function(object) {
  cat(object@name, "\n",
      "Born: ", format(object@date_of_birth, "%d %B %Y"), "\n",
      "Died: ", format(object@date_of_death, "%d %B %Y"), 
                " (aged ", object@age_at_death, ")\n",
      sep = "")
})
florence
Florence Nightingale
Born: 12 May 1820
Died: 13 August 1910 (aged 90)

Accessor generics

It is good practice to define generics to get and set slots that the user should have access to.

For example, a generic to get and set the date of birth

setGeneric("dob", function(x) standardGeneric("dob"))
[1] "dob"
setGeneric("dob<-", function(x, value) standardGeneric("dob<-"))
[1] "dob<-"

Accessor methods

Methods can then be defined for multiple classes using the same interface.

Access the date of birth from a Person object

setMethod("dob", "Person", function(x) x@date_of_birth)
dob(florence)
[1] "1820-05-12"

Change the date of birth

setMethod("dob<-", "Person", function(x, value) {
  x@date_of_birth <- as.Date(value)
  validObject(x)
  x
})
dob(florence) <- "1820-05-11"
dob(florence)
[1] "1820-05-11"

Method dispatch

  • One argument, single inheritance along vector of classes:
    • Same as for S3
  • One argument, multiple inheritance:
    • Follow all possible paths from child class to parent classes
    • Dispatch to first method found with the shortest path
  • Multiple dispatch
    • As multiple inheritance, but follow possible paths for multiple arguments

Keep it simple: dispatch on one or two arguments usually sufficient.

Avoid ambiguous cases by defining methods earlier in path.

Pseudo-classes

Methods can be defined for the ANY pseudo-class

  • Only selected if no method found for real classes

The MISSING pseudo-class is useful for dispatch on two arguments: allow different behaviour if only one argument specified.

Your turn

  1. Create an S4 class Diag to represent a diagonal matrix with two slots:
  • n the number of rows/cols

  • x the numeric values of the diagonal elements

    Add a prototype to specify default values.

  1. Test your class by creating a new instance first without providing any values for the slots and then providing some example values.
  2. Create an initialize method so that the n slot is computed automatically and does not have to be provided.
  3. Create a Diag() helper function to create a new Diag object, with the user only having to specify the diagonal elements.
  4. Create a show method to state the size of the matrix and print the diagonal elements.

End matter

References

License

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).