Classes and Methods

R Programming

Ella Kaye and Heather Turner
Department of Statistics, University of Warwick

March 18, 2024

Overview

  • Object-oriented programming
  • S3
  • Other OOP systems (S4, R6, S7)

Source material

This material is largely based on Chapters 12 and 13 of Advanced R, 2nd edition, by Hadley Wickham.

The book is freely available online: https://adv-r.hadley.nz.

It is shared under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Object-oriented programming

Object-oriented programming (OOP)

With OOP, a developer can consider a function’s interface separately from its implementation.

This makes it possible to use the same function for different types of input.

These are called generic functions.

OOP definitions

  • OO systems call the type of an object its class.

  • An implementation for a specific class is called a method.

  • The class defines the fields, the data possessed by every instance of that class.

Tip

Roughly speaking, a class defines what an object is and methods define what an object can do.

OOP definitions (continued)

  • Classes are organised in a hierarchy, so that if a method does not exist for one class, its parent’s method is used.

  • The child is said to inherit behaviour.

  • The process of finding the correct method given a class is called method dispatch.

Generic functions

Generic functions provide a unified interface to methods for objects of a particular class, e.g.

library(palmerpenguins)
summary(penguins$species)
   Adelie Chinstrap    Gentoo 
      152        68       124 
summary(penguins$flipper_length_mm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    172     190     197     201     213     231       2 

Here, we use the same function, summary(), on objects of classes factor and integer and get different output for each.

Motivation for OOP

summary() could contain several if-else statements, but

  • the code would become hard to follow
  • only the function authors (R Core) could add new implementations

What does OOP offer?

  • separates the code for different data types
  • avoids duplicating code by method inheritance from parent class(es) to child class (subclass)
  • makes it possible for external developers to add methods for new types of object
    • this can be particularly useful when writing R packages
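
The contrast can be sketched with a toy example. The names describe_ifelse() and describe() are hypothetical, invented for illustration; they are not base R functions.

```r
# Approach 1: one function that branches on the input type.
# Supporting a new type means editing this function.
describe_ifelse <- function(x) {
  if (is.factor(x)) {
    paste("factor with", nlevels(x), "levels")
  } else if (is.numeric(x)) {
    paste("numeric vector of length", length(x))
  } else {
    "something else"
  }
}

# Approach 2: a generic plus one method per class.
describe <- function(x, ...) UseMethod("describe")
describe.factor  <- function(x, ...) paste("factor with", nlevels(x), "levels")
describe.numeric <- function(x, ...) paste("numeric vector of length", length(x))
describe.default <- function(x, ...) "something else"

# Anyone can add support for a new class later, without touching describe()
describe.Date <- function(x, ...) paste("dates from", min(x), "to", max(x))

describe(factor(c("a", "b", "a")))
describe(c(1.5, 2, 3))
describe(as.Date("2024-03-18") + 0:2)
```

With the second approach, each type's code lives in its own function, and a package author can supply a new method without access to the source of the generic.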

OOP Systems in R

There are 3 main OOP systems in use:

  • S3
    • Used in base R and most recommended/CRAN packages
    • Use unless you have good reason not to
  • S4
    • Used on Bioconductor
    • Allows more complex relationships between classes and methods
  • R6
    • More similar to OOP in other languages
    • May prefer if S3 insufficient and not aiming for Bioconductor

A new OOP system, S7, is in development as a successor to S3 and S4.

sloop

The sloop package provides tools to help you interactively explore and understand object-oriented programming in R, particularly with S3.

library(sloop)

Objects

In R, we can distinguish between base objects and OO objects.

A base object:

is.object(1:10)
[1] FALSE
sloop::otype(1:10)
[1] "base"

An OO object:

is.object(penguins)
[1] TRUE
sloop::otype(penguins)
[1] "S3"

Classes

Technically, the difference between base and OO objects is that OO objects have a class attribute:

attr(1:10, "class")
NULL
attr(penguins, "class")
[1] "tbl_df"     "tbl"        "data.frame"
sloop::s3_class(penguins)
[1] "tbl_df"     "tbl"        "data.frame"

Base types

Only OO objects have a class attribute, but every object has a base type.

There are 25 different base types, e.g.

typeof(NULL)
[1] "NULL"
typeof(1)
[1] "double"
typeof(1L)
[1] "integer"
typeof("hello")
[1] "character"
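
As a small sketch of the distinction, the snippet below compares the base type and the class attribute for a few objects, using base R only.

```r
f <- factor(c("a", "b"))
typeof(f)            # base type of the underlying data: "integer"
attr(f, "class")     # the class attribute: "factor"

df <- data.frame(x = 1:2)
typeof(df)           # "list"
attr(df, "class")    # "data.frame"

typeof(1:2)          # "integer"
attr(1:2, "class")   # NULL: a base object has a type but no class attribute
```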

“object of type ‘closure’ is not subsettable”

typeof(mean)
[1] "closure"
mean[1]
Error in mean[1]: object of type 'closure' is not subsettable

S3

S3 objects

An S3 object has a "class" attribute:

attr(penguins$species, "class")
[1] "factor"
unique(penguins$species)
[1] Adelie    Gentoo    Chinstrap
Levels: Adelie Chinstrap Gentoo

S3 objects: the underlying object

With unclass() we strip the class attribute and obtain the underlying base object, here an integer vector:

species_no_class <- unclass(penguins$species)
class(species_no_class)
[1] "integer"
unique(species_no_class)
[1] 1 3 2
attributes(species_no_class)
$levels
[1] "Adelie"    "Chinstrap" "Gentoo"   

OO type vs base type when passed to generic

f <- factor(c("a", "b", "c"))
print(f)
[1] a b c
Levels: a b c
print(unclass(f))
[1] 1 2 3
attr(,"levels")
[1] "a" "b" "c"

generic as middleman

The generic is the middleman: its job is to define the interface (i.e. the arguments) then find the right implementation for the job. The implementation for a specific class is called a method, and the generic finds that method by performing method dispatch.

Hadley Wickham, Advanced R (2e)

Naming scheme

S3 methods are functions with a special naming scheme, generic.class(). For example, the factor method for the print() generic is called print.factor().

You should never call the method directly, but instead rely on the generic to find it for you.

Tip

This is why it is not considered best practice to use . when naming your own functions.

Warning

Lots of important R functions that are not methods do have . in their name; these predate S3.
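
One way to check whether a dotted name really is an S3 method is utils::isS3method(). A short sketch, using functions from base R and stats only:

```r
# t.test() looks like "the test method for t()", but it is not a method of t()
isS3method("t.test")        # FALSE

# t.data.frame() really is the data.frame method for the t() generic
isS3method("t.data.frame")  # TRUE

# so is the fallback method
isS3method("t.default")     # TRUE
```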

Creating OO objects

To make an object an instance of a class, you simply set the class attribute.

(S3 has no formal definition of a class).

Creating an S3 object with structure()

You can use structure() to define an S3 object with a class attribute:

dp <- 2
structure(list(pi = trunc(10^dp * pi)/10^dp, dp = dp),
          class = "pi_trunc")
$pi
[1] 3.14

$dp
[1] 2

attr(,"class")
[1] "pi_trunc"

Further attributes can be added at the same time, but typically we would use a list to return all the required values.

Creating an S3 object with class()

Alternatively, we can add a class attribute using the class() helper function:

pi2dp <- list(pi = trunc(10^dp * pi)/10^dp, dp = dp)
class(pi2dp) <- "pi_trunc"
pi2dp
$pi
[1] 3.14

$dp
[1] 2

attr(,"class")
[1] "pi_trunc"

Warning!

S3 has no checks for correctness, so we can change the class of objects.

This is a bad idea!

mod <- lm(flipper_length_mm ~ bill_length_mm, data = penguins)
class(mod)
[1] "lm"
class(mod) <- "Date"
print(mod)
Error in as.POSIXlt(.Internal(Date2POSIXlt(x, tz)), tz = tz): 'list' object cannot be coerced to type 'double'

R doesn’t stop you from shooting yourself in the foot, but as long as you don’t aim the gun at your toes and pull the trigger, you won’t have a problem.

Creating your own classes

All objects of the same class should have the same structure, i.e. same base type and same attributes.

It is recommended that you create:

  • a low-level constructor, new_myclass(), that efficiently creates objects with the correct structure
  • a validator, validate_myclass(), that performs more computationally expensive checks to ensure the object has correct values
  • a user-friendly helper, myclass(), that provides a convenient way for others to create objects of your class

See https://adv-r.hadley.nz/s3.html#s3-classes for more details.
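
A minimal sketch of this pattern, applied to a "pi_trunc" class holding pi truncated to a given number of decimal places. The three function names follow the convention above but are otherwise hypothetical, and the specific checks are assumptions about what a sensible object looks like.

```r
# Low-level constructor: checks base types only, then sets the class
new_pi_trunc <- function(value = double(), dp = integer()) {
  stopifnot(is.double(value), is.integer(dp))
  structure(list(pi = value, dp = dp), class = "pi_trunc")
}

# Validator: checks that the values make sense
validate_pi_trunc <- function(x) {
  if (length(x$dp) != 1 || is.na(x$dp) || x$dp < 0) {
    stop("`dp` must be a single non-negative integer", call. = FALSE)
  }
  if (!isTRUE(all.equal(x$pi, trunc(10^x$dp * pi) / 10^x$dp))) {
    stop("`pi` must be pi truncated to `dp` decimal places", call. = FALSE)
  }
  x
}

# User-friendly helper: accepts any whole number for dp
pi_trunc <- function(dp = 2) {
  dp <- as.integer(dp)
  value <- trunc(10^dp * pi) / 10^dp
  validate_pi_trunc(new_pi_trunc(value, dp))
}

pi_trunc(2)$pi   # 3.14
```

Users call only pi_trunc(); the constructor and validator stay available for developers who need to build or check objects directly.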

S3 generic functions

S3 generic functions are simple wrappers to UseMethod()

print
function (x, ...) 
UseMethod("print")
<bytecode: 0x123f09ea8>
<environment: namespace:base>

UseMethod()

The UseMethod() function takes care of method dispatch: selecting the S3 method according to the class of the object passed as the first argument.

class(penguins$species[1:3])
[1] "factor"
print(penguins$species[1:3])
[1] Adelie Adelie Adelie
Levels: Adelie Chinstrap Gentoo

Here print() dispatches to the method print.factor().
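
The same mechanism is available for your own generics. The sketch below defines a hypothetical generic, area(), with methods for two made-up classes, "circle" and "rectangle"; none of these names come from an existing package.

```r
# A generic is a function whose body calls UseMethod()
area <- function(shape, ...) {
  UseMethod("area")
}

# Methods follow the generic.class naming scheme
area.circle    <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h

circ <- structure(list(r = 1), class = "circle")
rec  <- structure(list(w = 2, h = 3), class = "rectangle")

area(circ)  # dispatches to area.circle()
area(rec)   # dispatches to area.rectangle()
```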

s3_dispatch()

UseMethod() creates a vector of method names then looks for each potential method in turn. We can see this with sloop::s3_dispatch():

s3_dispatch(print(penguins$species))
=> print.factor
 * print.default
  • => indicates the method that is called here.
  • * indicates a method that is defined, but not called.

default

default is a special pseudo-class that provides a fallback whenever a class-specific method is not available.

s3_dispatch(print(pi2dp))
   print.pi_trunc
=> print.default

print.pi_trunc is not defined.

Method dispatch

An S3 object can have more than one class e.g.

class(penguins)
[1] "tbl_df"     "tbl"        "data.frame"

UseMethod() works along the vector of classes (from the first class to the last), looks for a method for each class and dispatches to the first method it finds.

If no method is defined for any of the classes, the default method is used, e.g. print.default().

If there is no default, an error is thrown.
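
These rules can be seen in a toy example. The generic speak() and the classes "dog", "cat", "animal" and "rock" are hypothetical, made up for illustration.

```r
speak <- function(x, ...) UseMethod("speak")
speak.animal <- function(x, ...) "some generic animal noise"
speak.dog    <- function(x, ...) "Woof"

dog   <- structure(list(), class = c("dog", "animal"))
kitty <- structure(list(), class = c("cat", "animal"))
rock  <- structure(list(), class = "rock")

speak(dog)    # speak.dog() is found first
speak(kitty)  # there is no speak.cat(), so speak.animal() is used

# There is no speak.rock() and no speak.default(), so this call fails;
# try() captures the error so that the script can continue
result <- try(speak(rock), silent = TRUE)
inherits(result, "try-error")  # TRUE
```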

S3 methods for a class

See the methods for a given S3 class:

# nls is nonlinear least squares
methods(class = "nls")
 [1] anova       coef        confint     deviance    df.residual fitted     
 [7] formula     logLik      nobs        predict     print       profile    
[13] residuals   summary     vcov        weights    
see '?methods' for accessing help and source code
s3_methods_class("nls") |> head()
# A tibble: 6 × 4
  generic     class visible source             
  <chr>       <chr> <lgl>   <chr>              
1 anova       nls   FALSE   registered S3method
2 coef        nls   FALSE   registered S3method
3 confint     nls   FALSE   registered S3method
4 deviance    nls   FALSE   registered S3method
5 df.residual nls   FALSE   registered S3method
6 fitted      nls   FALSE   registered S3method

S3 methods for a generic

See the methods for a given generic function:

methods("coef")
[1] coef.aov*     coef.Arima*   coef.default* coef.listof*  coef.maov*   
[6] coef.nls*    
see '?methods' for accessing help and source code

Asterisked methods are not exported.

s3_methods_generic("coef")
# A tibble: 6 × 4
  generic class   visible source             
  <chr>   <chr>   <lgl>   <chr>              
1 coef    aov     FALSE   registered S3method
2 coef    Arima   FALSE   registered S3method
3 coef    default FALSE   registered S3method
4 coef    listof  FALSE   registered S3method
5 coef    maov    FALSE   registered S3method
6 coef    nls     FALSE   registered S3method

View S3 methods

S3 methods need not be in the same package as the generic.

Find an unexported method with getS3method() or sloop::s3_get_method()

getS3method("coef", "default")
function (object, complete = TRUE, ...) 
{
    cf <- object$coefficients
    if (complete) 
        cf
    else cf[!is.na(cf)]
}
<bytecode: 0x11158e8d8>
<environment: namespace:stats>
s3_get_method("coef.default") # equivalent

Writing S3 Methods

The arguments of a new method should be a superset of the arguments of the generic:

args(print)
function (x, ...) 
NULL

New methods have the name format generic.class:

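print.pi_trunc <- function(x, abbreviate = TRUE, ...){
  # abbreviate is a single TRUE/FALSE, so use if/else rather than ifelse()
  dp_text <- if (abbreviate) "d.p." else "decimal places"
  cat("pi: ", x$pi, " (", x$dp, " ", dp_text, ")\n", sep = "")
  # print methods conventionally return their input invisibly
  invisible(x)
}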
print(pi2dp)
pi: 3.14 (2 d.p.)
print(pi2dp, abbreviate = FALSE)
pi: 3.14 (2 decimal places)

Inheritance

S3 classes can share behaviour through a mechanism called inheritance. Inheritance is powered by three ideas.

  • The class can be a character vector

  • If a method is not found for the class in the first element of the vector, R looks for a method in the second class (and so on)

  • A method can delegate work by calling NextMethod().

Multiple classes

The class of an S3 object can be a vector of classes:

fit <- glm(y ~ x, data = data.frame(y = 1:3, x = 4:6))
class(fit)
[1] "glm" "lm" 

We say fit is a "glm" object that inherits from class "lm".

  • glm is a subclass of lm, because it always appears before it in the class vector.

  • lm is a superclass of glm.

inherits()

The inherits() function can be used to test if an object inherits from a given class:

inherits(fit, "glm")
[1] TRUE
inherits(fit, "lm")
[1] TRUE
inherits(fit, "xlm")
[1] FALSE
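
Two further uses of inherits() are sketched below: testing against several classes at once, and guarding a function argument. The helper slope() is hypothetical, written only for this illustration.

```r
fit <- glm(y ~ x, data = data.frame(y = 1:3, x = 4:6))

# With a vector of classes, the result is TRUE if any of them matches
inherits(fit, c("xlm", "lm"))                       # TRUE

# which = TRUE gives the position of each class in class(fit), 0 if absent
inherits(fit, c("xlm", "lm", "glm"), which = TRUE)  # 0 2 1

# A typical use: checking an argument at the top of a function
slope <- function(model) {
  stopifnot(inherits(model, "lm"))
  unname(coef(model)[2])
}
slope(fit)
```

Because the check is inherits(model, "lm") rather than class(model) == "lm", slope() also accepts objects of any subclass of "lm", such as "glm".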

Your turn (part 1)

  1. Create a function to fit an ordinary least squares model given a response y and an explanatory variable x, that returns an object of a new class "ols", that inherits from "lm".

  2. Define a print method for your class so that it works as follows:

set.seed(1)
res <- ols(x = 1:3, y = rnorm(3))
res
Intercept:  -0.217 
Slope:  -0.1046 

Note: I have set options(digits = 4) to limit the number of digits printed by default throughout this presentation (default is 7).

NextMethod()

NextMethod() is hard to understand in the abstract, so here’s a concrete example for a common use case: the subsetting operator [.

new_secret <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "secret")
}

print.secret <- function(x, ...) {
  print(strrep("x", nchar(x)))
  invisible(x)
}

x <- new_secret(c(15, 1, 456))
x
[1] "xx"  "x"   "xxx"

But there’s a problem

x[1]
[1] 15

We want this to be secret!

The default [ method doesn’t preserve the class.

s3_dispatch(x[1])
   [.secret
   [.default
=> [ (internal)

A first attempt at a solution

So, we need to define a [.secret method.

But the following doesn’t work:

`[.secret` <- function(x, i) {
  new_secret(x[i])
}
x[1]

It gets stuck in an infinite loop: x[i] inside the method dispatches to [.secret again.

What’s the fix?

We need some way of calling the underlying [ code, i.e. the implementation that would get called if we didn’t have a [.secret method.

`[.secret` <- function(x, i) {
  new_secret(NextMethod())
}
x[1]
[1] "xx"

i.e. we’re defining [.secret but we still want to access the internal [ method (so we don’t get stuck in a loop) as if [.secret wasn’t defined.

Delegation with NextMethod()

s3_dispatch(x[1])
=> [.secret
   [.default
-> [ (internal)

The => indicates that [.secret is called, but that NextMethod() delegates work to the underlying internal [ method, as shown by ->.

Another NextMethod() example

data <- data.frame(x = 1:3, y = 4:6)
class(data)
[1] "data.frame"
data
  x y
1 1 4
2 2 5
3 3 6
t(data)
  [,1] [,2] [,3]
x    1    2    3
y    4    5    6

Underlying code

t.data.frame
function (x) 
{
    x <- as.matrix(x)
    NextMethod("t")
}
<bytecode: 0x1172a2270>
<environment: namespace:base>
s3_dispatch(t(data))
=> t.data.frame
-> t.default

We can explicitly call the next method that would be called by UseMethod() to reuse code whilst customising as required.
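
The same pattern works for your own subclasses. In the sketch below, "titled_df" is a hypothetical subclass of "data.frame" whose print method adds a title line and then delegates the table itself to the data frame method.

```r
# Constructor for the hypothetical "titled_df" class
new_titled_df <- function(x, title) {
  stopifnot(is.data.frame(x), is.character(title), length(title) == 1)
  structure(x, title = title, class = c("titled_df", class(x)))
}

print.titled_df <- function(x, ...) {
  cat("<", attr(x, "title"), ">\n", sep = "")
  NextMethod()   # the next class is "data.frame", so print.data.frame() runs
  invisible(x)
}

scores <- new_titled_df(data.frame(x = 1:3, y = 4:6), title = "Scores")
print(scores)
```

Only the customisation (the title line) is written by hand; the formatting of the table is reused from the parent class.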

Implicit classes

As we’ve seen, is.object() or sloop::otype() can be used to find out if an object has a class (S3/S4/R6)

is.object(factor(1:3))
[1] TRUE
is.object(1:3)
[1] FALSE

An object that does not have an explicit class has an implicit class that will be used for S3 method dispatch.

Implicit classes and dispatch

The implicit class can be found with .class2(), or sloop::s3_class()

M <- matrix(1:12, nrow = 4)
attr(M, "class")
NULL
.class2(M)
[1] "matrix"  "array"   "integer" "numeric"
s3_class(M)
[1] "matrix"  "integer" "numeric"

The class() of an object does not uniquely determine its dispatch:

s3_dispatch(print(M))
   print.matrix
   print.integer
   print.numeric
=> print.default
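
Methods can also be written for implicit classes. The sketch below uses a hypothetical generic, kind(), with methods for "integer" and "numeric" only, to show how objects without a class attribute are dispatched.

```r
kind <- function(x) UseMethod("kind")
kind.integer <- function(x) "integer data"
kind.numeric <- function(x) "numeric data"
kind.default <- function(x) "something else"

kind(1:3)               # implicit class starts with "integer"
kind(c(1.5, 2))         # no kind.double(), so falls through to kind.numeric()
kind(matrix(1:4, 2))    # no kind.matrix() or kind.array(), so kind.integer()
kind("a")               # no matching method, so kind.default()
```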

Attributes

We can take advantage of existing S3 methods by returning an object of an existing S3 class or an implicit class, using attributes to add custom information.

x <- matrix(c(1:5, 2*(1:5)), ncol = 2)
center_x <- scale(x, scale = FALSE)
class(center_x)
[1] "matrix" "array" 
summary(center_x)
       V1           V2    
 Min.   :-2   Min.   :-4  
 1st Qu.:-1   1st Qu.:-2  
 Median : 0   Median : 0  
 Mean   : 0   Mean   : 0  
 3rd Qu.: 1   3rd Qu.: 2  
 Max.   : 2   Max.   : 4  
attr(center_x, "scaled:center")
[1] 3 6

This can avoid the need to define new classes and methods, in simple cases.

Under the hood

s3_dispatch(scale(x, scale = FALSE))
   scale.matrix
   scale.double
   scale.numeric
=> scale.default
s3_dispatch(summary(center_x))
=> summary.matrix
   summary.double
   summary.numeric
 * summary.default
View(scale.default)

In scale.default() the attribute "scaled:center" is added to the x argument, so center_x is essentially a matrix with extra information (in this case, the column means of the original matrix).
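
The same idea can be used in your own functions. In this sketch, to_fahrenheit() is a hypothetical helper that returns a plain numeric vector and records the original values in an attribute; the attribute name "original:celsius" is an arbitrary choice.

```r
to_fahrenheit <- function(celsius) {
  out <- celsius * 9 / 5 + 32
  attr(out, "original:celsius") <- celsius
  out
}

temps <- to_fahrenheit(c(0, 37, 100))
class(temps)                     # "numeric": no new class was needed
mean(temps)                      # existing functions still work
attr(temps, "original:celsius")  # the extra information travels with the object
```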

Your turn (part 2)

  1. Write a summary method for your ols class that uses NextMethod() to compute the usual lm summary, but returns an object of class "summary.ols".

  2. Write a print method for the "summary.ols" which works as follows:

summary(res)
Coefficients: 
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2170     1.1408 -0.1902   0.8804
x            -0.1046     0.5281 -0.1980   0.8755

Residual standard error:  0.7468 
Multiple R-squared:  0.03774 

Other OOP systems

S4

S4 provides a formal approach to OOP. Its implementation is much stricter than S3.

S4 objects have slots: named components of the object that are accessed with @.

S4:

  • uses specialised functions for creating classes, generics and methods
  • allows multiple inheritance: a class can have multiple parents
  • allows multiple dispatch: method selection based on the class of multiple objects
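
A minimal sketch of these features using the methods package. The class "Person" and the generics greet() and describe_pair() are hypothetical, invented for illustration.

```r
library(methods)

# Define a class with typed slots
setClass("Person", slots = c(name = "character", age = "numeric"))

# Define a generic and a method for it
setGeneric("greet", function(x, ...) standardGeneric("greet"))
setMethod("greet", "Person", function(x, ...) {
  paste0("Hello, ", x@name, "!")
})

ada <- new("Person", name = "Ada", age = 36)
ada@name     # slots are accessed with @
greet(ada)

# Slot types are checked when the object is created
bad <- try(new("Person", name = "Ada", age = "thirty-six"), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# Multiple dispatch: the method is chosen using the classes of both arguments
setGeneric("describe_pair", function(x, y) standardGeneric("describe_pair"))
setMethod("describe_pair", signature("numeric", "character"),
          function(x, y) "number then text")
setMethod("describe_pair", signature("character", "numeric"),
          function(x, y) "text then number")
describe_pair(1, "a")
describe_pair("a", 1)
```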

S4 uses

  • S4 is the OOP system used for Bioconductor packages
  • S4 is also used by the Matrix package

R6

  • The R6 OOP system is defined in the R6 package: https://r6.r-lib.org
  • Encapsulated OOP, similar to OOP systems in other languages
  • The Advanced R book cautions against using R6 - it leads to non-idiomatic R code.
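
A minimal sketch of encapsulated OOP with R6, assuming the R6 package is installed. The class "Counter" is hypothetical. Methods belong to the object and are called with $, and the object is modified in place.

```r
library(R6)  # assumes the R6 package is installed

Counter <- R6Class("Counter",
  public = list(
    count = 0,
    add = function(n = 1) {
      self$count <- self$count + n
      invisible(self)  # returning self allows method chaining
    }
  )
)

counter <- Counter$new()
counter$add()$add(5)
counter$count   # 6: the object was modified in place
```

Note the contrast with S3, where a function such as add(counter) would return a modified copy and leave the original unchanged.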

S7

End matter

References

License

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).