R Foundations Course
December 5, 2023
Plots in base R
ggplot2
Tables
Base R graphics are useful for quick, exploratory “no-frills” plots.
(For anything better looking, more complex, or where you want more control, use ggplot2.)
Many different classes of object in R have their own plot methods (output of methods(plot)):
[1] plot,ANY-method plot,color-method plot.acf*
[4] plot.data.frame* plot.decomposed.ts* plot.default
[7] plot.dendrogram* plot.density* plot.ecdf
[10] plot.factor* plot.formula* plot.function
[13] plot.ggplot* plot.gtable* plot.hcl_palettes*
[16] plot.hclust* plot.histogram* plot.HoltWinters*
[19] plot.isoreg* plot.lm* plot.medpolish*
[22] plot.mlm* plot.ppr* plot.prcomp*
[25] plot.princomp* plot.profile.nls* plot.R6*
[28] plot.raster* plot.spec* plot.stepfun
[31] plot.stl* plot.table* plot.trans*
[34] plot.ts plot.tskernel* plot.TukeyHSD*
see '?methods' for accessing help and source code
For example, if you call plot() on an object of class lm, it will call plot.lm().
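A minimal sketch of this method dispatch, using the built-in mtcars data. The `counter` class and its `plot.counter()` method are hypothetical, made up for illustration; they are not part of base R.

```r
# Fit a linear model to the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)  # "lm"

# plot() checks class(fit) and dispatches to plot.lm(), which draws
# regression diagnostics; which = 1 selects "residuals vs fitted"
plot(fit, which = 1)

# plot.lm is not exported (marked * in the list above), but it can be retrieved
plot_lm_method <- getS3method("plot", "lm")

# Hypothetical class with its own plot method, to show dispatch in action
plot.counter <- function(x, ...) {
  barplot(x$counts, names.arg = x$labels, ...)
}
obj <- structure(list(counts = c(3, 5, 2), labels = c("a", "b", "c")),
                 class = "counter")
plot(obj)  # dispatches to plot.counter()
```

Defining a function named `plot.<class>` is all it takes for plot() to find it.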
Starting from plot(1:10, 1:10), experiment with the arguments type and pch. See ?plot (and ?points for the available pch symbols).
Can you create a plot with triangular points linked by lines?
Can you do the same with the lines() function? What are the similarities and differences?
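Some starting points for these exercises: a sketch comparing a few values of `type`, a reference strip of the `pch` symbols, and `lines()` adding to an existing plot. It deliberately stops short of the full answer.

```r
# Compare four values of `type` in a 2 x 2 grid of panels
old_par <- par(mfrow = c(2, 2))
for (tp in c("p", "l", "b", "o")) {
  plot(1:10, 1:10, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)  # restore the previous graphical settings

# Reference strip of the plotting symbols selected by pch = 0 to 25
plot(0:25, rep(1, 26), pch = 0:25, cex = 2,
     axes = FALSE, xlab = "pch value", ylab = "")
axis(1, at = 0:25)

# lines() adds to the current plot instead of starting a new one
plot(1:10, 1:10, type = "n")  # type = "n" sets up the axes but draws nothing
lines(1:10, 1:10, col = "grey40")
```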
From https://ggplot2.tidyverse.org:
R has several systems for making graphs, but ggplot2 is one of the most elegant and most versatile. ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs. With ggplot2, you can do more faster by learning one system and applying it in many places.
From https://r4ds.had.co.nz/data-visualisation.html:
You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggplot2 is part of the tidyverse.
It has been around for over 10 years and is used by hundreds of thousands of people.
It can take some getting used to, but it is worth the investment to learn properly.
Every ggplot2 plot has three key components:
Data (typically in a data frame),
A set of aesthetic mappings between variables in the data and visual properties, and
At least one layer which describes how to render each observation. Layers are usually created with a geom_*() function.
The package is ggplot2, but the function is ggplot().
Layers are added with + (not %>% or |>).
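A minimal sketch of the three components, using the built-in mtcars data instead of penguins so it needs nothing beyond ggplot2. Each `+` adds a layer to the plot object; printing the object draws it.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                     # 1. data
            mapping = aes(x = wt, y = mpg)) +  # 2. aesthetic mappings
  geom_point() +                               # 3. a layer: points
  geom_smooth(method = "lm", formula = y ~ x)  # a second layer: linear fit

length(p$layers)  # 2
p                 # printing the object renders the plot
```

Because the plot is an object, you can also keep adding to it later, e.g. `p + labs(title = "...")`.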
aes()
Scales in ggplot2 control the mapping from data to aesthetics. They take your data and turn it into something that you can see, like size, colour, position or shape. They also provide the tools that let you interpret the plot: the axes and legends.
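A sketch of scale functions in action, again with mtcars. `scale_x_log10()`, `scale_y_continuous()` and `scale_colour_brewer()` are ggplot2 functions; the palette, limits and legend title chosen here are arbitrary.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  scale_x_log10() +                          # position scale: log10 x axis
  scale_y_continuous(limits = c(10, 35)) +   # position scale: fixed y range
  scale_colour_brewer(palette = "Dark2",     # colour scale: ColorBrewer palette
                      name = "Cylinders")    # legend title
p
```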
Three groups of scales:
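library(ggplot2)
library(palmerpenguins)  # provides the `penguins` data frame

# Bill dimensions, with colour, shape and a facet for each species
ggplot(data = penguins,
       aes(x = bill_length_mm,
           y = bill_depth_mm,
           group = species)) +
  geom_point(aes(color = species,
                 shape = species),
             size = 3,
             alpha = 0.8) +
  geom_smooth(method = "lm", aes(color = species)) +   # linear fit per species
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  facet_wrap(~species, scales = "free_x")               # one panel per species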
# Same plot without facets, with labels and a theme added
ggplot(data = penguins,
       aes(x = bill_length_mm,
           y = bill_depth_mm,
           group = species)) +
  geom_point(aes(color = species,
                 shape = species),
             size = 3,
             alpha = 0.8) +
  geom_smooth(method = "lm", aes(color = species)) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Penguin bill dimensions",
       x = "bill length (mm)",
       y = "bill depth (mm)") +
  theme_minimal() +                              # minimal built-in theme
  theme(plot.title.position = "plot",            # align title with the whole plot
        text = element_text(size = 20))          # larger text throughout
Recreate the base R plots from the first part of this session in ggplot2.
You may find the list of available geoms (and their help pages) useful:
Notes
aes() can be defined for the whole plot (in ggplot()) or within an individual geom.
The first two arguments of aes() are x and y (you don’t need to name them if you use them in that order).
See extensions at https://exts.ggplot2.tidyverse.org/gallery/
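A small sketch of the aes() notes above, using a hypothetical toy data frame `df`. Mappings given in ggplot() are inherited by every layer; mappings given inside a geom apply to that layer only.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10)  # toy data for illustration

# aes() in ggplot(): shared by all layers; x and y are passed positionally
p1 <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_line()

# aes() inside each geom: each mapping applies only to its own layer
p2 <- ggplot(df) +
  geom_point(aes(x = x, y = y)) +
  geom_line(aes(x = x, y = y), colour = "grey50")

p1
p2
```

Both produce the same points and line; the plot-level version is shorter when every layer uses the same mapping.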
R can be used to make incredible data visualisations.
Check out the galleries of these data viz practitioners working with ggplot2:
Also, #TidyTuesday on Mastodon is a great source of further inspiration.
R for Data Science book: Chapter 3 (Data Visualisation) and Chapter 28 (Graphics for Communication), to get up and running quickly
ggplot2 book, for an in-depth understanding
Plotting anything with ggplot2 webinar with Thomas Lin Pedersen (one of the main ggplot2 authors)
R graphics cookbook, a practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly
Cédric Scherer’s ‘Engaging and Beautiful Data Visualizations with ggplot2’ workshop
Books about creating good data viz:
In RStudio, graphs are displayed in the Plots window. The plot is sized to fit the window and will be rescaled if the size of the window is changed.
Back and forward arrows allow you to navigate through graphs that have been plotted.
Plots can be saved in various formats using the Export drop-down menu, which also has an option to copy to the clipboard.
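Plots can also be saved from code rather than the Export menu, which makes the output reproducible. A minimal sketch, writing to a temporary directory; the file names, sizes and resolution are arbitrary choices.

```r
library(ggplot2)

out_dir <- tempdir()

# Base R: open a file device, draw the plot, then close the device
png(file.path(out_dir, "base_plot.png"), width = 800, height = 600)
plot(1:10, 1:10)
invisible(dev.off())

# ggplot2: ggsave() writes a plot object (by default, the last plot shown);
# width and height are in inches unless `units` is changed
p <- ggplot(data.frame(x = 1:10, y = 1:10), aes(x, y)) + geom_point()
ggsave(file.path(out_dir, "gg_plot.png"), plot = p,
       width = 6, height = 4, dpi = 150)
```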
DEMO
We’re just going to scratch the surface of this today.
We’ll be using the gt and gtsummary packages, but there are many others.
Here’s a good overview of many different packages.
gt is an R package to create tables. It provides a grammar of tables.
The gt philosophy: we can construct a wide variety of useful tables with a cohesive set of table parts. It all begins with table data (be it a tibble or a data frame). You then decide how to compose your gt table with the elements and formatting you need for the task at hand. Finally, the table is rendered by printing it at the console, including it in an R Markdown document, or exporting to a file using gtsave().
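A minimal sketch of that workflow, using the first rows of mtcars as the table data. The title, column labels and number formatting are arbitrary choices, and the file name in the commented-out `gtsave()` call is hypothetical.

```r
library(gt)

# 1. Start from table data: a data frame (or tibble)
car_data <- head(mtcars[, c("mpg", "cyl", "hp")], 5)

# 2. Compose the table from parts and formatting
cars_tbl <- car_data |>
  gt(rownames_to_stub = TRUE) |>           # row names become the stub column
  tab_header(title = "Motor Trend cars",
             subtitle = "First five rows") |>
  fmt_number(columns = mpg, decimals = 1) |>
  cols_label(mpg = "Miles per gallon",
             cyl = "Cylinders",
             hp = "Horsepower")

# 3. Render: print it, or save it to a file
cars_tbl
# gtsave(cars_tbl, "cars_table.html")
```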
See the article Case Study: gtcars for a thorough example of gt’s capabilities.
See also Albert Rapp’s book on gt.
Having the technical know-how to code tables is one thing; making them look good and easy for the reader to take in is another!
Highly recommended: this guide by Tom Mock, based on Jon Schwabish’s original. It covers guidelines for making better tables and shows how to implement them in gt. It demonstrates even more of what gt can do than the article on the previous slide.
gtsummary builds on the gt package and is used for creating summary tables of data sets and of statistical model results.
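library(gtsummary)  # also provides the example `trial` dataset
library(dplyr)

# assumption: trial2 is a subset of gtsummary's `trial` data
# (its definition isn't shown on the slide)
trial2 <- trial |> select(trt, age, grade, response)

# summarize and augment the data
summary_table <-
  tbl_summary(
    trial2,
    by = trt,          # split table by group
    missing = "no"     # don't list missing data separately
  ) |>
  add_n() |>           # add column with total number of non-missing observations
  add_p() |>           # test for a difference between groups
  modify_header(label = "**Variable**") |>  # update the column header
  bold_labels()
summary_table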
Variable | N | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² |
---|---|---|---|---|
Age | 189 | 46 (37, 59) | 48 (39, 56) | 0.7 |
Grade | 200 | | | 0.9 |
I | | 35 (36%) | 33 (32%) | |
II | | 32 (33%) | 36 (35%) | |
III | | 31 (32%) | 33 (32%) | |
Tumor Response | 193 | 28 (29%) | 33 (34%) | 0.5 |

¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
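The model table below is the kind of output gtsummary's `tbl_regression()` produces. A sketch of how it could be generated, assuming a logistic regression of tumour response on treatment, age and grade in the `trial` data; the slides don't show this code, so the model specification is an assumption.

```r
library(gtsummary)

# Assumed model: logistic regression on the example `trial` data
mod <- glm(response ~ trt + age + grade, data = trial, family = binomial)

# exponentiate = TRUE reports odds ratios (OR) rather than log-odds coefficients
reg_table <- tbl_regression(mod, exponentiate = TRUE)
reg_table
```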
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Chemotherapy Treatment | | | |
Drug A | — | — | |
Drug B | 1.13 | 0.60, 2.13 | 0.7 |
Age | 1.02 | 1.00, 1.04 | 0.10 |
Grade | | | |
I | — | — | |
II | 0.85 | 0.39, 1.85 | 0.7 |
III | 1.01 | 0.47, 2.15 | >0.9 |

¹ OR = Odds Ratio, CI = Confidence Interval
The winners of the RStudio Table Contest
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).