Projects in R

Data visualization with the tidyverse

Christian Althaus, Judith Bouman, Martin Wohlfender

Fundamentals

Claus Wilke, a professor of integrative biology at The University of Texas at Austin, wrote the book Fundamentals of Data Visualization as a guide to making visualizations that

  • accurately reflect the data,
  • tell a story,
  • and look professional.

Note that the entire book was written in R Markdown using RStudio!

Ugly, bad, and wrong figures

  • ugly - A figure that has aesthetic problems but otherwise is clear and informative.
  • bad - A figure that has problems related to perception; it may be unclear, confusing, overly complicated, or deceiving.
  • wrong - A figure that has problems related to mathematics; it is objectively incorrect.

Aesthetics

All data visualizations map data values into quantifiable features of the resulting graphic. We refer to these features as aesthetics.

Coordinate systems

Coordinate systems don’t have to be Cartesian.
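
As a minimal sketch of this point, assuming ggplot2 is installed, the same bar chart can be drawn in Cartesian and in polar coordinates. The built-in mtcars data and the object names bars and bars_polar are illustrative choices, not taken from the slides.

```r
library(ggplot2)

# count cars per number of cylinders (built-in mtcars data), Cartesian by default
bars <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  xlab("cylinders")

# same data in a polar coordinate system:
# x is mapped to the angle, the bar height to the radius
bars_polar <- bars + coord_polar()
```

Printing bars and bars_polar side by side shows how much the choice of coordinate system changes the visual impression of identical data.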

Color scales

There are three fundamental use cases for color in data visualizations:

  1. We can use color to distinguish groups of data from each other.
  2. We can use color to represent data values.
  3. We can use color to highlight.

The types of colors we use and the way in which we use them are quite different for these three cases.

Color as a tool to distinguish

Color to represent data values

Color as a tool to highlight

ColorBrewer

Cynthia Brewer, a cartographer at Pennsylvania State University, designed the widely used color schemes ColorBrewer.

You can use the interactive web tool ColorBrewer 2.0 to choose an appropriate color scheme for your needs.

To use these color schemes in R, install the package RColorBrewer.
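
A minimal sketch of how such a scheme might be used, assuming the packages RColorBrewer and ggplot2 are installed. The scheme "Dark2" and the mtcars example are illustrative choices only.

```r
library(RColorBrewer)
library(ggplot2)

# three colours from the qualitative "Dark2" scheme, returned as hex codes
cols_dark2 <- brewer.pal(n = 3, name = "Dark2")

# use them to distinguish the three cylinder groups in the built-in mtcars data
plot_brewer <- ggplot(mtcars, aes(x = disp, y = hp, colour = factor(cyl))) +
  geom_point() +
  scale_colour_manual(name = "cylinders", values = cols_dark2)
```

ggplot2 also offers scale_colour_brewer(palette = "Dark2") and scale_fill_brewer(), which apply a ColorBrewer scheme without extracting the hex codes first.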

Colorblind safe figures

If you do not have a color vision deficiency, it is very hard to imagine what it is like to be colorblind.

The Color Blindness Simulator can close this gap for you. Just play around with it and check whether your figures are colorblind safe.
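
The slides do not name a specific palette. One possible starting point, shown here only as a sketch, is the Okabe-Ito palette that ships with base R (function palette.colors in grDevices, R 4.0.0 or later). The mtcars example and object names are assumptions for illustration.

```r
library(ggplot2)

# first four Okabe-Ito colours; unname() so that ggplot2 assigns them
# by position instead of trying to match colour names to factor levels
cols_okabe <- unname(palette.colors(n = 4, palette = "Okabe-Ito"))

# skip the first colour (black) and use the next three for the gear groups
plot_cb <- ggplot(mtcars, aes(x = disp, y = hp, colour = factor(gear))) +
  geom_point(size = 2) +
  scale_colour_manual(name = "gears", values = cols_okabe[2:4])
```

It is still worth running the saved figure through a simulator, since point size, transparency and background also affect how well colours can be told apart.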

unibeCols

The University of Bern has a set of corporate design colors that are defined in the manual “Gestaltungselemente”.

Thanks to Alan, you can easily install this color scheme with the unibeCols package: https://github.com/CTU-Bern/unibeCols

Visualizing (many) distributions

Visualizing geospatial data

Visualizing uncertainty

Whenever you visualize uncertainty with error bars, you must specify what quantity and/or confidence level the error bars represent.
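
A minimal sketch of this rule, assuming dplyr and ggplot2 are installed: the error bars show the mean plus or minus one standard error, and the caption says so. The mtcars data and all object names are illustrative, not from the slides.

```r
library(dplyr)
library(ggplot2)

# mean fuel efficiency per cylinder count, with its standard error
mpg_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),
            se_mpg = sd(mpg) / sqrt(n()))

# state in the caption what the error bars represent
plot_errorbar <- ggplot(mpg_summary, aes(x = factor(cyl), y = mean_mpg)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_mpg - se_mpg, ymax = mean_mpg + se_mpg),
                width = 0.2) +
  labs(x = "cylinders", y = "mean fuel efficiency (mpg)",
       caption = "Error bars: mean +/- 1 standard error")
```

The same pattern works for standard deviations or confidence intervals; only the computed limits and the caption text change.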

The principle of proportional ink

When a shaded region is used to represent a numerical value, the area of that shaded region should be directly proportional to the corresponding value. - Bergstrom & West
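
A small sketch of what this means for bar charts, assuming ggplot2 is installed. The data frame values is made up purely for illustration.

```r
library(ggplot2)

# hypothetical values: B is only about 11% larger than A
values <- data.frame(group = c("A", "B"), value = c(90, 100))

# bars start at zero, so the bar area is proportional to the value
plot_ink_ok <- ggplot(values, aes(x = group, y = value)) +
  geom_col()

# zooming the y-axis to 80-100 makes bar B look about twice as tall as bar A
plot_ink_bad <- plot_ink_ok + coord_cartesian(ylim = c(80, 100))
```

In the second plot the visible ink no longer matches the numbers, which is the kind of distortion the principle warns against.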

Handling overlapping data points

Don’t go 3D

Even though the 3D visualizations are shown from four different perspectives, it is difficult to envision how exactly the points are distributed in space.

Instead, map one of the variables (in this case fuel efficiency) onto another aesthetic (size of the dots).
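
A minimal sketch of this alternative, assuming ggplot2 is installed and assuming the figure uses the built-in mtcars data with displacement, power and fuel efficiency (columns disp, hp and mpg):

```r
library(ggplot2)

# two variables on the axes, the third (fuel efficiency) mapped to dot size
plot_size <- ggplot(mtcars, aes(x = disp, y = hp, size = mpg)) +
  geom_point(alpha = 0.6) +
  labs(x = "displacement (cu. in.)", y = "power (hp)",
       size = "fuel efficiency (mpg)")
```

Colour would be another candidate aesthetic for the third variable.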

Commonly used image file formats

Acronym  Name                              Type    Application
pdf      Portable Document Format          vector  general purpose
eps      Encapsulated PostScript           vector  general purpose, outdated; use pdf
svg      Scalable Vector Graphics          vector  online use
png      Portable Network Graphics         bitmap  optimized for line drawings
jpeg     Joint Photographic Experts Group  bitmap  optimized for photographic images
tiff     Tagged Image File Format          bitmap  print production, accurate color reproduction
raw      Raw Image File                    bitmap  digital photography, needs post-processing
gif      Graphics Interchange Format       bitmap  outdated for static figures, OK for animations

Base plot vs. ggplot

plot(mtcars$disp, mtcars$hp,
     xlab = "displacement (cu. in.)",
     ylab = "power (hp)",
     main = "Scatter plot in base plot")

library(ggplot2)

ggplot(mtcars, aes(x = disp, y = hp)) +
    geom_point() +
    xlab("displacement (cu. in.)") +
    ylab("power (hp)") +
    ggtitle("Scatter plot in ggplot")

Data exploration and visualization with ggplot2

Artwork by @allison_horst

Program for the rest of the afternoon

  • General idea of using ggplot2
  • Basic graphs: geom_point, geom_line and geom_col
  • Fancify basic graphs: colors, legend, axes, theme and patchwork
  • Other types of geom: histogram, density, violin, boxplot

Data visualization with ggplot2

Based on the grammar of graphics, a conceptual approach to building graphs from layers.

Pass a dataframe to ggplot(), map variables to aesthetics (e.g. y, x, colour), and tell it which geometry to use (e.g. point, line).

2023 - R for the Rest of Us

Types of layers

  • Geometries: Representation of data
  • Scales: Defining axes and legends
  • Labels: Adding descriptive text, e.g. axis labels
  • Themes: General appearance of plot
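
The four layer types listed above can be sketched in a single plot, assuming ggplot2 is installed. The mtcars data and the object name plot_layers are illustrative only.

```r
library(ggplot2)

plot_layers <- ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point() +                                           # geometry
  scale_y_continuous(breaks = seq(0, 350, by = 50)) +      # scale
  labs(x = "displacement (cu. in.)", y = "power (hp)") +   # labels
  theme_bw()                                               # theme
```

Each line after ggplot() adds one component, so a plot can be built up and refined step by step.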

Example

Cheatsheet

Example data COVID-19

# load libraries
library(dplyr)
library(lubridate)
library(readr)

# read data
covid <- read_csv("data/raw/COVID19Cases_geoRegion.csv")

covid
# A tibble: 30,247 × 36
   geoRegion datum      entries sumTotal timeframe_14d timeframe_all
   <chr>     <date>       <dbl>    <dbl> <lgl>         <lgl>        
 1 CH        2020-02-24       1        1 FALSE         TRUE         
 2 CH        2020-02-25       1        2 FALSE         TRUE         
 3 CH        2020-02-26      10       12 FALSE         TRUE         
 4 CH        2020-02-27      10       22 FALSE         TRUE         
 5 CH        2020-02-28      10       32 FALSE         TRUE         
 6 CH        2020-02-29      13       45 FALSE         TRUE         
 7 CH        2020-03-01      12       57 FALSE         TRUE         
 8 CH        2020-03-02      30       87 FALSE         TRUE         
 9 CH        2020-03-03      33      120 FALSE         TRUE         
10 CH        2020-03-04      61      181 FALSE         TRUE         
# ℹ 30,237 more rows
# ℹ 30 more variables: offset_last7d <dbl>, sumTotal_last7d <dbl>,
#   offset_last14d <dbl>, sumTotal_last14d <dbl>, offset_last28d <dbl>,
#   sumTotal_last28d <dbl>, sum7d <dbl>, sum14d <dbl>, mean7d <dbl>,
#   mean14d <dbl>, entries_diff_last_age <dbl>, pop <dbl>, inz_entries <dbl>,
#   inzsumTotal <dbl>, inzmean7d <dbl>, inzmean14d <dbl>,
#   inzsumTotal_last7d <lgl>, inzsumTotal_last14d <lgl>, …

Data from the COVID-19 BAG dashboard: https://www.covid19.admin.ch/

dataframe setup: covid_cantons_2020

# filter data frame covid: 
# only keep confirmed cases in the cantons of Zurich, Bern and Vaud 
# in the first half of the year 2020
covid_cantons_2020 <- covid %>% filter(datum <= ymd("2020-06-30") 
                    & (geoRegion == "ZH" | geoRegion == "BE" | geoRegion == "VD"))

# write data frame covid_cantons_2020 to a csv file
write_csv(x = covid_cantons_2020, file = "data/processed/covid_cantons_2020_06.csv")

covid_cantons_2020
# A tibble: 384 × 36
   geoRegion datum      entries sumTotal timeframe_14d timeframe_all
   <chr>     <date>       <dbl>    <dbl> <lgl>         <lgl>        
 1 BE        2020-02-24       0        0 FALSE         TRUE         
 2 BE        2020-02-25       0        0 FALSE         TRUE         
 3 BE        2020-02-26       0        0 FALSE         TRUE         
 4 BE        2020-02-27       1        1 FALSE         TRUE         
 5 BE        2020-02-28       0        1 FALSE         TRUE         
 6 BE        2020-02-29       1        2 FALSE         TRUE         
 7 BE        2020-03-01       0        2 FALSE         TRUE         
 8 BE        2020-03-02       4        6 FALSE         TRUE         
 9 BE        2020-03-03       3        9 FALSE         TRUE         
10 BE        2020-03-04       8       17 FALSE         TRUE         
# ℹ 374 more rows
# ℹ 30 more variables: offset_last7d <dbl>, sumTotal_last7d <dbl>,
#   offset_last14d <dbl>, sumTotal_last14d <dbl>, offset_last28d <dbl>,
#   sumTotal_last28d <dbl>, sum7d <dbl>, sum14d <dbl>, mean7d <dbl>,
#   mean14d <dbl>, entries_diff_last_age <dbl>, pop <dbl>, inz_entries <dbl>,
#   inzsumTotal <dbl>, inzmean7d <dbl>, inzmean14d <dbl>,
#   inzsumTotal_last7d <lgl>, inzsumTotal_last14d <lgl>, …

Goal 1 (exercise 4)

geom_point: basic plot

# load library
library(ggplot2)

plot_covid_point_v0 <- ggplot(data = covid_cantons_2020, 
                              mapping = aes(x = datum, y = entries)) + 
  geom_point()

Note: ggplot2 does not combine layers with the %>% or |> pipes; it uses + instead.
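
Pipes and + can still appear in the same statement. The following sketch, assuming dplyr and ggplot2 are installed and using the built-in mtcars data for illustration, pipes a filtered data frame into ggplot() and then adds layers with +.

```r
library(dplyr)
library(ggplot2)

# the pipe passes the filtered data frame into ggplot();
# from there on, layers are added with +
plot_piped <- mtcars %>%
  filter(cyl == 4) %>%
  ggplot(mapping = aes(x = disp, y = hp)) +
  geom_point()
```

Writing %>% instead of + after ggplot() is a common mistake and results in an error.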

geom_line: basic plot

plot_covid_line_v0 <- ggplot(data = covid_cantons_2020, 
                             mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion))

geom_col: basic plot

plot_covid_col_v0 <- ggplot(data = covid_cantons_2020, 
                            mapping = aes(x = datum, y = entries)) + 
  geom_col(position = "stack")

Exercise 4A: basic plot

  1. Read Ebola data and sort it by date.
  2. Determine what variables you need to include in your dataframe to make the type of plot shown below.
  3. Create a dataframe with the required variables and all data for 3 countries before 31 March 2015.

Exercise 4A: solution

# load libraries
library(dplyr)
library(lubridate)
library(readr)

# read Ebola data
data_ebola <- read_csv("data/raw/ebola.csv")

# sort data_ebola by date
data_ebola <- data_ebola %>% arrange(Date)

data_ebola
# A tibble: 2,484 × 6
    ...1 Country      Date       Cum_conf_cases Cum_susp_cases Cum_conf_death
   <dbl> <chr>        <date>              <dbl>          <dbl>          <dbl>
 1   641 Guinea       2014-08-29            482             25            287
 2   642 Liberia      2014-08-29            322            382            225
 3   643 Sierra Leone 2014-08-29            935             54            380
 4   644 Nigeria      2014-08-29             15              3              6
 5   636 Guinea       2014-09-05            604             56            362
 6   637 Liberia      2014-09-05            614            369            431
 7   638 Sierra Leone 2014-09-05           1146             78            443
 8   639 Nigeria      2014-09-05             18              3              7
 9   640 Senegal      2014-09-05              1             NA              0
10   631 Guinea       2014-09-08            664             47            400
# ℹ 2,474 more rows

# filter data_ebola: cumulative number of confirmed cases in Guinea, 
# Liberia and Sierra Leone before 31 March 2015 
data_ebola_cum_cases <- data_ebola %>% 
  select(date = Date, country = Country, cum_conf_cases = Cum_conf_cases) %>% 
  filter(date <= ymd("2015-03-31") & 
        (country == "Guinea" | country ==  "Liberia" | country == "Sierra Leone"))

Exercise 4B: basic plot

Create basic point, line and column plots of the cumulative number of confirmed cases versus time.

Exercise 4B: solution

# load library
library(ggplot2)

# create point plot
plot_ebola_point_v0 <- ggplot(data = data_ebola_cum_cases, 
                              mapping = aes(x = date, y = cum_conf_cases)) + 
  geom_point()
  
# create line plot
plot_ebola_line_v0 <- ggplot(data = data_ebola_cum_cases, 
                             mapping = aes(x = date, y = cum_conf_cases)) + 
  geom_line(aes(group = country))

# create column plot
plot_ebola_col_v0 <- ggplot(data = data_ebola_cum_cases, 
                            mapping = aes(x = date, y = cum_conf_cases)) + 
  geom_col(position = "stack")

ggsave: saving your plot

# Save the plot as a PNG using ggsave
ggsave("plot_covid_point_goal.png", plot = plot_covid_point_goal, width = 8, height = 6, units = "in", dpi = 300)

# Save the plot as a PDF using ggsave
ggsave("plot_covid_point_goal.pdf", plot = plot_covid_point_goal, width = 8, height = 6)

Try this for your own plot.

geom_point: colour and fill

plot_covid_point_v1 <- ggplot(data = covid_cantons_2020, 
                              mapping = aes(x = datum, y = entries)) + 
  geom_point(alpha = 0.7, colour = "black", fill = "blue", 
             shape = 21, size = 1.5, stroke = 1.5)

geom_line: colour and fill

plot_covid_line_v1 <- ggplot(data = covid_cantons_2020, 
                             mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion), 
            alpha = 0.7, colour = "blue", linetype = "solid", linewidth = 1.5)

geom_col: colour and fill

plot_covid_col_v1 <- ggplot(data = covid_cantons_2020, 
                            mapping = aes(x = datum, y = entries)) + 
  geom_col(position = "stack", alpha = 0.7, fill = "blue", 
           linetype = "solid", linewidth = 0.5, width = 0.7)

Exercise 4C: colour and fill

Change global aesthetics of the 3 plots you created in Exercise 4B.

  1. Point plot: Try different values for alpha, colour, fill, shape, size and stroke.
  2. Line plot: Try different values for alpha, colour, linetype and linewidth.
  3. Column plot: Try different values for alpha, colour, fill, linetype, linewidth, position and width.

Exercise 4C: solution

# create point plot
plot_ebola_point_v1 <- ggplot(data = data_ebola_cum_cases, 
                              mapping = aes(x = date, y = cum_conf_cases)) + 
  geom_point(alpha = 0.7, colour = "blue", fill = "green", 
             shape = 22, size = 1.5, stroke = 1.5) 

# create line plot
plot_ebola_line_v1 <- ggplot(data = data_ebola_cum_cases, 
                             mapping = aes(x = date, y = cum_conf_cases)) + 
  geom_line(mapping = aes(group = country), 
            alpha = 0.7, colour = "blue", linetype = "dashed", linewidth = 1.5)

# create column plot
plot_ebola_col_v1 <- ggplot(data = data_ebola_cum_cases, 
                            mapping = aes(x = date, y = cum_conf_cases)) + 
  geom_col(alpha = 0.7, colour = "blue", fill = "green", 
           linetype = "solid", linewidth = 0.1, position = "stack", width = 0.7)

geom_point: color per country

plot_covid_point_v2 <- ggplot(data = covid_cantons_2020, 
  mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5)

Global vs. local aesthetics

ggplot(data = covid_cantons_2020, 
      mapping = aes(x = datum, y = entries, colour = geoRegion, 
                    fill = geoRegion, group = geoRegion)) + 
  geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
  geom_line()

Global vs. local aesthetics

ggplot(data = covid_cantons_2020, 
                mapping = aes(x = datum, y = entries, group = geoRegion)) + 
  geom_point(alpha = 0.7, colour = "black", fill= "black", shape = 21, 
             size = 1.5, stroke = 1.5) +
  geom_line(colour = "red")

More examples on local vs. global aesthetics

geom_line: color per country

plot_covid_line_v2 <- ggplot(data = covid_cantons_2020, 
                             mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion, colour = geoRegion), 
            alpha = 0.7, linetype = "solid", linewidth = 1.5)

geom_col: color per country

plot_covid_col_v2 <- ggplot(data = covid_cantons_2020, 
  mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_col(position = "stack", alpha = 0.7, 
           linetype = "solid", linewidth = 0.5, width = 0.7)

Exercise 4D: color per country

Change aesthetic mappings of the 3 plots you created in Exercise 4C.

  1. Point plot: Set fill colour of points by country.
  2. Line plot: Set colour of lines by country.
  3. Column plot: Set fill colour of columns by country.

Exercise 4D: solution

# create point plot
plot_ebola_point_v2 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) 

# create line plot
plot_ebola_line_v2 <- ggplot(data = data_ebola_cum_cases, 
               mapping = aes(x = date, y = cum_conf_cases, colour = country)) + 
  geom_line(mapping = aes(group = country), 
            alpha = 0.7, linetype = "dashed", linewidth = 1.5)

# create column plot
plot_ebola_col_v2 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_col(alpha = 0.7, linetype = "solid", 
           linewidth = 0.1, position = "stack", width = 0.7)

geom_point: labels

plot_covid_point_v3 <- ggplot(data = covid_cantons_2020, 
  mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

geom_line: labels

plot_covid_line_v3 <- ggplot(data = covid_cantons_2020, 
                             mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion, colour = geoRegion), 
            alpha = 0.7, linetype = "solid", linewidth = 1.5) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

geom_col: labels

plot_covid_col_v3 <- ggplot(data = covid_cantons_2020, 
  mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_col(position = "stack", alpha = 0.7,
           linetype = "solid", linewidth = 0.5, width = 0.7) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

Exercise 4E: labels

Change the title and the labels of the axes of the 3 plots you created in Exercise 4D.

  1. Set the title to “Confirmed Ebola cases”.
  2. Set the label of x-axes to “Time”.
  3. Set the label of y-axes to “Cum. # of confirmed cases”.

Exercise 4E: solution

# create point plot
plot_ebola_point_v3 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

# create line plot
plot_ebola_line_v3 <- ggplot(data = data_ebola_cum_cases, 
               mapping = aes(x = date, y = cum_conf_cases, colour = country)) + 
  geom_line(mapping = aes(group = country), 
            alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

# create column plot
plot_ebola_col_v3 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_col(alpha = 0.7, linetype = "solid", 
           linewidth = 0.1, position = "stack", width = 0.7) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

geom_point: change standard colors

library(unibeCols)

plot_covid_point_v4 <- ggplot(data = covid_cantons_2020, 
    mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
    scale_colour_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

geom_line: change standard colors

plot_covid_line_v4 <- ggplot(data = covid_cantons_2020, 
                             mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion, colour = geoRegion), 
            alpha = 0.7, linetype = "solid", linewidth = 1.5) +
  scale_colour_manual(name = "Canton",
                      breaks = c("BE", "VD", "ZH"),
                      values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                      labels = c("Bern", "Vaud", "Zurich")) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

geom_col: change standard colors

plot_covid_col_v4 <- ggplot(data = covid_cantons_2020, 
                            mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_col(position = "stack", alpha = 0.7,
           linetype = "solid", linewidth = 0.5, width = 0.7) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_colour_manual(name = "Canton",
                      breaks = c("BE", "VD", "ZH"),
                      values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                      labels = c("Bern", "Vaud", "Zurich")) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

Exercise 4F

Change the colour scale or the fill scale, as appropriate, of the three plots you created in Exercise 4E.

  1. Point plot: Change fill scale manually.
  2. Line plot: Change colour scale manually.
  3. Column plot: Change fill scale manually.

Exercise 4F: solution

# load library
library(unibeCols)

# create point plot
plot_ebola_point_v4 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

# create line plot
plot_ebola_line_v4 <- ggplot(data = data_ebola_cum_cases, 
               mapping = aes(x = date, y = cum_conf_cases, colour = country)) + 
  geom_line(mapping = aes(group = country), 
            alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

# create column plot
plot_ebola_col_v4 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_col(alpha = 0.7, linetype = "solid", 
           linewidth = 0.1, position = "stack", width = 0.7) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

geom_point: scales

plot_covid_point_v5 <- ggplot(data = covid_cantons_2020, 
  mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_colour_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", 
                                  "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
                     limits = c(0, 350)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

geom_line: scales

plot_covid_line_v5 <- ggplot(data = covid_cantons_2020, 
                             mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion, colour = geoRegion), 
            alpha = 0.7, linetype = "solid", linewidth = 1.5) +
  scale_colour_manual(name = "Canton",
                      breaks = c("BE", "VD", "ZH"),
                      values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                      labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
                     limits = c(0, 350)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

geom_col: scales

plot_covid_col_v5 <- ggplot(data = covid_cantons_2020, 
      mapping = aes(x = datum, y = entries, fill = geoRegion, group=geoRegion)) + 
  geom_col(position = "stack", alpha = 0.7,
           linetype = "solid", linewidth = 0.5, width = 0.7) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_colour_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100),
                     limits = c(0, 600)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases")

Exercise 4G: scales

Change the scale of the axes of the three plots you created in Exercise 4F.

  1. Point plot: Change breaks of the x-axis to 29 August, 1 October, 1 December, 1 February, and 1 April.
  2. Point and line plots: Change breaks of the y-axis to 0, 2500, 5000, 7500 and 10000.
  3. Column plot: Change breaks of the y-axis to 0, 2500, 5000, 7500, 10000, 12500 and 15000.

Exercise 4G: solution

# create point plot
plot_ebola_point_v5 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_point(alpha = 0.7, 
             shape = 22, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
    scale_colour_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

# create line plot
plot_ebola_line_v5 <- ggplot(data = data_ebola_cum_cases, 
                             mapping = aes(x = date, y = cum_conf_cases, colour = country)) + 
  geom_line(mapping = aes(group = country), 
            alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

# create column plot
plot_ebola_col_v5 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_col(alpha = 0.7, linetype = "solid", 
           linewidth = 0.1, position = "stack", width = 0.7) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
    scale_colour_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 15000, by = 2500),
                     limits = c(0, 15000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases")

Themes

Graphic from https://www.geeksforgeeks.org/themes-and-background-colors-in-ggplot2-in-r/

geom_point: themes

plot_covid_point_v6 <- ggplot(data = covid_cantons_2020, 
  mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_colour_manual(name = "Canton",
                      breaks = c("BE", "VD", "ZH"),
                      values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                      labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
                     limits = c(0, 350)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases") +
  theme_bw() + theme(legend.position="bottom")
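The order of theme_bw() and theme() in the last line matters, because a complete theme such as theme_bw() replaces any theme settings added before it. The following minimal sketch illustrates this with the built-in mtcars data. The object names theme_tweaked, theme_reset and plot_toy_theme are made up for illustration.

```r
library(ggplot2)

# Complete theme first, then the tweak: the legend moves to the bottom
theme_tweaked <- theme_bw() + theme(legend.position = "bottom")

# Tweak first, then the complete theme: theme_bw() replaces the earlier setting,
# so the legend falls back to the default position of theme_bw()
theme_reset <- theme(legend.position = "bottom") + theme_bw()

# A stored theme can be reused across plots instead of repeating the last line
plot_toy_theme <- ggplot(data = mtcars,
                         mapping = aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  theme_tweaked
```

Storing the combined theme in one object is one way to avoid repeating the same theme line in every plot.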

geom_line: themes

plot_covid_line_v6 <- ggplot(data = covid_cantons_2020, 
                             mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion, colour = geoRegion), 
            alpha = 0.7, linetype = "solid", linewidth = 1.5) +
  scale_colour_manual(name = "Canton",
                      breaks = c("BE", "VD", "ZH"),
                      values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                      labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
                     limits = c(0, 350)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases") +
  theme_bw() + theme(legend.position="bottom")

geom_col: themes

plot_covid_col_v6 <- ggplot(data = covid_cantons_2020, 
    mapping = aes(x = datum, y = entries, fill = geoRegion, colour=geoRegion)) + 
  geom_col(position = "stack", alpha = 0.7,
           linetype = "solid", linewidth = 0.5, width = 0.7) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_colour_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100),
                     limits = c(0, 600)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases") +
  theme_bw() + theme(legend.position="bottom")

Exercise 4H: themes

Change the theme of the three plots you created in Exercise 4G to theme_bw() and move the legend to the bottom.

Exercise 4H: solution

# create point plot
plot_ebola_point_v6 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position="bottom")

# create line plot
plot_ebola_line_v6 <- ggplot(data = data_ebola_cum_cases, 
                             mapping = aes(x = date, y = cum_conf_cases, colour = country)) + 
  geom_line(mapping = aes(group = country), 
            alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position="bottom")

# create column plot
plot_ebola_col_v6 <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_col(alpha = 0.7, linetype = "solid", 
           linewidth = 0.1, position = "stack", width = 0.7) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 15000, by = 2500),
                     limits = c(0, 15000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position="bottom")

geom_point: facet

plot_covid_point_facet <- ggplot(data = covid_cantons_2020, 
   mapping = aes(x = datum, y = entries, fill = geoRegion, colour=geoRegion)) + 
  geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_colour_manual(name = "Canton",
                      breaks = c("BE", "VD", "ZH"),
                      values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                      labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
                     limits = c(0, 350)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases") +
  theme_bw() + theme(legend.position="bottom") +
  theme(panel.spacing = unit(2, "lines")) +
  facet_grid(cols = vars(geoRegion))
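With facet_grid(cols = ...) all panels sit in one row and therefore share the same y-axis. If each panel should get its own y-axis, facet_wrap() with scales = "free_y" is one option. The sketch below uses simulated counts; the data frame toy_counts and the plot names are assumptions made for illustration.

```r
library(ggplot2)

# Simulated daily counts for three regions with very different magnitudes
set.seed(1)
toy_counts <- data.frame(
  day = rep(1:30, times = 3),
  region = rep(c("A", "B", "C"), each = 30),
  entries = rpois(n = 90, lambda = rep(c(5, 20, 80), each = 30))
)

# One column per region, shared y-axis; the legend is dropped because
# the facet strips already name the regions
plot_toy_facet <- ggplot(data = toy_counts,
                         mapping = aes(x = day, y = entries, colour = region)) +
  geom_point() +
  theme_bw() + theme(legend.position = "none") +
  facet_grid(cols = vars(region))

# Same panels, but each region gets its own y-axis
plot_toy_facet_free <- ggplot(data = toy_counts,
                              mapping = aes(x = day, y = entries, colour = region)) +
  geom_point() +
  theme_bw() + theme(legend.position = "none") +
  facet_wrap(facets = vars(region), nrow = 1, scales = "free_y")
```

A shared axis makes magnitudes comparable across panels, while free axes show the shape of each series more clearly.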

geom_line: facet

plot_covid_line_facet <- ggplot(data = covid_cantons_2020, 
                                mapping = aes(x = datum, y = entries)) + 
  geom_line(mapping = aes(group = geoRegion, colour = geoRegion), 
            alpha = 0.7, linetype = "solid", linewidth = 1.5) +
  scale_colour_manual(name = "Canton",
                      breaks = c("BE", "VD", "ZH"),
                      values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                      labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
                     limits = c(0, 350)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases") +
  theme_bw() + theme(legend.position="bottom") +
  theme(panel.spacing = unit(2, "lines")) +
  facet_grid(cols = vars(geoRegion))

geom_col: facet

plot_covid_col_facet <- ggplot(data = covid_cantons_2020, 
 mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) + 
  geom_col(position = "stack", alpha = 0.7,
           linetype = "solid", linewidth = 0.5, width = 0.7) +
  scale_fill_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_colour_manual(name = "Canton",
                    breaks = c("BE", "VD", "ZH"),
                    values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
                    labels = c("Bern", "Vaud", "Zurich")) +
  scale_x_date(breaks = ymd(c("2020-02-24", "2020-04-01", "2020-05-01", 
                                  "2020-06-01","2020-07-01")),
               labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
               limits = ymd(c("2020-02-23", "2020-07-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100),
                     limits = c(0, 600)) +
  ggtitle(label = "Confirmed covid cases in 3 cantons") +
  xlab(label = "Time") +
  ylab(label = "# of confirmed cases") +
  theme_bw() + theme(legend.position="bottom") +
  theme(panel.spacing = unit(2, "lines")) +
  facet_grid(cols = vars(geoRegion))

Exercise 4I: facet

Create facet grids by country from the three plots you created in Exercise 4H.

Exercise 4I: solution

# create point plot
plot_ebola_point_facet <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, colour = country, fill = country)) + 
  geom_point(alpha = 0.7,  
             shape = 22, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", 
                                  "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position="bottom") +
  facet_grid(cols = vars(country))

# create line plot
plot_ebola_line_facet <- ggplot(data = data_ebola_cum_cases, 
                   mapping = aes(x = date, y = cum_conf_cases, colour = country)) + 
  geom_line(mapping = aes(group = country), 
            alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position="bottom") +
  facet_grid(cols = vars(country))

# create column plot
plot_ebola_col_facet <- ggplot(data = data_ebola_cum_cases, 
  mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) + 
  geom_col(alpha = 0.7, linetype = "solid", 
           linewidth = 0.1, position = "stack", width = 0.7) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = ymd(c("2014-08-29", "2014-10-01", "2014-12-01", 
                                  "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = ymd(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 15000, by = 2500),
                     limits = c(0, 15000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position="bottom") +
  facet_grid(cols = vars(country))

Patchwork

Artwork by @allison_horst

geom_point: grid

library(cowplot)

plot_covid_point_grid <- plot_grid(plotlist = list(plot_covid_point_v1, plot_covid_point_v2, plot_covid_point_v3, 
                                                   plot_covid_point_v4, plot_covid_point_v5, plot_covid_point_v6),
                                   labels = c("V1", "V2", "V3", "V4", "V5", "V6"), label_size = 12, nrow = 2)

Install cowplot:

install.packages("cowplot")
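The grid above relies on plots from earlier slides. As a self-contained sketch, assuming cowplot is installed, two small plots built from the built-in mtcars data can be arranged side by side. The names plot_a, plot_b and plot_toy_grid are made up for illustration.

```r
library(ggplot2)
library(cowplot)

# Two small example plots
plot_a <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()
plot_b <- ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar()

# labels = "AUTO" generates the panel labels A, B, ... automatically;
# rel_widths makes the first panel twice as wide as the second
plot_toy_grid <- plot_grid(plotlist = list(plot_a, plot_b),
                           labels = "AUTO", label_size = 12,
                           nrow = 1, rel_widths = c(2, 1))
```

The result is itself a ggplot object, so it can be printed or saved like any other plot.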

geom_line: grid

plot_covid_line_grid <- plot_grid(plotlist = list(plot_covid_line_v1, plot_covid_line_v2, plot_covid_line_v3, 
                                                  plot_covid_line_v4, plot_covid_line_v5, plot_covid_line_v6),
                                  labels = c("V1", "V2", "V3", "V4", "V5", "V6"), label_size = 12, nrow = 2)

geom_col: grid

plot_covid_col_grid <- plot_grid(plotlist = list(plot_covid_col_v1, plot_covid_col_v2, plot_covid_col_v3, 
                                                 plot_covid_col_v4, plot_covid_col_v5, plot_covid_col_v6),
                                 labels = c("V1", "V2", "V3", "V4", "V5", "V6"), label_size = 12, nrow = 2)

Exercise 4J: grid

Arrange six of the plots you created in the previous exercises into a grid.

Exercise 4J: solution

plot_ebola_line_grid <- plot_grid(plotlist = list(plot_ebola_line_v1, plot_ebola_line_v2, plot_ebola_line_v3, 
                                                  plot_ebola_line_v4, plot_ebola_line_v5, plot_ebola_line_v6),
                                  labels = c("V1", "V2", "V3", "V4", "V5", "V6"), label_size = 12, nrow = 2)

Types of geom

Example data: insurance

insurance <- read_csv("data/raw/insurance_with_date.csv")
insurance <- insurance %>% mutate(children = as.factor(children))

insurance
# A tibble: 1,338 × 9
       X   age sex      bmi children smoker region    charges date      
   <dbl> <dbl> <chr>  <dbl> <fct>    <chr>  <chr>       <dbl> <date>    
 1     1    59 male    31.8 2        no     southeast  13086. 2001-01-15
 2     2    24 female  22.6 0        no     southwest   2574. 2001-01-17
 3     3    28 female  25.9 1        no     northwest   4411. 2001-01-22
 4     4    22 male    25.2 0        no     northwest   2321. 2001-01-29
 5     5    60 female  36.0 0        no     northeast  13435. 2001-02-06
 6     6    38 female  28   3        no     southwest   7263. 2001-02-17
 7     7    51 female  20.6 0        no     southwest   9435. 2001-02-25
 8     8    44 female  39.0 0        yes    northwest  43104. 2001-02-27
 9     9    47 male    36.2 1        no     southwest   8239. 2001-03-02
10    10    29 male    32.1 2        no     northwest   4714. 2001-03-05
# ℹ 1,328 more rows

Data adapted from “Machine Learning with R” by Brett Lantz.

Density plot / histogram

Exercise 5A: Can you reproduce these graphs using the insurance.csv dataset?

Density plot / histogram – solution 1

ggplot( insurance , aes(x = bmi, colour = sex, fill = sex ) ) + 
  geom_density( alpha = 0.4 ) +
  theme(text = element_text(size=20), legend.position = "bottom") +
  xlab( expression(paste( "BMI (kg/", m^2,")")) ) + 
  scale_colour_manual(name = "" , values=c("female"=unibePastelS()[1],
                               "male"=unibeIceS()[1]), labels = c("Female", "Male")) +
  scale_fill_manual(name = "", values=c("female"=unibePastelS()[1],
                               "male"=unibeIceS()[1]), labels = c("Female", "Male")) 

Density plot / histogram – solution 2

ggplot( insurance ) + 
  geom_histogram( aes(x = charges, y = after_stat(density), colour = sex, fill = sex ),
                  alpha = 0.4, bins = 100 ) +
  geom_density( aes(x = charges, colour = sex), linewidth = 1.5 ) +
  theme(text = element_text(size=20), legend.position = "top") +
  xlab( "Charges in Dollar" ) + 
  scale_colour_manual(name = "" , values=c("female"=unibePastelS()[1],
                               "male"=unibeIceS()[1]), labels = c("Female", "Male")) +
  scale_fill_manual(name = "", values=c("female"=unibePastelS()[1],
                               "male"=unibeIceS()[1]), labels = c("Female", "Male")) +
  geom_vline(aes(xintercept = median(charges)), color = unibeRedS()[1], linewidth = 1)
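Mapping y to after_stat(density) rescales the histogram so that the bar areas sum to 1, which puts it on the same scale as the density curve. The sketch below checks this with base R on simulated data and then overlays both layers. The data frame toy_charges and the other object names are assumptions made for illustration.

```r
library(ggplot2)

# Simulated, right-skewed charges (made up for illustration)
set.seed(1)
toy_charges <- data.frame(charges = rexp(n = 500, rate = 1 / 10000))

# Base R check: on the density scale, bar height times bar width sums to 1
toy_hist <- hist(toy_charges$charges, breaks = 30, plot = FALSE)
total_area <- sum(toy_hist$density * diff(toy_hist$breaks))

# Histogram on the density scale with the density curve on top
plot_toy_density <- ggplot(data = toy_charges, mapping = aes(x = charges)) +
  geom_histogram(mapping = aes(y = after_stat(density)), bins = 30, alpha = 0.4) +
  geom_density(linewidth = 1.5)
```

Without after_stat(density) the histogram would show counts, and the density curve would appear flat along the x-axis.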

Quantiles

Exercise 5B: Can you reproduce this graph using the insurance.csv dataset?

Quantiles – solution

ggplot( insurance , aes(x = age, y = bmi, color =smoker) ) + 
  geom_point(  ) +
  geom_quantile(  ) +
  theme(text = element_text(size=20), legend.position = "top") +
  xlab( "Age (years)" ) + ylab( expression(paste( "BMI (kg/", m^2,")")) ) + 
  scale_colour_manual(name = "" , values=c("no"=unibeRedS()[1],
                               "yes"=unibeIceS()[1]), labels = c("No", "Yes")) +
  scale_fill_manual(name = "" , values=c("no"=unibeRedS()[1],
                               "yes"=unibeIceS()[1]), labels = c("No", "Yes")) 
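By default geom_quantile() draws quantile regression lines for the 25th, 50th and 75th percentiles. Other quantiles can be requested with the quantiles argument, as in this sketch on simulated data. Rendering the plot requires the quantreg package, and the data frame toy_bmi is an assumption made for illustration.

```r
library(ggplot2)

# Simulated age and BMI values (made up for illustration)
set.seed(1)
toy_bmi <- data.frame(age = runif(n = 200, min = 18, max = 64))
toy_bmi$bmi <- 25 + 0.05 * toy_bmi$age + rnorm(n = 200, sd = 4)

# Quantile regression lines for the 10th, 50th and 90th percentiles
# (drawing this plot needs the quantreg package to be installed)
plot_toy_quantile <- ggplot(data = toy_bmi, mapping = aes(x = age, y = bmi)) +
  geom_point(alpha = 0.4) +
  geom_quantile(quantiles = c(0.1, 0.5, 0.9), linewidth = 1)
```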

Violin plot / boxplot

Exercise 5C: Can you reproduce these graphs using the insurance.csv dataset?

Violin plot / boxplot – solution 1

ggplot( insurance , aes(x = smoker, y = charges ) ) + 
  ylab( "Charges ($)" ) +
  geom_violin(  )

Violin plot / boxplot – solution 2

ggplot( insurance , aes(x = smoker, y = charges ) ) + 
  geom_boxplot(  ) + 
  ylab( "Charges ($)" ) + 
  coord_flip()
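Violin plots show the shape of a distribution, while boxplots show its median and quartiles. The two geoms can be layered in one plot, as in this sketch on simulated data. The data frame toy_insurance and its values are made up for illustration.

```r
library(ggplot2)

# Simulated charges for non-smokers and smokers (made up for illustration)
set.seed(1)
toy_insurance <- data.frame(
  smoker = rep(c("no", "yes"), each = 100),
  charges = c(rlnorm(n = 100, meanlog = 9, sdlog = 0.5),
              rlnorm(n = 100, meanlog = 10, sdlog = 0.4))
)

# Violin for the shape of the distribution, narrow boxplot for median and quartiles;
# outlier.shape = NA hides the outlier points of the boxplot
plot_toy_violin <- ggplot(data = toy_insurance, mapping = aes(x = smoker, y = charges)) +
  geom_violin(fill = "grey90") +
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  ylab(label = "Charges ($)")
```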

Cheatsheet

Practice makes perfect

Community driven projects for practicing

Images by @tanyashapiro, @gkaramanis, @cscherer