plot(mtcars$disp, mtcars$hp,
xlab = "displacement (cu. in.)",
ylab = "power (hp)",
main = "Scatter plot in base plot")
Data visualization with the tidyverse
Claus Wilke, a professor of integrative biology at The University of Texas at Austin, wrote this book as a guide to
Note that the entire book was written in R Markdown using RStudio!
All data visualizations map data values into quantifiable features of the resulting graphic. We refer to these features as aesthetics.
Coordinate systems don’t have to be Cartesian.
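As a small illustration, the same bar chart can be drawn in Cartesian and in polar coordinates. This is only a sketch: it assumes the ggplot2 package is installed, uses the built-in mtcars data, and the object names are made up for the example.

```r
library(ggplot2)

# bar chart of the number of cars per cylinder count (Cartesian coordinates)
plot_bar_cartesian <- ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar()

# same data and geometry, but drawn in polar coordinates
plot_bar_polar <- plot_bar_cartesian + coord_polar()
```

Printing both objects side by side shows how much the choice of coordinate system changes the impression of the same data.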
There are three fundamental use cases for color in data visualizations:
The types of colors we use and the way in which we use them are quite different for these three cases.
Cynthia Brewer, a cartographer at Pennsylvania State University, designed the widely used color schemes ColorBrewer.
You can use the interactive web tool ColorBrewer 2.0 to choose an appropriate color scheme for your needs.
To use these color schemes in R, install the package RColorBrewer.
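A minimal sketch of how the palettes can be pulled into R, assuming RColorBrewer is installed. ColorBrewer offers qualitative, sequential and diverging palettes; the palette names below are real ColorBrewer palettes, while the variable names are made up for the example.

```r
# install.packages("RColorBrewer")  # run once if the package is missing
library(RColorBrewer)

# qualitative palette: distinguish unordered groups
cols_qualitative <- brewer.pal(n = 3, name = "Dark2")
# sequential palette: represent values from low to high
cols_sequential <- brewer.pal(n = 5, name = "Blues")
# diverging palette: represent deviations from a midpoint
cols_diverging <- brewer.pal(n = 7, name = "RdBu")

# show all palettes that are flagged as colorblind friendly
display.brewer.all(colorblindFriendly = TRUE)
```

Each call returns a character vector of hex colours that can be passed to the `values` argument of a manual colour scale.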
If you do not have a color vision deficiency, it is very hard to imagine what it is like to be colorblind.
The Color Blindness Simulator can close this gap for you. Play around with it to check whether your figures are colorblind safe.
The University of Bern has a set of corporate design colors that are defined in the manual “Gestaltungselemente”.
Thanks to Alan, you can easily install this color scheme with the unibeCols package: https://github.com/CTU-Bern/unibeCols
Whenever you visualize uncertainty with error bars, you must specify what quantity and/or confidence level the error bars represent.
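One way to follow this rule is to state the quantity directly in the figure. The sketch below assumes dplyr and ggplot2 are installed and uses the built-in mtcars data; the 95% confidence interval based on the t distribution is a choice made for this example, not a general recommendation.

```r
library(dplyr)
library(ggplot2)

# mean fuel efficiency per cylinder count with a 95% confidence interval
mpg_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),
            se_mpg = sd(mpg) / sqrt(n()),
            t_crit = qt(0.975, df = n() - 1)) %>%
  mutate(lower = mean_mpg - t_crit * se_mpg,
         upper = mean_mpg + t_crit * se_mpg)

# the caption states what the error bars represent
plot_mpg_ci <- ggplot(data = mpg_summary,
                      mapping = aes(x = factor(cyl), y = mean_mpg)) +
  geom_point() +
  geom_errorbar(mapping = aes(ymin = lower, ymax = upper), width = 0.2) +
  labs(x = "cylinders", y = "fuel efficiency (mpg)",
       caption = "Points: group means; error bars: 95% confidence intervals")
```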
When a shaded region is used to represent a numerical value, the area of that shaded region should be directly proportional to the corresponding value. - Bergstrom & West
Even though the 3D visualizations are shown from four different perspectives, it is difficult to envision how exactly the points are distributed in space.
Instead, map one of the variables (in this case fuel efficiency) onto another aesthetic (size of the dots).
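A possible 2D version of such a plot, sketched with the built-in mtcars data (assuming ggplot2 is installed). Fuel efficiency is mapped onto the size of the dots; the object name is made up for the example.

```r
library(ggplot2)

# third variable (mpg) is shown as dot size instead of a third axis
plot_size <- ggplot(data = mtcars,
                    mapping = aes(x = disp, y = hp, size = mpg)) +
  geom_point(alpha = 0.6) +
  labs(x = "displacement (cu. in.)", y = "power (hp)",
       size = "fuel efficiency (mpg)")
```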
Acronym | Name | Type | Application |
---|---|---|---|
pdf | Portable Document Format | vector | general purpose |
eps | Encapsulated PostScript | vector | general purpose, outdated; use pdf |
svg | Scalable Vector Graphics | vector | online use |
png | Portable Network Graphics | bitmap | optimized for line drawings |
jpeg | Joint Photographic Experts Group | bitmap | optimized for photographic images |
tiff | Tagged Image File Format | bitmap | print production, accurate color reproduction |
raw | Raw Image File | bitmap | digital photography, needs post-processing |
gif | Graphics Interchange Format | bitmap | outdated for static figures, OK for animations |
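In ggplot2, the file format is usually picked through the file extension passed to `ggsave()`. The following sketch assumes ggplot2 is installed; the file names and the temporary output folder are hypothetical and only keep the example self-contained.

```r
library(ggplot2)

plot_example <- ggplot(data = mtcars, mapping = aes(x = disp, y = hp)) +
  geom_point()

# hypothetical output folder; a temporary directory is used here
out_dir <- tempdir()

# vector format: scales without loss of quality
ggsave(filename = file.path(out_dir, "scatter.pdf"), plot = plot_example,
       width = 12, height = 8, units = "cm")

# bitmap format: the resolution (dpi) has to be chosen when saving
ggsave(filename = file.path(out_dir, "scatter.png"), plot = plot_example,
       width = 12, height = 8, units = "cm", dpi = 300)
```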
ggplot2
Artwork by @allison_horst
Based on the grammar of graphics, a conceptual approach to building graphs from layers.
Pass a dataframe, map variables to aesthetics (e.g. y, x, colour), tell it which geometry to use (e.g. point, line)
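The three steps can be sketched as follows, assuming ggplot2 is installed and using the built-in mtcars data. The object names are made up for the example.

```r
library(ggplot2)

# 1. pass a data frame and map variables to aesthetics
plot_step1 <- ggplot(data = mtcars,
                     mapping = aes(x = disp, y = hp, colour = factor(cyl)))

# 2. tell ggplot2 which geometry to use
plot_step2 <- plot_step1 + geom_point()

# 3. further layers can be added on top of the existing ones
plot_step3 <- plot_step2 + geom_line()
```

Printing `plot_step1` shows only empty axes, which makes it visible that the data appear on the plot only once a geometry layer has been added.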
2023 - R for the Rest of Us
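# load dplyr for the pipe %>% and mutate()
library(dplyr)
covid <- read.csv("data/raw/COVID19Cases_geoRegion.csv")
# convert column datum from character to Date
covid <- covid %>% mutate(datum = as.Date(datum))
head(covid)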
geoRegion datum entries sumTotal timeframe_14d timeframe_all
1 CH 2020-02-24 1 1 FALSE TRUE
2 CH 2020-02-25 1 2 FALSE TRUE
3 CH 2020-02-26 10 12 FALSE TRUE
4 CH 2020-02-27 10 22 FALSE TRUE
5 CH 2020-02-28 10 32 FALSE TRUE
6 CH 2020-02-29 13 45 FALSE TRUE
offset_last7d sumTotal_last7d offset_last14d sumTotal_last14d offset_last28d
1 4385008 0 4383801 0 4376250
2 4385008 0 4383801 0 4376250
3 4385008 0 4383801 0 4376250
4 4385008 0 4383801 0 4376250
5 4385008 0 4383801 0 4376250
6 4385008 0 4383801 0 4376250
sumTotal_last28d sum7d sum14d mean7d mean14d entries_diff_last_age pop
1 0 NA NA NA NA 7 8738791
2 0 NA NA NA NA 7 8738791
3 0 NA NA NA NA 7 8738791
4 0 NA NA 8.14 NA 7 8738791
5 0 NA NA 12.29 NA 7 8738791
6 0 NA NA 16.86 NA 7 8738791
inz_entries inzsumTotal inzmean7d inzmean14d inzsumTotal_last7d
1 0.01 0.01 NA NA NA
2 0.01 0.02 NA NA NA
3 0.11 0.14 NA NA NA
4 0.11 0.25 0.09 NA NA
5 0.11 0.37 0.14 NA NA
6 0.15 0.51 0.19 NA NA
inzsumTotal_last14d inzsumTotal_last28d inzsum7d inzsum14d sumdelta7d
1 NA NA NA NA NA
2 NA NA NA NA NA
3 NA NA NA NA NA
4 NA NA NA NA NA
5 NA NA NA NA NA
6 NA NA NA NA NA
inzdelta7d type type_variant version datum_unit
1 NA COVID19Cases NA 2023-01-24_06-03-16 day
2 NA COVID19Cases NA 2023-01-24_06-03-16 day
3 NA COVID19Cases NA 2023-01-24_06-03-16 day
4 NA COVID19Cases NA 2023-01-24_06-03-16 day
5 NA COVID19Cases NA 2023-01-24_06-03-16 day
6 NA COVID19Cases NA 2023-01-24_06-03-16 day
entries_letzter_stand entries_neu_gemeldet entries_diff_last
1 1 0 914
2 1 0 914
3 10 0 914
4 10 0 914
5 10 0 914
6 13 0 914
[1] 30247 36
Data from the COVID-19 BAG dashboard: https://www.covid19.admin.ch/
# filter data frame covid:
# only keep confirmed cases in the cantons of Zurich, Bern and Vaud
# in the first half of the year 2020
covid_cantons_2020 <- covid %>% filter(datum <= as.Date("2020-06-30")
& (geoRegion == "ZH" | geoRegion == "BE" | geoRegion == "VD"))
# write data frame covid_cantons_2020 to a csv file
write.csv(x = covid_cantons_2020, file = "data/processed/covid_cantons_2020_06.csv")
geoRegion datum entries sumTotal timeframe_14d timeframe_all
1 BE 2020-02-24 0 0 FALSE TRUE
2 BE 2020-02-25 0 0 FALSE TRUE
3 BE 2020-02-26 0 0 FALSE TRUE
4 BE 2020-02-27 1 1 FALSE TRUE
5 BE 2020-02-28 0 1 FALSE TRUE
6 BE 2020-02-29 1 2 FALSE TRUE
offset_last7d sumTotal_last7d offset_last14d sumTotal_last14d offset_last28d
1 507985 0 507871 0 507046
2 507985 0 507871 0 507046
3 507985 0 507871 0 507046
4 507985 0 507871 0 507046
5 507985 0 507871 0 507046
6 507985 0 507871 0 507046
sumTotal_last28d sum7d sum14d mean7d mean14d entries_diff_last_age pop
1 0 NA NA NA NA 7 1047473
2 0 NA NA NA NA 7 1047473
3 0 NA NA NA NA 7 1047473
4 0 NA NA 0.29 NA 7 1047473
5 0 NA NA 0.86 NA 7 1047473
6 0 NA NA 1.29 NA 7 1047473
inz_entries inzsumTotal inzmean7d inzmean14d inzsumTotal_last7d
1 0.0 0.00 NA NA NA
2 0.0 0.00 NA NA NA
3 0.0 0.00 NA NA NA
4 0.1 0.10 0.03 NA NA
5 0.0 0.10 0.08 NA NA
6 0.1 0.19 0.12 NA NA
inzsumTotal_last14d inzsumTotal_last28d inzsum7d inzsum14d sumdelta7d
1 NA NA NA NA NA
2 NA NA NA NA NA
3 NA NA NA NA NA
4 NA NA NA NA NA
5 NA NA NA NA NA
6 NA NA NA NA NA
inzdelta7d type type_variant version datum_unit
1 NA COVID19Cases NA 2023-01-24_06-03-16 day
2 NA COVID19Cases NA 2023-01-24_06-03-16 day
3 NA COVID19Cases NA 2023-01-24_06-03-16 day
4 NA COVID19Cases NA 2023-01-24_06-03-16 day
5 NA COVID19Cases NA 2023-01-24_06-03-16 day
6 NA COVID19Cases NA 2023-01-24_06-03-16 day
entries_letzter_stand entries_neu_gemeldet entries_diff_last
1 0 0 75
2 0 0 75
3 0 0 75
4 1 0 75
5 0 0 75
6 1 0 75
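The filter step used to create covid_cantons_2020 can also be written more compactly with `%in%`. The sketch below uses a small toy data frame with made-up values and the same column names, so it runs without the original csv file; dplyr is assumed to be installed.

```r
library(dplyr)

# toy data with the same column names as the covid data frame (values are made up)
covid_toy <- data.frame(
  geoRegion = c("ZH", "BE", "VD", "GE", "ZH"),
  datum = as.Date(c("2020-03-01", "2020-03-01", "2020-07-15",
                    "2020-03-01", "2020-06-30")),
  entries = c(5, 3, 8, 2, 1)
)

# %in% replaces the chain of == comparisons joined by |;
# conditions separated by a comma are combined with "and"
covid_toy_filtered <- covid_toy %>%
  filter(datum <= as.Date("2020-06-30"),
         geoRegion %in% c("ZH", "BE", "VD"))

# row.names = FALSE avoids an extra column of row numbers in the csv file
write.csv(x = covid_toy_filtered,
          file = file.path(tempdir(), "covid_toy.csv"),
          row.names = FALSE)
```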
Note: ggplot2 does not use the %>% or |> pipes to combine layers; it uses + instead.
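Both operators can appear in the same statement, which is a common source of confusion. A minimal sketch, assuming dplyr and ggplot2 are installed and using the built-in mtcars data:

```r
library(dplyr)
library(ggplot2)

plot_piped <- mtcars %>%
  filter(cyl != 6) %>%                        # data steps are chained with the pipe
  ggplot(mapping = aes(x = disp, y = hp)) +   # from here on, layers are added with +
  geom_point()
```

The pipe hands the filtered data frame to `ggplot()` as its first argument; everything after that call is a plot layer and is added with `+`.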
# load library
library(dplyr)
# read Ebola data
data_ebola <- read.csv("data/raw/ebola.csv")
# format column Date of data_ebola as date
data_ebola$Date <- as.Date(data_ebola$Date)
# sort data_ebola by date
data_ebola <- arrange(data_ebola, Date)
head(data_ebola)
X Country Date Cum_conf_cases Cum_susp_cases Cum_conf_death
1 641 Guinea 2014-08-29 482 25 287
2 642 Liberia 2014-08-29 322 382 225
3 643 Sierra Leone 2014-08-29 935 54 380
4 644 Nigeria 2014-08-29 15 3 6
5 636 Guinea 2014-09-05 604 56 362
6 637 Liberia 2014-09-05 614 369 431
[1] 2484 6
# filter data_ebola: cumulative number of confirmed cases in Guinea,
# Liberia and Sierra Leone up to and including 31 March 2015
data_ebola_cum_cases <- data_ebola %>%
select(date = Date, country = Country, cum_conf_cases = Cum_conf_cases) %>%
filter(date <= as.Date("2015-03-31") &
(country == "Guinea" | country == "Liberia" | country == "Sierra Leone"))
Create basic point, line and column plots of the cumulative number of confirmed cases versus time.
# load ggplot2, then create point plot
library(ggplot2)
plot_ebola_point_v0 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases)) +
geom_point()
# create line plot
plot_ebola_line_v0 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases)) +
geom_line(aes(group = country))
# create column plot
plot_ebola_col_v0 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases)) +
geom_col(position = "stack")
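The `group` aesthetic in the line plot matters because the data contain several countries per date. The toy example below (made-up values, hypothetical object names, ggplot2 assumed to be installed) sketches the difference.

```r
library(ggplot2)

# toy data: two countries observed on the same three dates (values are made up)
toy_cases <- data.frame(
  date = rep(as.Date(c("2014-09-01", "2014-10-01", "2014-11-01")), times = 2),
  country = rep(c("A", "B"), each = 3),
  cum_cases = c(10, 20, 30, 100, 150, 200)
)

# without group: one single line through all six points (zigzag)
plot_toy_one_line <- ggplot(data = toy_cases,
                            mapping = aes(x = date, y = cum_cases)) +
  geom_line()

# with group: one line per country
plot_toy_grouped <- ggplot(data = toy_cases,
                           mapping = aes(x = date, y = cum_cases)) +
  geom_line(mapping = aes(group = country))
```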
Change global aesthetics of the 3 plots you created in Exercise 4B.
# create point plot
plot_ebola_point_v1 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases)) +
geom_point(alpha = 0.7, colour = "blue", fill = "green",
shape = 22, size = 1.5, stroke = 1.5)
# create line plot
plot_ebola_line_v1 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases)) +
geom_line(mapping = aes(group = country),
alpha = 0.7, colour = "blue", linetype = "dashed", linewidth = 1.5)
# create column plot
plot_ebola_col_v1 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases)) +
geom_col(alpha = 0.7, colour = "blue", fill = "green",
linetype = "solid", linewidth = 0.1, position = "stack", width = 0.7)
Change aesthetic mappings of the 3 plots you created in Exercise 4C.
# create point plot
plot_ebola_point_v2 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5)
# create line plot
plot_ebola_line_v2 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, colour = country)) +
geom_line(mapping = aes(group = country),
alpha = 0.7, linetype = "dashed", linewidth = 1.5)
# create column plot
plot_ebola_col_v2 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_col(alpha = 0.7, linetype = "solid",
linewidth = 0.1, position = "stack", width = 0.7)
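The difference between a fixed aesthetic and an aesthetic mapping comes down to whether the argument sits outside or inside `aes()`. A minimal sketch with the built-in mtcars data, assuming ggplot2 is installed:

```r
library(ggplot2)

# fixed aesthetic: set outside aes(), every point is blue, no legend
plot_fixed <- ggplot(data = mtcars, mapping = aes(x = disp, y = hp)) +
  geom_point(colour = "blue")

# mapped aesthetic: set inside aes(), colour depends on the data, legend is added
plot_mapped <- ggplot(data = mtcars,
                      mapping = aes(x = disp, y = hp, colour = factor(cyl))) +
  geom_point()
```

A fixed value given in the geometry overrides a mapping of the same aesthetic, which is why `colour = "blue"` and `fill = "green"` were removed from the geoms in the plots above once `colour` and `fill` were mapped to a variable.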
plot_covid_point_v3 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) +
geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
plot_covid_line_v3 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries)) +
geom_line(mapping = aes(group = geoRegion, colour = geoRegion),
alpha = 0.7, linetype = "solid", linewidth = 1.5) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
plot_covid_col_v3 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) +
geom_col(position = "stack", alpha = 0.7,
linetype = "solid", linewidth = 0.5, width = 0.7) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
Change the title and the labels of the axes of the 3 plots you created in Exercise 4D.
# create point plot
plot_ebola_point_v3 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
# create line plot
plot_ebola_line_v3 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, colour = country)) +
geom_line(mapping = aes(group = country),
alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
# create column plot
plot_ebola_col_v3 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_col(alpha = 0.7, linetype = "solid",
linewidth = 0.1, position = "stack", width = 0.7) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
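Title and axis labels can also be set in a single call to `labs()`, which some people find easier to read than three separate functions. A sketch with the built-in mtcars data, assuming ggplot2 is installed:

```r
library(ggplot2)

# labs() sets title and axis labels in one place
plot_labs <- ggplot(data = mtcars, mapping = aes(x = disp, y = hp)) +
  geom_point() +
  labs(title = "Scatter plot in ggplot2",
       x = "displacement (cu. in.)",
       y = "power (hp)")
```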
library(unibeCols)
plot_covid_point_v4 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) +
geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
plot_covid_line_v4 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries)) +
geom_line(mapping = aes(group = geoRegion, colour = geoRegion),
alpha = 0.7, linetype = "solid", linewidth = 1.5) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
plot_covid_col_v4 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) +
geom_col(position = "stack", alpha = 0.7,
linetype = "solid", linewidth = 0.5, width = 0.7) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
Change the colour and fill scales of the three plots you created in Exercise 4E.
# create point plot
plot_ebola_point_v4 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) +
scale_fill_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
# create line plot
plot_ebola_line_v4 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, colour = country)) +
geom_line(mapping = aes(group = country),
alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
# create column plot
plot_ebola_col_v4 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_col(alpha = 0.7, linetype = "solid",
linewidth = 0.1, position = "stack", width = 0.7) +
scale_fill_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
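The colour and fill scales above repeat the same breaks, values and labels. One way to reduce the repetition is to define the scale once and apply it to both aesthetics through the `aesthetics` argument. The sketch below uses toy data and placeholder hex colours (an assumption, not the unibeCols palette); ggplot2 is assumed to be installed.

```r
library(ggplot2)

# placeholder hex colours (assumption; replace with the palette of your choice)
canton_colours <- c(BE = "#D55E00", VD = "#E69F00", ZH = "#56B4E9")

# toy data (values are made up)
toy_cantons <- data.frame(
  geoRegion = c("BE", "VD", "ZH"),
  entries = c(10, 20, 30)
)

# one scale definition that applies to both colour and fill
scale_canton <- scale_colour_manual(name = "Canton",
                                    breaks = c("BE", "VD", "ZH"),
                                    values = canton_colours,
                                    labels = c("Bern", "Vaud", "Zurich"),
                                    aesthetics = c("colour", "fill"))

plot_toy_cantons <- ggplot(data = toy_cantons,
                           mapping = aes(x = geoRegion, y = entries,
                                         fill = geoRegion, colour = geoRegion)) +
  geom_col(alpha = 0.7) +
  scale_canton
```

Storing the scale in an object such as `scale_canton` also makes it reusable across the point, line and column plots.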
plot_covid_point_v5 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) +
geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01",
"2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
limits = c(0, 350)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
plot_covid_line_v5 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries)) +
geom_line(mapping = aes(group = geoRegion, colour = geoRegion),
alpha = 0.7, linetype = "solid", linewidth = 1.5) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
limits = c(0, 350)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
plot_covid_col_v5 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, group=geoRegion)) +
geom_col(position = "stack", alpha = 0.7,
linetype = "solid", linewidth = 0.5, width = 0.7) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100),
limits = c(0, 600)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases")
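One thing to keep in mind when setting `limits` in a scale: by default, values outside the limits are replaced by NA and therefore not drawn. If the aim is only to zoom in, `coord_cartesian()` leaves the data untouched. A toy sketch with made-up values, assuming ggplot2 is installed:

```r
library(ggplot2)

toy_counts <- data.frame(day = 1:3, cases = c(1, 5, 20))

# scale limits: the value 20 lies outside 0-10 and is replaced by NA
plot_scale_limits <- ggplot(data = toy_counts,
                            mapping = aes(x = day, y = cases)) +
  geom_point() +
  scale_y_continuous(limits = c(0, 10))

# coord_cartesian: only zooms the view; the data are left untouched
plot_zoom <- ggplot(data = toy_counts,
                    mapping = aes(x = day, y = cases)) +
  geom_point() +
  coord_cartesian(ylim = c(0, 10))
```

This is particularly relevant for stacked column plots, where the top of a stack can fall outside the limits even though each individual value lies inside them.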
Change the scale of the axes of the three plots you created in Exercise 4F.
# create point plot
plot_ebola_point_v5 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_point(alpha = 0.7,
shape = 22, size = 1.5, stroke = 1.5) +
scale_fill_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
limits = as.Date(c("2014-08-28", "2015-04-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
limits = c(0, 10000)) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
# create line plot
plot_ebola_line_v5 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, colour = country)) +
geom_line(mapping = aes(group = country),
alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
limits = as.Date(c("2014-08-28", "2015-04-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
limits = c(0, 10000)) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
# create column plot
plot_ebola_col_v5 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_col(alpha = 0.7, linetype = "solid",
linewidth = 0.1, position = "stack", width = 0.7) +
scale_fill_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
limits = as.Date(c("2014-08-28", "2015-04-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 15000, by = 2500),
limits = c(0, 15000)) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases")
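Instead of listing every break and label by hand, `scale_x_date()` can generate them from a rule via `date_breaks` and `date_labels`. A sketch with toy data (made-up values), assuming ggplot2 is installed; the month names in the labels depend on the locale of the R session.

```r
library(ggplot2)

# toy data: one value per day from 1 March to 30 June 2020 (values are made up)
toy_dates <- data.frame(
  date = seq(from = as.Date("2020-03-01"), to = as.Date("2020-06-30"), by = "day")
)
toy_dates$cases <- seq_len(nrow(toy_dates))

# one break per month, labelled with day and full month name
plot_date_axis <- ggplot(data = toy_dates, mapping = aes(x = date, y = cases)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%d %B")
```

Manual breaks, as used in the plots above, remain the better choice when specific dates have to be marked.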
Graphic from https://www.geeksforgeeks.org/themes-and-background-colors-in-ggplot2-in-r/
plot_covid_point_v6 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) +
geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
limits = c(0, 350)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases") +
theme_bw() + theme(legend.position="bottom")
plot_covid_line_v6 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries)) +
geom_line(mapping = aes(group = geoRegion, colour = geoRegion),
alpha = 0.7, linetype = "solid", linewidth = 1.5) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
limits = c(0, 350)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases") +
theme_bw() + theme(legend.position="bottom")
plot_covid_col_v6 <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour=geoRegion)) +
geom_col(position = "stack", alpha = 0.7,
linetype = "solid", linewidth = 0.5, width = 0.7) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100),
limits = c(0, 600)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases") +
theme_bw() + theme(legend.position="bottom")
Change the theme of the three plots you created in Exercise 4G to theme_bw().
# create point plot
plot_ebola_point_v6 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) +
scale_fill_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
limits = as.Date(c("2014-08-28", "2015-04-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
limits = c(0, 10000)) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases") +
theme_bw() + theme(legend.position="bottom")
# create line plot
plot_ebola_line_v6 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, colour = country)) +
geom_line(mapping = aes(group = country),
alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
limits = as.Date(c("2014-08-28", "2015-04-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
limits = c(0, 10000)) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases") +
theme_bw() + theme(legend.position="bottom")
# create column plot
plot_ebola_col_v6 <- ggplot(data = data_ebola_cum_cases,
mapping = aes(x = date, y = cum_conf_cases, fill = country, colour = country)) +
geom_col(alpha = 0.7, linetype = "solid",
linewidth = 0.1, position = "stack", width = 0.7) +
scale_fill_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_colour_manual(name = "Country",
breaks = c("Guinea", "Liberia", "Sierra Leone"),
values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
labels = c("GIN", "LBR", "SLE")) +
scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01", "2015-02-01", "2015-04-01")),
labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
limits = as.Date(c("2014-08-28", "2015-04-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 15000, by = 2500),
limits = c(0, 15000)) +
ggtitle(label = "Confirmed Ebola cases") +
xlab(label = "Time") +
ylab(label = "Cum. # of confirmed cases") +
theme_bw() + theme(legend.position="bottom")
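The order of `theme_bw()` and `theme()` matters: a complete theme such as `theme_bw()` replaces all theme settings made before it. A minimal sketch with the built-in mtcars data, assuming ggplot2 is installed:

```r
library(ggplot2)

plot_cyl <- ggplot(data = mtcars,
                   mapping = aes(x = disp, y = hp, colour = factor(cyl))) +
  geom_point()

# complete theme first, then the adjustment: legend at the bottom
plot_legend_bottom <- plot_cyl + theme_bw() + theme(legend.position = "bottom")

# adjustment first, then the complete theme: theme_bw() resets the legend position
plot_legend_reset <- plot_cyl + theme(legend.position = "bottom") + theme_bw()
```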
plot_covid_point_facet <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour=geoRegion)) +
geom_point(alpha = 0.7, shape = 21, size = 1.5, stroke = 1.5) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
limits = c(0, 350)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases") +
theme_bw() + theme(legend.position="bottom") +
theme(panel.spacing = unit(2, "lines")) +
facet_grid(cols = vars(geoRegion))
plot_covid_line_facet <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries)) +
geom_line(mapping = aes(group = geoRegion, colour = geoRegion),
alpha = 0.7, linetype = "solid", linewidth = 1.5) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01", "2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 350, by = 50),
limits = c(0, 350)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases") +
theme_bw() + theme(legend.position="bottom") +
theme(panel.spacing = unit(2, "lines")) +
facet_grid(cols = vars(geoRegion))
plot_covid_col_facet <- ggplot(data = covid_cantons_2020,
mapping = aes(x = datum, y = entries, fill = geoRegion, colour = geoRegion)) +
geom_col(position = "stack", alpha = 0.7,
linetype = "solid", linewidth = 0.5, width = 0.7) +
scale_fill_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_colour_manual(name = "Canton",
breaks = c("BE", "VD", "ZH"),
values = c(unibeRedS()[1], unibeMustardS()[1], unibeIceS()[1]),
labels = c("Bern", "Vaud", "Zurich")) +
scale_x_date(breaks = as.Date(c("2020-02-24", "2020-04-01", "2020-05-01",
"2020-06-01","2020-07-01")),
labels = c("24 February", "1 April", "1 May", "1 June", "1 July"),
limits = as.Date(c("2020-02-23", "2020-07-01"))) +
scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100),
limits = c(0, 600)) +
ggtitle(label = "Confirmed covid cases in 3 cantons") +
xlab(label = "Time") +
ylab(label = "# of confirmed cases") +
theme_bw() + theme(legend.position="bottom") +
theme(panel.spacing = unit(2, "lines")) +
facet_grid(cols = vars(geoRegion))
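`facet_grid()` with `cols` puts the panels side by side on a shared y axis. When the groups differ strongly in magnitude, `facet_wrap()` with free scales is a possible alternative, at the cost of making panels harder to compare. A sketch with the built-in mtcars data, assuming ggplot2 is installed:

```r
library(ggplot2)

# one panel per cylinder count, stacked vertically, each with its own y axis
plot_facet_wrap <- ggplot(data = mtcars, mapping = aes(x = disp, y = hp)) +
  geom_point() +
  facet_wrap(facets = vars(cyl), ncol = 1, scales = "free_y")
```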
Create facet grids by country from the three plots you created in Exercise 4H.
# create point plot
plot_ebola_point_facet <- ggplot(data = data_ebola_cum_cases,
                                 mapping = aes(x = date, y = cum_conf_cases,
                                               colour = country, fill = country)) +
  geom_point(alpha = 0.7, shape = 22, size = 1.5, stroke = 1.5) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01",
                                  "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = as.Date(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position = "bottom") +
  facet_grid(cols = vars(country))
# create line plot
plot_ebola_line_facet <- ggplot(data = data_ebola_cum_cases,
                                mapping = aes(x = date, y = cum_conf_cases,
                                              colour = country)) +
  geom_line(mapping = aes(group = country),
            alpha = 0.7, linetype = "dashed", linewidth = 1.5) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01",
                                  "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = as.Date(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 10000, by = 2500),
                     limits = c(0, 10000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position = "bottom") +
  facet_grid(cols = vars(country))
# create column plot
plot_ebola_col_facet <- ggplot(data = data_ebola_cum_cases,
                               mapping = aes(x = date, y = cum_conf_cases,
                                             fill = country, colour = country)) +
  geom_col(alpha = 0.7, linetype = "solid",
           linewidth = 0.1, position = "stack", width = 0.7) +
  scale_fill_manual(name = "Country",
                    breaks = c("Guinea", "Liberia", "Sierra Leone"),
                    values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                    labels = c("GIN", "LBR", "SLE")) +
  scale_colour_manual(name = "Country",
                      breaks = c("Guinea", "Liberia", "Sierra Leone"),
                      values = c(unibeRedS()[1], unibeOceanS()[1], unibeMustardS()[1]),
                      labels = c("GIN", "LBR", "SLE")) +
  scale_x_date(breaks = as.Date(c("2014-08-29", "2014-10-01", "2014-12-01",
                                  "2015-02-01", "2015-04-01")),
               labels = c("29 August", "1 October", "1 December", "1 February", "1 April"),
               limits = as.Date(c("2014-08-28", "2015-04-01"))) +
  scale_y_continuous(breaks = seq(from = 0, to = 15000, by = 2500),
                     limits = c(0, 15000)) +
  ggtitle(label = "Confirmed Ebola cases") +
  xlab(label = "Time") +
  ylab(label = "Cum. # of confirmed cases") +
  theme_bw() + theme(legend.position = "bottom") +
  facet_grid(cols = vars(country))
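The three Ebola plots repeat the same scale, label, theme, and facet calls. One possible way to cut the repetition is to store the shared layers in a list and add that list to each plot, which ggplot2 allows. The sketch below is only an illustration: `toy_cases`, `country_colours`, and `shared_layers` are made-up names, the data are invented, and the hex colours are placeholders for the unibeCols palettes.

```r
library(ggplot2)

# Hypothetical toy data standing in for data_ebola_cum_cases
toy_cases <- data.frame(
  date = rep(seq(as.Date("2014-09-01"), by = "month", length.out = 4), times = 3),
  country = rep(c("Guinea", "Liberia", "Sierra Leone"), each = 4),
  cum_conf_cases = c(500, 1000, 1800, 2400,
                     300, 900, 2000, 3000,
                     200, 1200, 4000, 7000)
)

# Placeholder colours (assumption); the course code uses unibeCols instead
country_colours <- c("Guinea" = "#D55E00",
                     "Liberia" = "#0072B2",
                     "Sierra Leone" = "#E69F00")

# Layers shared by every plot, defined once
shared_layers <- list(
  scale_colour_manual(name = "Country", values = country_colours),
  scale_fill_manual(name = "Country", values = country_colours),
  labs(title = "Confirmed Ebola cases", x = "Time",
       y = "Cum. # of confirmed cases"),
  theme_bw(),
  theme(legend.position = "bottom"),
  facet_grid(cols = vars(country))
)

base_plot <- ggplot(toy_cases,
                    aes(x = date, y = cum_conf_cases,
                        colour = country, fill = country))

# Only the geom differs between the plots
plot_point_demo <- base_plot + geom_point() + shared_layers
plot_line_demo <- base_plot + geom_line() + shared_layers
```

Printing `plot_point_demo` or `plot_line_demo` draws one panel per country. Plot-specific settings, such as the different y-axis limits of the column plot, can still be added after the shared list.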
Artwork by @allison_horst
Install cowplot:
install.packages("cowplot")
Arrange six of the plots you created in the previous exercises into a grid.
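As a rough starting point for this exercise, the sketch below arranges six plots in a 2 x 3 grid with `cowplot::plot_grid()`. The six plots are simple placeholders built from the built-in `mtcars` data, not the plots from the earlier exercises, and all object names are invented for illustration.

```r
# install.packages("cowplot")  # run once if cowplot is not installed yet
library(ggplot2)
library(cowplot)

# Six placeholder plots (assumption: replace with your own plot objects)
p_point <- ggplot(mtcars, aes(x = disp, y = hp)) + geom_point()
p_line <- ggplot(mtcars, aes(x = disp, y = hp)) + geom_line()
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p_hist <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
p_dens <- ggplot(mtcars, aes(x = mpg)) + geom_density()
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# Arrange the plots in three columns (two rows) and label them A to F
plot_grid_demo <- plot_grid(p_point, p_line, p_bar,
                            p_hist, p_dens, p_box,
                            ncol = 3, labels = "AUTO")
plot_grid_demo
```

With six plots and `ncol = 3`, the grid has two rows. Swapping in the plot objects from the earlier exercises should work the same way, although legends and long axis labels may need more space.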
insurance <- read.csv("data/raw/insurance_with_date.csv")
insurance <- insurance %>% mutate(children = as.factor(children))
head(insurance)
  X age    sex    bmi children smoker    region   charges       date
1 1  59   male 31.790        2     no southeast 13086.341 2001-01-15
2 2  24 female 22.600        0     no southwest  2574.268 2001-01-17
3 3  28 female 25.935        1     no northwest  4411.400 2001-01-22
4 4  22   male 25.175        0     no northwest  2321.417 2001-01-29
5 5  60 female 36.005        0     no northeast 13434.551 2001-02-06
6 6  38 female 28.000        3     no southwest  7262.940 2001-02-17
dim(insurance)
[1] 1338    9
Data adapted from “Machine Learning with R” by Brett Lantz.
Exercise 5A: Can you reproduce these graphs using the insurance.csv dataset?
ggplot(insurance, aes(x = bmi, colour = sex, fill = sex)) +
  geom_density(alpha = 0.4) +
  theme(text = element_text(size = 20), legend.position = "bottom") +
  xlab(expression(paste("BMI (kg/", m^2, ")"))) +
  scale_colour_manual(name = "",
                      values = c("female" = unibePastelS()[1], "male" = unibeIceS()[1]),
                      labels = c("Female", "Male")) +
  scale_fill_manual(name = "",
                    values = c("female" = unibePastelS()[1], "male" = unibeIceS()[1]),
                    labels = c("Female", "Male"))
ggplot(insurance) +
  # histogram on the density scale so it is comparable with the density curves
  geom_histogram(aes(x = charges, y = after_stat(density), colour = sex, fill = sex),
                 alpha = 0.4, bins = 100) +
  geom_density(aes(x = charges, colour = sex), linewidth = 1.5) +
  theme(text = element_text(size = 20), legend.position = "top") +
  xlab("Charges in dollars") +
  scale_colour_manual(name = "",
                      values = c("female" = unibePastelS()[1], "male" = unibeIceS()[1]),
                      labels = c("Female", "Male")) +
  scale_fill_manual(name = "",
                    values = c("female" = unibePastelS()[1], "male" = unibeIceS()[1]),
                    labels = c("Female", "Male")) +
  # vertical line at the overall median of charges (both sexes combined)
  geom_vline(aes(xintercept = median(charges)), color = unibeRedS()[1], linewidth = 1)
Exercise 5B: Can you reproduce this graph using the insurance.csv dataset?
ggplot(insurance, aes(x = age, y = bmi, color = smoker)) +
  geom_point() +
  geom_quantile() +
  theme(text = element_text(size = 20), legend.position = "top") +
  xlab("Age (years)") + ylab(expression(paste("BMI (kg/", m^2, ")"))) +
  scale_colour_manual(name = "",
                      values = c("no" = unibeRedS()[1], "yes" = unibeIceS()[1]),
                      labels = c("No", "Yes")) +
  scale_fill_manual(name = "",
                    values = c("no" = unibeRedS()[1], "yes" = unibeIceS()[1]),
                    labels = c("No", "Yes"))
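Called without arguments, as in Exercise 5B, `geom_quantile()` draws quantile regression lines for the 25th, 50th, and 75th percentiles, and it needs the quantreg package when the plot is rendered. The sketch below writes these defaults out explicitly. It uses simulated data rather than the insurance data set, and `toy` and `plot_quantile_demo` are invented names.

```r
library(ggplot2)

# Simulated toy data (assumption), standing in for the insurance data set
set.seed(1)
toy <- data.frame(age = sample(18:64, size = 200, replace = TRUE))
toy$bmi <- 25 + 0.05 * toy$age + rnorm(200, sd = 4)

plot_quantile_demo <- ggplot(toy, aes(x = age, y = bmi)) +
  geom_point(alpha = 0.5) +
  # Same quantiles as the default, spelled out; drawing the plot
  # requires the quantreg package to be installed
  geom_quantile(quantiles = c(0.25, 0.5, 0.75), formula = y ~ x)
```

Changing `quantiles` to, for example, `c(0.1, 0.9)` would draw lines for the 10th and 90th percentiles instead.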
Exercise 5C: Can you reproduce these graphs using the insurance.csv dataset?
Community-driven projects for practicing
Images by @tanyashapiro, @gkaramanis, @cscherer
Public Health Sciences Course Program - Basic Statistics and Projects in R. Slides available on GitHub.