
Data Visualization

✍️ The Grammar of Graphics

Dr Thiyanga Talagala

1

Today's menu

  • The Grammar of Graphics

Acknowledgement: Justin Matejka and George Fitzmaurice, Autodesk Research, Canada

2

Grammar of Graphics

3

Packages

library(tidyverse) # To obtain ggplot2
library(magrittr)

4
5
6

Dataset

library(gapminder)
glimpse(gapminder)
Rows: 1,704
Columns: 6
$ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
$ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
$ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
$ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
7

Plotting with R

8

Base R

  • using plot() function

Using ggplot2: grammar of graphics

  1. ggplot2 package: qplot() function

    • qplot: quick plot

    • very similar to how you graph with plot() function

  2. ggplot2 package: ggplot() function

    • fully utilize the power of grammar
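
A minimal sketch of the same scatter plot drawn with base R and with ggplot(), assuming the built-in iris data that the later slides use. qplot() is left out here because it is deprecated as of ggplot2 3.4.0; the object name p_gg is only illustrative.

```r
library(ggplot2)

# Base R: plot() draws straight to the graphics device
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal.Length", ylab = "Sepal.Width")

# ggplot2: ggplot() builds a plot object from data, aesthetics and a geom
p_gg <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

p_gg  # printing the object draws the plot
```

Because the ggplot2 version is an object, further layers can be added to p_gg with +.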
9

Grammar

English

  • Noun

  • Article

  • Adjective

  • Verb

  • Adverb

  • Preposition

Graphics

10

English

The little monkey hangs confidently by a branch.

  • Article: The

  • Adjective: little

  • Noun: monkey

  • Verb: hangs

  • Adverb: confidently

  • Preposition: by

  • Noun: a branch

Graphics

ggplot(iris)+
aes(x = Sepal.Length,
y = Sepal.Width)+
geom_point()

11

Elements of ggplot2 object

  • Data

  • Aesthetics: x, y, col

  • Geometrics: geom_point, geom_boxplot

12

Elements of ggplot2 object

  • Data: data

  • Aesthetics: aes

  • Geometrics: geom_*
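
The three elements can be looked up on a built plot. This is a sketch only: it assumes the plot object exposes data, mapping and layers through $, which may vary between ggplot2 versions, and the name p is illustrative.

```r
library(ggplot2)

p <- ggplot(iris) +
  aes(x = Sepal.Length, y = Sepal.Width) +
  geom_point()

# Data: the data frame handed to ggplot()
dim(p$data)

# Aesthetics: the names of the mapped aesthetics
names(p$mapping)

# Geometrics: one layer is stored per geom_*() call
length(p$layers)
class(p$layers[[1]]$geom)[1]
```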

13
14

Making your first plot with ggplot

15

Data: data to be plotted

'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
16

Data

ggplot(iris)

17

Aesthetics: mapping variables

  • x

  • y

  • colour

  • shape

18

Data + Aesthetics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width))

19

Geometrics

  • geom_point

  • geom_boxplot

20

Data + Aesthetics + Geometrics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width))+
geom_point()

21

Data + Aesthetics + Geometrics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width))+
geom_point()

22

Data + Aesthetics + Geometrics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width))+
geom_point(col = "forestgreen")

23

Data + Aesthetics + Geometrics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width))+
geom_point(col = "forestgreen",
shape = 8)

24

Data + Aesthetics + Geometrics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species))+
geom_point()

25

Data + Aesthetics + Geometrics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col= Species))+
geom_point(
shape = 3
)

26

Data + Aesthetics + Geometrics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species))+
geom_point()

27

Facets: small multiples

28

Data + Aesthetics + Geometrics + Facets

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species))+
geom_point()+
facet_grid(~Species)

29

Data + Aesthetics + Geometrics + Facets

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species))+
geom_point()+
facet_grid(Species ~.)
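
The two facet_grid() calls differ only in the side of the formula the variable sits on: ~Species lays the panels out in columns, Species ~ . in rows. facet_wrap(), used on the following slides, instead wraps a single sequence of panels into a grid. A sketch putting the three side by side; the object names and ncol = 2 are illustrative choices.

```r
library(ggplot2)

base_plot <- ggplot(iris,
                    aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()

# One panel per species, laid out as columns
in_columns <- base_plot + facet_grid(~Species)

# One panel per species, laid out as rows
in_rows <- base_plot + facet_grid(Species ~ .)

# Panels wrapped into a grid with at most two columns
wrapped <- base_plot + facet_wrap(~Species, ncol = 2)
```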

30

Statistics

31

Data + Aesthetics + Geometrics + Facets + Statistics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species))+
geom_point()+
facet_wrap(~Species)+
stat_smooth(method = "lm", se = FALSE, col = "red")

32

Data + Aesthetics + Geometrics + Facets + Statistics

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species))+
geom_point()+
facet_wrap( ~ Species)+
stat_smooth(method = "lm", se = TRUE, col = "red")

33

Coordinate

34

Data + Aesthetics + Geometrics + Facets + Statistics + Coordinate

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species)) +
geom_point() +
facet_wrap( ~ Species) +
stat_smooth(method = "lm", se = TRUE, col = "red") +
coord_equal()

35

Theme

36

Data + Aesthetics + Geometrics + Facets + Statistics + Coordinate + Theme

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species)) +
geom_point() +
facet_wrap( ~ Species) +
stat_smooth(method = "lm", se = TRUE, col = "red") +
coord_equal() +
theme(legend.position = "bottom")

37

Scale

38

Data + Aesthetics + Geometrics + Facets + Statistics + Coordinate + Theme + Scale

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species)) +
geom_point() +
facet_wrap( ~ Species) +
stat_smooth(method = "lm", se = TRUE, col = "red") +
coord_equal() +
theme(legend.position = "bottom") +
scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3"))

39

Titles and axis labels

ggplot(iris,
aes(x = Sepal.Length,
y = Sepal.Width,
col = Species)) +
geom_point() +
facet_wrap( ~ Species) +
stat_smooth(method = "lm", se = TRUE, col = "red") +
coord_equal() +
theme(legend.position = "bottom") +
scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3"))+
labs(title="Scatter plot of Sepal Length vs Sepal Width",
x ="Sepal Length (cm)", y = "Sepal Width (cm)")

40

Your turn

Dataset: gapminder

Visualize the relationship between life expectancy, GDP per capita and continent in 2007.

41
gapminder2007 <- gapminder %>%
filter(year == 2007)
ggplot(gapminder2007,
aes(x = lifeExp, y = gdpPercap, col=continent)) +
geom_point() + theme(legend.position = "bottom") +
labs(title = "Relationship between life expectancy and GDP per capita by continent - 2007",
x = "Life expectancy at birth (years)",
y = "GDP per capita (US$, inflation-adjusted)")

42

Add a vertical line

gapminder2007 <- gapminder %>%
filter(year == 2007)
ggplot(gapminder2007,
aes(x = lifeExp, y = gdpPercap, col=continent)) +
geom_point() +
geom_vline(xintercept = 70)

43

Add a horizontal line

gapminder2007 <- gapminder %>%
filter(year == 2007)
ggplot(gapminder2007,
aes(x = lifeExp, y = gdpPercap, col=continent)) +
geom_point() +
geom_hline(yintercept = 20000)

44

Add a diagonal line

gapminder2007 <- gapminder %>%
filter(year == 2007)
ggplot(gapminder2007,
aes(x = lifeExp, y = gdpPercap, col=continent)) +
geom_point() +
geom_abline(intercept = 20, slope=200)

45

All Geoms

[1] "geom_abline" "geom_area" "geom_bar"
[4] "geom_bin_2d" "geom_bin2d" "geom_blank"
[7] "geom_boxplot" "geom_col" "geom_contour"
[10] "geom_contour_filled" "geom_count" "geom_crossbar"
[13] "geom_curve" "geom_density" "geom_density_2d"
[16] "geom_density_2d_filled" "geom_density2d" "geom_density2d_filled"
[19] "geom_dotplot" "geom_errorbar" "geom_errorbarh"
[22] "geom_freqpoly" "geom_function" "geom_hex"
[25] "geom_histogram" "geom_hline" "geom_jitter"
[28] "geom_label" "geom_line" "geom_linerange"
[31] "geom_map" "geom_path" "geom_point"
[34] "geom_pointrange" "geom_polygon" "geom_qq"
[37] "geom_qq_line" "geom_quantile" "geom_raster"
[40] "geom_rect" "geom_ribbon" "geom_rug"
[43] "geom_segment" "geom_sf" "geom_sf_label"
[46] "geom_sf_text" "geom_smooth" "geom_spoke"
[49] "geom_step" "geom_text" "geom_tile"
[52] "geom_violin" "geom_vline"
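
A listing like the one above can be produced from the attached package. The exact call behind this slide is not shown, so the line below is an assumption, and the number of geoms returned depends on the installed ggplot2 version.

```r
library(ggplot2)

# Names of all exported ggplot2 objects that start with "geom_"
all_geoms <- ls("package:ggplot2", pattern = "^geom_")
all_geoms
```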
46

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot()

47

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, color=continent)) +
geom_boxplot()

48

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent)) +
geom_boxplot()

49

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot(fill="forestgreen")

50

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot(fill="forestgreen", alpha=0.5)

51

geom_point

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_point()

52

geom_jitter

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_jitter()

53

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_jitter() +
geom_boxplot()

54

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_jitter() +
geom_boxplot(alpha=0.5)

55

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot() +
geom_jitter()

56

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent)) +
geom_boxplot() +
geom_jitter()

57

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent)) +
geom_boxplot() +
geom_jitter(aes(col=continent))

58

geom_jitter + geom_boxplot (outlier.shape = NA)

ggplot(gapminder2007, aes(x = lifeExp, y = continent, fill = continent)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(col = continent))

59

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent, col=continent))+
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(col=continent))

60

geom_jitter + geom_boxplot

ggplot(gapminder2007,
aes(x=lifeExp, y=continent, fill=continent, col=continent))+
geom_boxplot(outlier.shape = NA, alpha=0.2) +
geom_jitter(aes(col=continent))

61

geom_jitter + geom_boxplot + coord_flip

ggplot(gapminder2007,
aes(x=lifeExp, y=continent, fill=continent, col=continent))+
geom_boxplot(outlier.shape = NA, alpha=0.2) +
geom_jitter(aes(col=continent)) +
coord_flip()

62

geom_boxplot

ggplot(gapminder2007, aes(y=lifeExp))+
geom_boxplot()

63

geom_boxplot + facet_wrap

ggplot(gapminder2007,
aes(y = lifeExp))+
geom_boxplot() + facet_wrap(~continent, ncol = 5)

64

geom_density

ggplot(gapminder2007,
aes(x=lifeExp))+
geom_density() +
facet_wrap(~continent, ncol=5)

65

Your turn

Modify the code below to obtain the following plot.

ggplot(gapminder2007,
aes(x=lifeExp))+
geom_density()

66

geom_histogram

ggplot(gapminder2007,
aes(x=lifeExp))+
geom_histogram()
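
With no bin settings, geom_histogram() falls back to 30 bins and prints a message suggesting binwidth. A sketch that sets the bins explicitly; the 5-year width, the boundary at 40 and the white outline are illustrative choices, not part of the original slide.

```r
library(ggplot2)
library(gapminder)

# Same 2007 subset as on the earlier slides, rebuilt so the block runs alone
gapminder2007 <- subset(gapminder, year == 2007)

p_hist <- ggplot(gapminder2007, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5,    # each bar covers 5 years of life expectancy
                 boundary = 40,   # bin edges fall on 40, 45, 50, ...
                 col = "white")   # outline so neighbouring bars are distinguishable
p_hist
```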

67

geom_bar

ggplot(gapminder2007,
aes(x=continent))+
geom_bar()

68

Your turn

Modify the code below to obtain the following plot.

ggplot(gapminder2007,
aes(x=continent))+
geom_bar()

69

geom_bar (stat="identity")

Method 1

cut.percent <- data.frame(cut=c("Fair", "Good", "Very Good", "Premium",
"Ideal"), percent=c(3, 9, 22.4, 25.6, 40))
cut.percent
        cut percent
1      Fair     3.0
2      Good     9.0
3 Very Good    22.4
4   Premium    25.6
5     Ideal    40.0
70
ggplot(data=cut.percent, aes(x=cut, y=percent)) +
geom_bar(stat="identity")

71

geom_col

Method 2

cut.percent <- data.frame(cut=c("Fair", "Good", "Very Good", "Premium",
"Ideal"), percent=c(3, 9, 22.4, 25.6, 40))
cut.percent
        cut percent
1      Fair     3.0
2      Good     9.0
3 Very Good    22.4
4   Premium    25.6
5     Ideal    40.0
72
ggplot(data=cut.percent, aes(x=cut, y=percent)) +
geom_col()

73

Change the order of levels

Method 2

cut.percent <- data.frame(cut=c("Fair", "Good", "Very Good", "Premium",
"Ideal"), percent=c(3, 9, 22.4, 25.6, 40))
cut.percent$cut <- factor(cut.percent$cut,
levels = c("Fair", "Good", "Very Good",
"Premium", "Ideal"))
74
ggplot(data=cut.percent, aes(x=cut, y=percent)) +
geom_col()
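
Why the factor() step matters: a character column is converted to a factor with alphabetically sorted levels, and a discrete axis follows the level order. A small base R sketch of the difference; the vector name cut_names is illustrative.

```r
cut_names <- c("Fair", "Good", "Very Good", "Premium", "Ideal")

# Default: levels are the sorted unique values
default_levels <- levels(factor(cut_names))
default_levels

# Explicit levels: the order given is the order kept
ordered_levels <- levels(factor(cut_names, levels = cut_names))
ordered_levels
```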

75

geom_point

gapminder %>%
filter(country == "India") %>%
ggplot(aes(x = year, y = gdpPercap)) +
geom_point()

76

geom_line

gapminder %>%
filter(country == "India") %>%
ggplot(aes(x = year, y = gdpPercap)) +
geom_line()

77

geom_line + geom_point

gapminder %>%
filter(country == "India") %>%
ggplot(aes(x = year, y = gdpPercap)) +
geom_line() +
geom_point()

78

Your turn

Modify the code below to obtain the following plot.

gapminder %>% filter(country == "India") %>%
ggplot(aes(x = year, y = gdpPercap)) + geom_line() + geom_point()

79

Data Wrangling + Data Visualization

avglifeExp <- gapminder %>%
group_by(continent, year) %>%
summarise(meanlifeExp=mean(lifeExp))
avglifeExp
# A tibble: 60 × 3
# Groups: continent [5]
   continent  year meanlifeExp
   <fct>     <int>       <dbl>
 1 Africa     1952        39.1
 2 Africa     1957        41.3
 3 Africa     1962        43.3
 4 Africa     1967        45.3
 5 Africa     1972        47.5
 6 Africa     1977        49.6
 7 Africa     1982        51.6
 8 Africa     1987        53.3
 9 Africa     1992        53.6
10 Africa     1997        53.6
# … with 50 more rows
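
The "Groups: continent [5]" line shows that the result is still grouped by continent after summarise(). If an ungrouped table is preferred, one option is the .groups argument (dplyr 1.0.0 or later). This is a sketch of that variant, not the code used on the slide.

```r
library(dplyr)
library(gapminder)

avglifeExp <- gapminder %>%
  group_by(continent, year) %>%
  summarise(meanlifeExp = mean(lifeExp),
            .groups = "drop")  # return a plain, ungrouped tibble

avglifeExp
```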
80

Your turn

Write R code to reproduce the plot below.

Hint: use avglifeExp

81

Your turn

Write R code to reproduce the plot below.

82

Your turn

Write R code to reproduce the plot below.

Hint: Next slide

83
gapminder %>%
ggplot(aes(y=log(lifeExp), x=log(gdpPercap), color=continent)) +
geom_point() +
labs(y = "log(Life Expectancy)",
x = "log(GDP per capita)")

84

Your turn

Write R code to reproduce the plot below.

85

geom_point

ggplot(gapminder, aes(x=year, y=gdpPercap, colour=continent))+geom_point()

86

geom_smooth

ggplot(gapminder, aes(x=year, y=gdpPercap, colour=continent))+
geom_smooth()

87

Your turn

Write R code to reproduce the plot below.

88

Your turn

Write R code to reproduce the plot below.

89

Your turn

Write R code to visualize the shape of the standard normal distribution.

Hint: dnorm
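
One possible approach, offered as a sketch rather than the intended solution: evaluate dnorm() on a grid and draw it with geom_line(). The range of -4 to 4, the 201 grid points and the name normal_curve are illustrative choices.

```r
library(ggplot2)

# Evaluate the N(0, 1) density on a grid; 201 points so that x = 0 is included
normal_curve <- data.frame(x = seq(-4, 4, length.out = 201))
normal_curve$density <- dnorm(normal_curve$x)

ggplot(normal_curve, aes(x = x, y = density)) +
  geom_line() +
  labs(x = "z", y = "Density", title = "Standard normal distribution")
```

stat_function(fun = dnorm) is an alternative that skips building the grid by hand.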

90

Recap

aes

  • x
  • y
  • colour
  • size

geom arguments

  • colour
  • fill
  • size
  • alpha
  • shape
91

Recap

geom

  • geom_point
  • geom_jitter
  • geom_line
  • geom_bar
  • geom_col
  • geom_histogram
  • geom_smooth
  • geom_density
  • geom_abline
  • geom_vline
  • geom_hline

other elements

  • labs
  • coord_equal
  • coord_flip
  • scale_colour_manual
  • facet_wrap
  • theme
92

Slides available at: hellor.netlify.app

All rights reserved by Thiyanga S. Talagala

93
