STA 326 2.0 Programming and Data Analysis with R

class: center, middle, inverse, title-slide

# STA 326 2.0 Programming and Data Analysis with R
## ✍️ Data Manipulation with dplyr
### 
### Dr Thiyanga Talagala

---

.pull-left[

## Today's menu

- `filter`

- `select`

- `mutate`

- `summarise`

- `arrange`

- `group_by`

- `rename`

]

.pull-right[
<center><img src="reshape_cake.jpeg" height="600px"/></center>
]

---

# packages

```r
library(tidyverse) # TO obtain dplyr
library(magrittr)
```

---

# Dataset

```r
library(gapminder)
str(gapminder)
```

```
tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
 $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
 $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
 $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
 $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
```

```r
head(gapminder)
```

```
# A tibble: 6 x 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.
6 Afghanistan Asia       1977    38.4 14880372      786.
```

---
# Dataset (cont.)

```r
glimpse(gapminder)
```

```
Rows: 1,704
Columns: 6
$ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
$ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
$ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
$ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
```

---
# Dataset (cont.)

```r
tbl_df(gapminder)
```

```
# A tibble: 1,704 x 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows
```

---

# Dataset (cont.)

```r
library(skimr)
skim(gapminder)
```

---
background-image: url(dplyr.png)
background-size: 100px
background-position: 98% 6%

# `dplyr` verbs

.pull-left[
- `filter`

- `select`

- `mutate`

- `summarise`
]

.pull-right[

- `arrange`

- `group_by`

- `rename`

]

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `filter`: Picks observations by their values.

- Takes logical expressions and returns the rows for which all are `TRUE`.

```r
filter(gapminder, lifeExp < 50)
```

```
# A tibble: 491 x 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 481 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `filter` (cont)

```r
# gapminder %>% filter(country == "Sri Lanka")
filter(gapminder, country == "Sri Lanka")
```

```
# A tibble: 12 x 6
   country   continent  year lifeExp      pop gdpPercap
   <fct>     <fct>     <int>   <dbl>    <int>     <dbl>
 1 Sri Lanka Asia       1952    57.6  7982342     1084.
 2 Sri Lanka Asia       1957    61.5  9128546     1073.
 3 Sri Lanka Asia       1962    62.2 10421936     1074.
 4 Sri Lanka Asia       1967    64.3 11737396     1136.
 5 Sri Lanka Asia       1972    65.0 13016733     1213.
 6 Sri Lanka Asia       1977    65.9 14116836     1349.
 7 Sri Lanka Asia       1982    68.8 15410151     1648.
 8 Sri Lanka Asia       1987    69.0 16495304     1877.
 9 Sri Lanka Asia       1992    70.4 17587060     2154.
10 Sri Lanka Asia       1997    70.5 18698655     2664.
11 Sri Lanka Asia       2002    70.8 19576783     3015.
12 Sri Lanka Asia       2007    72.4 20378239     3970.
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `filter` (cont)

```r
filter(gapminder, country %in% c("Sri Lanka", "Australia"))
```

```
# A tibble: 24 x 6
   country   continent  year lifeExp      pop gdpPercap
   <fct>     <fct>     <int>   <dbl>    <int>     <dbl>
 1 Australia Oceania    1952    69.1  8691212    10040.
 2 Australia Oceania    1957    70.3  9712569    10950.
 3 Australia Oceania    1962    70.9 10794968    12217.
 4 Australia Oceania    1967    71.1 11872264    14526.
 5 Australia Oceania    1972    71.9 13177000    16789.
 6 Australia Oceania    1977    73.5 14074100    18334.
 7 Australia Oceania    1982    74.7 15184200    19477.
 8 Australia Oceania    1987    76.3 16257249    21889.
 9 Australia Oceania    1992    77.6 17481977    23425.
10 Australia Oceania    1997    78.8 18565243    26998.
# … with 14 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `filter` (cont)

```r
filter(gapminder, country %in% c("Sri Lanka", "Australia")) %>%
* head()
```

```
# A tibble: 6 x 6
  country   continent  year lifeExp      pop gdpPercap
  <fct>     <fct>     <int>   <dbl>    <int>     <dbl>
1 Australia Oceania    1952    69.1  8691212    10040.
2 Australia Oceania    1957    70.3  9712569    10950.
3 Australia Oceania    1962    70.9 10794968    12217.
4 Australia Oceania    1967    71.1 11872264    14526.
5 Australia Oceania    1972    71.9 13177000    16789.
6 Australia Oceania    1977    73.5 14074100    18334.
```

---
```r
filter(gapminder, country %in% c("Sri Lanka", "Australia")) %>%
* tail()
```

```
# A tibble: 6 x 6
  country   continent  year lifeExp      pop gdpPercap
  <fct>     <fct>     <int>   <dbl>    <int>     <dbl>
1 Sri Lanka Asia       1982    68.8 15410151     1648.
2 Sri Lanka Asia       1987    69.0 16495304     1877.
3 Sri Lanka Asia       1992    70.4 17587060     2154.
4 Sri Lanka Asia       1997    70.5 18698655     2664.
5 Sri Lanka Asia       2002    70.8 19576783     3015.
6 Sri Lanka Asia       2007    72.4 20378239     3970.
```
---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `select`: Picks variables by their names.

```r
head(gapminder, 3)
```

```
# A tibble: 3 x 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
```

```r
select(gapminder, year:gdpPercap)
```

```
# A tibble: 1,704 x 4
    year lifeExp      pop gdpPercap
   <int>   <dbl>    <int>     <dbl>
 1  1952    28.8  8425333      779.
 2  1957    30.3  9240934      821.
 3  1962    32.0 10267083      853.
 4  1967    34.0 11537966      836.
 5  1972    36.1 13079460      740.
 6  1977    38.4 14880372      786.
 7  1982    39.9 12881816      978.
 8  1987    40.8 13867957      852.
 9  1992    41.7 16317921      649.
10  1997    41.8 22227415      635.
# … with 1,694 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `select` (cont.)

```r
head(gapminder, 3)
```

```r
select(gapminder, year, gdpPercap)
```

```
# A tibble: 1,704 x 2
    year gdpPercap
   <int>     <dbl>
 1  1952      779.
 2  1957      821.
 3  1962      853.
 4  1967      836.
 5  1972      740.
 6  1977      786.
 7  1982      978.
 8  1987      852.
 9  1992      649.
10  1997      635.
# … with 1,694 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `select` (cont.)

```r
head(gapminder, 3)
```

```r
select(gapminder, -c(year, gdpPercap))
```

```
# A tibble: 1,704 x 4
   country     continent lifeExp      pop
   <fct>       <fct>       <dbl>    <int>
 1 Afghanistan Asia         28.8  8425333
 2 Afghanistan Asia         30.3  9240934
 3 Afghanistan Asia         32.0 10267083
 4 Afghanistan Asia         34.0 11537966
 5 Afghanistan Asia         36.1 13079460
 6 Afghanistan Asia         38.4 14880372
 7 Afghanistan Asia         39.9 12881816
 8 Afghanistan Asia         40.8 13867957
 9 Afghanistan Asia         41.7 16317921
10 Afghanistan Asia         41.8 22227415
# … with 1,694 more rows
```

---

background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `select` (cont.)

```r
head(gapminder, 3)
```

```r
select(gapminder, -(year:gdpPercap))
```

```
# A tibble: 1,704 x 2
   country     continent
   <fct>       <fct>    
 1 Afghanistan Asia     
 2 Afghanistan Asia     
 3 Afghanistan Asia     
 4 Afghanistan Asia     
 5 Afghanistan Asia     
 6 Afghanistan Asia     
 7 Afghanistan Asia     
 8 Afghanistan Asia     
 9 Afghanistan Asia     
10 Afghanistan Asia     
# … with 1,694 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `mutate`

- Creates new variables with functions of existing variables

```r
gapminder %>% mutate(gdp = pop * gdpPercap)
```

```
# A tibble: 1,704 x 7
   country     continent  year lifeExp      pop gdpPercap          gdp
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>        <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.  6567086330.
 2 Afghanistan Asia       1957    30.3  9240934      821.  7585448670.
 3 Afghanistan Asia       1962    32.0 10267083      853.  8758855797.
 4 Afghanistan Asia       1967    34.0 11537966      836.  9648014150.
 5 Afghanistan Asia       1972    36.1 13079460      740.  9678553274.
 6 Afghanistan Asia       1977    38.4 14880372      786. 11697659231.
 7 Afghanistan Asia       1982    39.9 12881816      978. 12598563401.
 8 Afghanistan Asia       1987    40.8 13867957      852. 11820990309.
 9 Afghanistan Asia       1992    41.7 16317921      649. 10595901589.
10 Afghanistan Asia       1997    41.8 22227415      635. 14121995875.
# … with 1,694 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `summarise`(British) or `summarize` (US)

- Collapse many values down to a single summary

```r
gapminder %>%
  summarise(
    lifeExp_mean=mean(lifeExp),
    pop_mean=mean(pop),
    gdpPercap_mean=mean(gdpPercap))
```

```
# A tibble: 1 x 3
  lifeExp_mean  pop_mean gdpPercap_mean
         <dbl>     <dbl>          <dbl>
1         59.5 29601212.          7215.
```

---

background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `arrange`

- Reorder the rows

```r
arrange(gapminder, desc(lifeExp))
```

```
# A tibble: 1,704 x 6
   country          continent  year lifeExp       pop gdpPercap
   <fct>            <fct>     <int>   <dbl>     <int>     <dbl>
 1 Japan            Asia       2007    82.6 127467972    31656.
 2 Hong Kong, China Asia       2007    82.2   6980412    39725.
 3 Japan            Asia       2002    82   127065841    28605.
 4 Iceland          Europe     2007    81.8    301931    36181.
 5 Switzerland      Europe     2007    81.7   7554661    37506.
 6 Hong Kong, China Asia       2002    81.5   6762476    30209.
 7 Australia        Oceania    2007    81.2  20434176    34435.
 8 Spain            Europe     2007    80.9  40448191    28821.
 9 Sweden           Europe     2007    80.9   9031088    33860.
10 Israel           Asia       2007    80.7   6426679    25523.
# … with 1,694 more rows
```

---

background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `group_by`

- Takes an existing tibble and converts it into a grouped tibble where operations are performed "by group". ungroup() removes grouping.

```r
Japan_SL <- filter(gapminder, country %in% c("Japan", "Sri Lanka"))
Japan_SL %>% head()
```

```
# A tibble: 6 x 6
  country continent  year lifeExp       pop gdpPercap
  <fct>   <fct>     <int>   <dbl>     <int>     <dbl>
1 Japan   Asia       1952    63.0  86459025     3217.
2 Japan   Asia       1957    65.5  91563009     4318.
3 Japan   Asia       1962    68.7  95831757     6577.
4 Japan   Asia       1967    71.4 100825279     9848.
5 Japan   Asia       1972    73.4 107188273    14779.
6 Japan   Asia       1977    75.4 113872473    16610.
```

---

# `group_by`

```r
Japan_SL_grouped <- Japan_SL %>% group_by(country)
Japan_SL_grouped
```

```
# A tibble: 24 x 6
# Groups:   country [2]
   country continent  year lifeExp       pop gdpPercap
   <fct>   <fct>     <int>   <dbl>     <int>     <dbl>
 1 Japan   Asia       1952    63.0  86459025     3217.
 2 Japan   Asia       1957    65.5  91563009     4318.
 3 Japan   Asia       1962    68.7  95831757     6577.
 4 Japan   Asia       1967    71.4 100825279     9848.
 5 Japan   Asia       1972    73.4 107188273    14779.
 6 Japan   Asia       1977    75.4 113872473    16610.
 7 Japan   Asia       1982    77.1 118454974    19384.
 8 Japan   Asia       1987    78.7 122091325    22376.
 9 Japan   Asia       1992    79.4 124329269    26825.
10 Japan   Asia       1997    80.7 125956499    28817.
# … with 14 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `group_by` (cont.)

```r
Japan_SL %>% summarise(mean_lifeExp=mean(lifeExp))
```

```
# A tibble: 1 x 1
  mean_lifeExp
         <dbl>
1         70.7
```

```r
Japan_SL_grouped %>% summarise(mean_lifeExp=mean(lifeExp))
```

```
# A tibble: 2 x 2
  country   mean_lifeExp
  <fct>            <dbl>
1 Japan             74.8
2 Sri Lanka         66.5
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# `rename`

- Rename variables

```r
head(gapminder, 3)
```

```r
rename(gapminder, `life expectancy`=lifeExp,
       population=pop) # new_name = old_name
```

```
# A tibble: 1,704 x 6
   country     continent  year `life expectancy` population gdpPercap
   <fct>       <fct>     <int>             <dbl>      <int>     <dbl>
 1 Afghanistan Asia       1952              28.8    8425333      779.
 2 Afghanistan Asia       1957              30.3    9240934      821.
 3 Afghanistan Asia       1962              32.0   10267083      853.
 4 Afghanistan Asia       1967              34.0   11537966      836.
 5 Afghanistan Asia       1972              36.1   13079460      740.
 6 Afghanistan Asia       1977              38.4   14880372      786.
 7 Afghanistan Asia       1982              39.9   12881816      978.
 8 Afghanistan Asia       1987              40.8   13867957      852.
 9 Afghanistan Asia       1992              41.7   16317921      649.
10 Afghanistan Asia       1997              41.8   22227415      635.
# … with 1,694 more rows
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# Combine multiple operations

```r
gapminder %>%
filter(country == 'China') %>% head(2)
```

```
# A tibble: 2 x 6
  country continent  year lifeExp       pop gdpPercap
  <fct>   <fct>     <int>   <dbl>     <int>     <dbl>
1 China   Asia       1952    44   556263527      400.
2 China   Asia       1957    50.5 637408000      576.
```

```r
gapminder %>%
filter(country == 'China') %>% summarise(lifemax=max(lifeExp))
```

```
# A tibble: 1 x 1
  lifemax
    <dbl>
1    73.0
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# Combine multiple operations (cont.)

```r
gapminder %>%
filter(country == 'China') %>%
*filter(lifeExp == max(lifeExp))
```

```
# A tibble: 1 x 6
  country continent  year lifeExp        pop gdpPercap
  <fct>   <fct>     <int>   <dbl>      <int>     <dbl>
1 China   Asia       2007    73.0 1318683096     4959.
```

---
background-image: url(dplyr.png)
background-size: 70px
background-position: 98% 6%

# Combine multiple operations

```r
gapminder %>%
filter(continent == 'Asia') %>%
group_by(country) %>%
filter(lifeExp == max(lifeExp)) %>%
arrange(desc(year))
```

```
# A tibble: 33 x 6
# Groups:   country [33]
   country          continent  year lifeExp        pop gdpPercap
   <fct>            <fct>     <int>   <dbl>      <int>     <dbl>
 1 Afghanistan      Asia       2007    43.8   31889923      975.
 2 Bahrain          Asia       2007    75.6     708573    29796.
 3 Bangladesh       Asia       2007    64.1  150448339     1391.
 4 Cambodia         Asia       2007    59.7   14131858     1714.
 5 China            Asia       2007    73.0 1318683096     4959.
 6 Hong Kong, China Asia       2007    82.2    6980412    39725.
 7 India            Asia       2007    64.7 1110396331     2452.
 8 Indonesia        Asia       2007    70.6  223547000     3541.
 9 Iran             Asia       2007    71.0   69453570    11606.
10 Israel           Asia       2007    80.7    6426679    25523.
# … with 23 more rows
```

---

class: duke-orange, center, middle

# Combine Data Sets

---

# Combine Data Sets

.pull-left[
### Mutating joins

- `left_join`

- `right_join`

- `inner_join`

- `full_join`

]

.pull-right[

### Set operations

- `intersect`

- `union`

### Binding

- `bind_rows`

- `bind_cols`

]

---
# `left_join`

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
second <- tibble(x1=c("A", "B", "D"), x3=c("red", "yellow" , "green"))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```

]

---

# `left_join`

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```

]

.pull-right[

```r
left_join(first, second, by="x1")
```

```
# A tibble: 3 x 3
  x1       x2 x3    
  <chr> <dbl> <chr> 
1 A         1 red   
2 B         2 yellow
3 C         3 <NA>  
```
]

---
# `right_join`

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
second <- tibble(x1=c("A", "B", "D"), x3=c("red", "yellow" , "green"))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```

]

---
# `right_join`

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```

]

.pull-right[

```r
right_join(first, second, by="x1")
```

```
# A tibble: 3 x 3
  x1       x2 x3    
  <chr> <dbl> <chr> 
1 A         1 red   
2 B         2 yellow
3 D        NA green 
```
]
---
# `inner_join`

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
second <- tibble(x1=c("A", "B", "D"), x3=c("red", "yellow" , "green"))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```
]

---
# `inner_join`

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```

]

.pull-right[

```r
inner_join(first, second, by="x1")
```

```
# A tibble: 2 x 3
  x1       x2 x3    
  <chr> <dbl> <chr> 
1 A         1 red   
2 B         2 yellow
```
]
---

# `full_join`

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
second <- tibble(x1=c("A", "B", "D"), x3=c("red", "yellow" , "green"))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```
]

---

# `full_join`

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1    x3    
  <chr> <chr> 
1 A     red   
2 B     yellow
3 D     green 
```

]

.pull-right[

```r
full_join(first, second, by="x1")
```

```
# A tibble: 4 x 3
  x1       x2 x3    
  <chr> <dbl> <chr> 
1 A         1 red   
2 B         2 yellow
3 C         3 <NA>  
4 D        NA green 
```
]
---
# Set operations

Two compatible data sets. Column names are the same.

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
second <- tibble(x1=c("D", "B", "C"), x2=c(10, 2, 3))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B         2
3 C         3
```
]

---

# Set operations

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B         2
3 C         3
```

]

.pull-right[
**intersect**

```r
intersect(first, second)
```

```
# A tibble: 2 x 2
  x1       x2
  <chr> <dbl>
1 B         2
2 C         3
```

]
---

# Set operations

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B         2
3 C         3
```

]

.pull-right[

**union**

```r
union(first, second)
```

```
# A tibble: 4 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
4 D        10
```
]
---

# Set operations (cont.)

Two compatible data sets. Column names are the same.

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
*second <- tibble(x1=c("D", "B", "C"), x2=c(10, 20, 30))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```
]

---
# Set operations (cont.)

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```

]

.pull-right[
**intersect**

```r
intersect(first, second)
```

```
# A tibble: 0 x 2
# … with 2 variables: x1 <chr>, x2 <dbl>
```

]

---

# Set operations (cont.)

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```

]

.pull-right[

**union**

```r
union(first, second)
```

```
# A tibble: 6 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
4 D        10
5 B        20
6 C        30
```
]

---

# Set operations (cont.)

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
*second <- tibble(x1=c("D", "B", "C"), x2=c(10, 20, 30))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```

]

---

# Set operations (cont.)

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```

]

.pull-right[

**union**

```r
union(first, second)
```

```
# A tibble: 6 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
4 D        10
5 B        20
6 C        30
```

]

---

# Set operations (cont.)

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```

]

.pull-right[

**intersect**

```r
intersect(first, second)
```

```
# A tibble: 0 x 2
# … with 2 variables: x1 <chr>, x2 <dbl>
```

]

---

# Binding

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
second <- tibble(x1=c("D", "B", "C"), x2=c(10, 20, 30))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```
]

---

# Binding

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```

]

.pull-right[
**bind_rows**

```r
bind_rows(first, second)
```

```
# A tibble: 6 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
4 D        10
5 B        20
6 C        30
```

]

---
# Binding (cont.)

```r
first <- tibble(x1=c("A", "B", "C"), x2=c(1, 2, 3))
second <- tibble(x1=c("D", "B", "C"), x2=c(10, 20, 30))
```

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

]

.pull-right[

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```
]

---
# Binding (cont.)

.pull-left[

```r
first
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 A         1
2 B         2
3 C         3
```

```r
second
```

```
# A tibble: 3 x 2
  x1       x2
  <chr> <dbl>
1 D        10
2 B        20
3 C        30
```

]

.pull-right[
**bind_cols**

```r
bind_cols(first, second)
```

```
# A tibble: 3 x 4
  x1...1 x2...2 x1...3 x2...4
  <chr>   <dbl> <chr>   <dbl>
1 A           1 D          10
2 B           2 B          20
3 C           3 C          30
```
]

---

background-image: url('dplyrcs1.png')
background-position: center
background-size: contain

---

background-image: url('dplyrcs2.png')
background-position: center
background-size: contain

---
class: center, middle

Slides available at: hellor.netlify.app