class: center, middle, inverse, title-slide # STA 326 2.0 Programming and Data Analysis with R ## ✍️ Reshaping Data ### ### Dr Thiyanga Talagala --- <style type="text/css"> .remark-slide-content { font-size: 35px; } </style> <style type="text/css"> h1, #TOC>ul>li { color: #006837; font-weight: bold; } h2, #TOC>ul>ul>li { color: #006837; #font-family: "Times"; font-weight: bold; } h3, #TOC>ul>ul>li { color: #ce1256; #font-family: "Times"; font-weight: bold; } </style> # Today's menu .pull-left[ **Reshaping data** - pivot_wider - pivot_longer - seperate - unite ] .pull-right[ <center><img src="recake.png" height="600px"/></center> ] --- ## Data Wrangling/ Data Munging <img src="tidyr/tidyr0.png" width="80%" /> **Reshaping Data (tidying your data)** How to reshape your data in order to make the analysis easier? --- ## Tidy Data <img src="tidy-1.png" width="80%" /> - Each **variable** is placed in its column - Each **observation** is placed in its own row - Each **value** is placed in its own cell --- # packages ```r library(tidyverse) #or library(tidyr) library(magrittr) ``` ![](tidyr.jpeg) ![](magrittrlogo.png) --- # `tidyr` package .pull-left[ <img src="tidyr/tidyrhadley.JPG" alt="knitrhex" width="550"/> ] .pull-right[ Hadley Wickham, Chief Scientist at RStudio explaining tidyr at WOMBAT organized by Monash University, Australia. Image taken by [Thiyanga S Talagala](https://thiyanga.netlify.app/) at WOMBAT Melbourne, Australia, December-2019 ] --- ![](tidyr/tidyr1.png) --- ![](tidyr/tidyr2.png) --- ![](tidyr/tidyr3.png) --- ![](tidyr/tidyr4.png) --- ![](tidyr/tidyr5.png) --- background-image: url(tidyr.jpeg) background-size: 100px background-position: 98% 6% # tidyr verbs .pull-left[ ### Main verbs - `pivot_longer` (gather) - `pivot_wider` (spread) ] .pull-right[ ### Other - `separate` - `unite` ] #### Input and Output Main input: `data frame` or `tibble`. Output: `tibble` --- class: duke-orange, center, middle # `pivot_longer` --- ## `pivot_longer()` - Turns columns into rows. - From wide format to long format. <img src="tidyr/pivot_longer.png" width="72%" /> --- ## `pivot_longer()` .pull-left[ ```r dengue <- tibble( dist = c("Colombo", "Gampaha", "Kalutara"), '2017' = c(20718, 10258, 34274), '2018' = c(16573, 5857, 31647), '2019' = c(8395, 3155, 10961)); dengue ``` ``` # A tibble: 3 x 4 dist `2017` `2018` `2019` <chr> <dbl> <dbl> <dbl> 1 Colombo 20718 16573 8395 2 Gampaha 10258 5857 3155 3 Kalutara 34274 31647 10961 ``` ] .pull-right[ ```r dengue %>% pivot_longer(2:4, names_to="Year", values_to = "Dengue counts") ``` ``` # A tibble: 9 x 3 dist Year `Dengue counts` <chr> <chr> <dbl> 1 Colombo 2017 20718 2 Colombo 2018 16573 3 Colombo 2019 8395 4 Gampaha 2017 10258 5 Gampaha 2018 5857 6 Gampaha 2019 3155 7 Kalutara 2017 34274 8 Kalutara 2018 31647 9 Kalutara 2019 10961 ``` ] --- class: duke-orange, center, middle # `pivot_wider` --- # `pivot_wider()` - From long to wide format. ![](tidyr/pivot_wider.png) --- # `pivot_wider()` ```r Corona <- tibble( country = rep(c("USA", "Brazil", "Russia"), each=2), status = rep(c("Death", "Recovered"), 3), count = c(99381, 451745, 22746, 149911, 3633, 118798)) ``` ```r Corona ``` ``` # A tibble: 6 x 3 country status count <chr> <chr> <dbl> 1 USA Death 99381 2 USA Recovered 451745 3 Brazil Death 22746 4 Brazil Recovered 149911 5 Russia Death 3633 6 Russia Recovered 118798 ``` --- # `pivot_wider()` .pull-left[ ```r Corona ``` ``` # A tibble: 6 x 3 country status count <chr> <chr> <dbl> 1 USA Death 99381 2 USA Recovered 451745 3 Brazil Death 22746 4 Brazil Recovered 149911 5 Russia Death 3633 6 Russia Recovered 118798 ``` ] .pull-right[ ```r Corona %>% pivot_wider(names_from=status, values_from=count) ``` ``` # A tibble: 3 x 3 country Death Recovered <chr> <dbl> <dbl> 1 USA 99381 451745 2 Brazil 22746 149911 3 Russia 3633 118798 ``` ] --- # Assign a name: ```r *corona_wide_format <- Corona %>% pivot_wider(names_from=status, values_from=count) *corona_wide_format ``` ``` # A tibble: 3 x 3 country Death Recovered <chr> <dbl> <dbl> 1 USA 99381 451745 2 Brazil 22746 149911 3 Russia 3633 118798 ``` --- # `pivot_longer` vs `pivot_wider` ![](tidyr/longer_wider.png) --- # `pivot_longer` and `pivot_wider` ```r profit <- tibble( year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016), quarter = c( 1, 2, 3, 4, 2, 3, 4), income = c(2, NA, 3, NA, 4, 5, 6) ) profit ``` ``` # A tibble: 7 x 3 year quarter income <dbl> <dbl> <dbl> 1 2015 1 2 2 2015 2 NA 3 2015 3 3 4 2015 4 NA 5 2016 2 4 6 2016 3 5 7 2016 4 6 ``` --- # `pivot_longer` and `pivot_wider` .left-plot[ ``` # A tibble: 7 x 3 year quarter income <dbl> <dbl> <dbl> 1 2015 1 2 2 2015 2 NA 3 2015 3 3 4 2015 4 NA 5 2016 2 4 6 2016 3 5 7 2016 4 6 ``` ] .right-plot[ ```r profit %>% pivot_wider(names_from = year, values_from = income) ``` ``` # A tibble: 4 x 3 quarter `2015` `2016` <dbl> <dbl> <dbl> 1 1 2 NA 2 2 NA 4 3 3 3 5 4 4 NA 6 ``` ] --- # Missing values .left-plot[ ``` # A tibble: 4 x 3 quarter `2015` `2016` <dbl> <dbl> <dbl> 1 1 2 NA 2 2 NA 4 3 3 3 5 4 4 NA 6 ``` ] .right-plot[ ```r profit %>% pivot_wider(names_from = year, values_from = income) %>% *pivot_longer( *cols = c(`2015`, `2016`), *names_to = "year", *values_to = "income", *values_drop_na = TRUE ) ``` ``` # A tibble: 5 x 3 quarter year income <dbl> <chr> <dbl> 1 1 2015 2 2 2 2016 4 3 3 2015 3 4 3 2016 5 5 4 2016 6 ``` ] --- class: duke-orange, center, middle # `separate` --- # `separate()` - Separate one column into several columns. ```r Melbourne <- tibble(Date = c("10-5-2020", "11-5-2020", "12-5-2020","13-5-2020"), Tmin = c(5, 9, 9, 7), Tmax = c(18, 16, 16, 17), Rainfall= c(30, 40, 10, 5)); Melbourne ``` ``` # A tibble: 4 x 4 Date Tmin Tmax Rainfall <chr> <dbl> <dbl> <dbl> 1 10-5-2020 5 18 30 2 11-5-2020 9 16 40 3 12-5-2020 9 16 10 4 13-5-2020 7 17 5 ``` --- `separate()`: Separate one column into several columns. ``` # A tibble: 4 x 4 Date Tmin Tmax Rainfall <chr> <dbl> <dbl> <dbl> 1 10-5-2020 5 18 30 2 11-5-2020 9 16 40 3 12-5-2020 9 16 10 4 13-5-2020 7 17 5 ``` ```r Melbourne %>% separate(Date, into=c("day", "month", "year"), sep="-") ``` ``` # A tibble: 4 x 6 day month year Tmin Tmax Rainfall <chr> <chr> <chr> <dbl> <dbl> <dbl> 1 10 5 2020 5 18 30 2 11 5 2020 9 16 40 3 12 5 2020 9 16 10 4 13 5 2020 7 17 5 ``` --- `separate()` ```r df <- data.frame(x = c(NA, "a.b", "a.d", "b.c")) df ``` ``` x 1 <NA> 2 a.b 3 a.d 4 b.c ``` ```r df %>% separate(x, c("Text1", "Text2")) ``` ``` Text1 Text2 1 <NA> <NA> 2 a b 3 a d 4 b c ``` --- `separate()` .pull-left[ ```r tbl <- tibble(input = c("a", "a b", "a-b c", NA)) tbl ``` ``` # A tibble: 4 x 1 input <chr> 1 a 2 a b 3 a-b c 4 <NA> ``` ] -- .pull-right[ ```r tbl %>% separate(input, c("Input1", "Input2")) ``` ``` # A tibble: 4 x 2 Input1 Input2 <chr> <chr> 1 a <NA> 2 a b 3 a b 4 <NA> <NA> ``` ] --- `separate()` ```r tbl <- tibble(input = c("a", "a b", "a-b c", NA)); tbl ``` ``` # A tibble: 4 x 1 input <chr> 1 a 2 a b 3 a-b c 4 <NA> ``` -- ```r tbl %>% separate(input, c("Input1", "Input2", "Input3")) ``` ``` # A tibble: 4 x 3 Input1 Input2 Input3 <chr> <chr> <chr> 1 a <NA> <NA> 2 a b <NA> 3 a b c 4 <NA> <NA> <NA> ``` --- class: duke-orange, center, middle # `unite` --- # `unite()` - Unite several columns into one. .pull-left[ ```r projects <- tibble( Country = c("USA", "USA", "AUS", "AUS"), State = c("LA", "CO", "VIC", "NSW"), Cost = c(1000, 11000, 20000,30000) ) projects ``` ``` # A tibble: 4 x 3 Country State Cost <chr> <chr> <dbl> 1 USA LA 1000 2 USA CO 11000 3 AUS VIC 20000 4 AUS NSW 30000 ``` ] --- `unite()` ``` # A tibble: 4 x 3 Country State Cost <chr> <chr> <dbl> 1 USA LA 1000 2 USA CO 11000 3 AUS VIC 20000 4 AUS NSW 30000 ``` ```r projects %>% unite("Location", c("State", "Country")) ``` ``` # A tibble: 4 x 2 Location Cost <chr> <dbl> 1 LA_USA 1000 2 CO_USA 11000 3 VIC_AUS 20000 4 NSW_AUS 30000 ``` --- `unite()` ``` # A tibble: 4 x 3 Country State Cost <chr> <chr> <dbl> 1 USA LA 1000 2 USA CO 11000 3 AUS VIC 20000 4 AUS NSW 30000 ``` ```r projects %>% unite("Location", c("State", "Country"), * sep="-") ``` ``` # A tibble: 4 x 2 Location Cost <chr> <dbl> 1 LA-USA 1000 2 CO-USA 11000 3 VIC-AUS 20000 4 NSW-AUS 30000 ``` --- # `separate` vs `unite` ![](tidyr/separate_unite.png) --- class: center, middle Slides available at: hellor.netlify.app All rights reserved by [Thiyanga S. Talagala](https://thiyanga.netlify.com/)