class: center, middle, inverse, title-slide

# STA 517 3.0 Programming and Statistical Computing with R

## Functionals

### Dr Thiyanga Talagala

---

# Functionals

> A functional is a function that takes a function as an input and returns a vector as output.
> Hadley Wickham, Advanced R

```r
statistic <- function(f){
  data <- c(10, 20, 30, 40, 62, 63)
  f(data)
}
```

```r
statistic(mean)
```

```
[1] 37.5
```

```r
statistic(sum)
```

```
[1] 225
```

---

# Use of functionals: lapply

lapply: loop over a list and evaluate a function on each element.

```r
x <- list(
  a = 1:8,
  b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
# We are passing `mean` as an argument to lapply
lapply(x, mean)
```

```
$a
[1] 4.5

$b
[1] 4.1
```

```r
lapply(x, sum)
```

```
$a
[1] 36

$b
[1] 20.5
```

---

# Use of functionals: lapply (cont.)

```r
cv <- function(data){sd(data)/mean(data)}
lapply(x, cv)
```

```
$a
[1] 0.5443311

$b
[1] 0.3706996
```

---

# lapply is a for-loop replacement

```r
x <- list(
  a = 1:8,
  b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
result_x <- list()
result_x
```

```
list()
```

```r
for (i in 1:2){
  result_x[[i]] <- mean(x[[i]])
}
result_x
```

```
[[1]]
[1] 4.5

[[2]]
[1] 4.1
```

---

# Use of functionals: sapply

sapply: loop over a list, evaluate a function on each element, and simplify the result to a vector when possible.

```r
x <- list(
  a = 1:8,
  b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
# We are passing `mean` as an argument to sapply
sapply(x, mean)
```

```
  a   b 
4.5 4.1 
```

```r
sapply(x, sum)
```

```
   a    b 
36.0 20.5 
```

Same as `lapply`, but the output is simplified to a vector where possible.

---

# `map()` function in purrr

```r
library(purrr)
```

```r
x <- list(
  a = 1:8,
  b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
map(x, mean)
```

```
$a
[1] 4.5

$b
[1] 4.1
```

- The base equivalent to `map()` is `lapply()`.
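---

# `map()` and `lapply()` side by side

To make the equivalence concrete, here is a minimal sketch that runs both functions on the same list and compares the results. The names `base_result` and `purrr_result` are illustrative only.

```r
library(purrr)

x <- list(a = 1:8, b = c(2.1, 3.2, 4.2, 5, 6))

# The same iteration written with base R and with purrr
base_result  <- lapply(x, mean)
purrr_result <- map(x, mean)

# Both calls return a named list, so the two results can be compared directly
all.equal(base_result, purrr_result)
```

The practical differences show up in the helpers around `map()`, such as the typed variants and the formula shorthand covered on the later slides.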
---

# `map` syntax

> map(.x, .f)

`.x` - The object we want to iterate over (a vector, list or data frame)

`.f` - The function to apply (what are we going to do?)

For each element of a vector or list, or for each column of a data frame, apply the function `.f`.

```r
map(c(4, 9, 16), sqrt)
```

```
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4
```

---

## Demo: Visualization of `map`

---

# `map()` with data frames

```r
head(trees)
```

```
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7
```

```r
trees %>% map(mean)
```

```
## $Girth
## [1] 13.24839
## 
## $Height
## [1] 76
## 
## $Volume
## [1] 30.17097
```

---

# `map()` with data frames

```r
iris %>%
  dplyr::select_if(is.numeric) %>%
  map(mean)
```

```
## $Sepal.Length
## [1] 5.843333
## 
## $Sepal.Width
## [1] 3.057333
## 
## $Petal.Length
## [1] 3.758
## 
## $Petal.Width
## [1] 1.199333
```

---

# `map`: additional inputs to the function

E.g. `na.rm = TRUE`

### Method 1

```r
abc <- list(a = c(1, NA, 3), b = 4:6, c = 10:12)
map(abc, mean)
```

```
## $a
## [1] NA
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

```r
map(abc, mean, na.rm=TRUE)
```

```
## $a
## [1] 2
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

---

### Method 2

```r
map(abc, function(.x){
  mean(.x, na.rm=TRUE)
})
```

```
## $a
## [1] 2
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

---

### Method 3

```r
map(abc, ~mean(.x, na.rm=TRUE))
```

```
## $a
## [1] 2
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

---

## Your turn

**Question 1** Identify the number of unique values in each column of the `gapminder` dataset.
---

## Return types

- `map`: list

- `map_chr`: character vector

- `map_dbl`: double vector

- `map_int`: integer vector

- `map_lgl`: logical vector

- `map_dfc`: data frame (by column)

- `map_dfr`: data frame (by row)

---

```r
abc <- list(a = c(1, NA, 3), b = 4:6, c = 10:12)
abc %>% map(is.numeric)
```

```
## $a
## [1] TRUE
## 
## $b
## [1] TRUE
## 
## $c
## [1] TRUE
```

```r
abc %>% map_lgl(is.numeric)
```

```
##    a    b    c 
## TRUE TRUE TRUE
```

```r
# Note: purrr >= 1.0.0 deprecates this automatic logical-to-character coercion
abc %>% map_chr(is.numeric)
```

```
##      a      b      c 
## "TRUE" "TRUE" "TRUE"
```

---

```r
map_output <- map(mtcars, function(x) length(unique(x)))
head(map_output, 3)
```

```
$mpg
[1] 25

$cyl
[1] 3

$disp
[1] 27
```

---

```r
set.seed(2020)
x <- list(a=rnorm(5), b=rnorm(6))
map(x, mean)
```

```
$a
[1] -0.8692886

$b
[1] 0.4089487
```

```r
map_df(x, mean)
```

```
# A tibble: 1 × 2
       a     b
   <dbl> <dbl>
1 -0.869 0.409
```

---

## Your turn

**Question 2** Identify the number of unique values in each column of the `gapminder` dataset. The output of the map function should be an integer vector.
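---

## Hint: a typed `map` on `mtcars`

The following is one possible sketch of the pattern this question asks for, shown on the built-in `mtcars` data rather than `gapminder`. It swaps `map()` for `map_int()`, which works here because `length()` returns an integer. The name `n_unique` is illustrative only.

```r
library(purrr)

# Count the distinct values in each column; map_int() returns a named integer vector
n_unique <- map_int(mtcars, function(x) length(unique(x)))

head(n_unique, 3)
```

If the function returned something other than a single integer per column, `map_int()` would stop with an error instead of silently returning a list.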
---

## Your turn

**Question 3** Split the `iris` data set by species and fit a simple linear regression model between `Sepal.Length` and `Sepal.Width` for each species.
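---

## Question 3: one possible approach

A minimal sketch, assuming `Sepal.Length` is the response and `Sepal.Width` the predictor (the question does not fix the direction). It uses base `split()` to build one data frame per species and `map()` to fit a model to each. The names `by_species` and `models` are illustrative only.

```r
library(purrr)

# Split iris into a named list of three data frames, one per species
by_species <- split(iris, iris$Species)

# Fit Sepal.Length ~ Sepal.Width separately within each species
# (assumed direction of the regression; swap the variables if needed)
models <- map(by_species, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))

# Pull the slope out of each fitted model as a double vector
map_dbl(models, ~ coef(.x)[["Sepal.Width"]])
```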
---

# map2(.x, .y, .f)

For each pair of elements of `.x` and `.y`, compute `.f(.x, .y)`.

```r
abc
```

```
## $a
## [1]  1 NA  3
## 
## $b
## [1] 4 5 6
## 
## $c
## [1] 10 11 12
```

```r
cde <- list(x = c(10, 20, 30), y= c(100, 200, 300), z=c(0, 0, 0))
cde
```

```
## $x
## [1] 10 20 30
## 
## $y
## [1] 100 200 300
## 
## $z
## [1] 0 0 0
```

---

# map2(.x, .y, .f)

```r
map2(abc, cde, sum)
```

```
## $a
## [1] NA
## 
## $b
## [1] 615
## 
## $c
## [1] 33
```

---

## Your turn

**Question 4** Split the `iris` data set by species, fit a simple linear regression model between `Sepal.Length` and `Sepal.Width` for each species, **and obtain the predictions.**
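---

## Question 4: one possible approach

A minimal sketch of how `map2()` could be used here: it walks over the fitted models and the per-species data frames in parallel and calls `predict()` on each pair. As before, treating `Sepal.Length` as the response is an assumption, and `by_species`, `models` and `predictions` are illustrative names.

```r
library(purrr)

# One data frame per species, and one model per data frame
by_species <- split(iris, iris$Species)
models <- map(by_species, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))

# .x is a fitted model, .y is the data frame it was fitted on
predictions <- map2(models, by_species, ~ predict(.x, newdata = .y))

# Number of predictions obtained for each species
map_int(predictions, length)
```

Passing a different list of data frames as `.y` would give predictions for new observations instead of fitted values.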
---

# `pmap`: iterate over more than 2 lists, or over a data frame with argument names

```r
pqr <- list(p=c(1, 1, 1), q=c(2, 2, 2), r =c(3, 3, 3))
l <- list(abc, cde, pqr)
pmap(l, sum)
```

```
## $a
## [1] NA
## 
## $b
## [1] 621
## 
## $c
## [1] 42
```

If you want to operate over more than 2 inputs, you first need to combine them into a single list.

---

class: center, middle

Slides available at: hellor.netlify.app

All rights reserved by [Thiyanga S. Talagala](https://thiyanga.netlify.com/)

Reference: Advanced R, Hadley Wickham