STA 517 3.0 Programming and Statistical Computing with R

class: center, middle, inverse, title-slide

# STA 517 3.0 Programming and Statistical Computing with R
## Functionals
### Dr Thiyanga Talagala

---

# Functionals

> A functional is a function that takes a function as an input and returns a vector as output.

> Hadley Wickham, Advanced R

```r
statistic <- function(f){
  data <- c(10, 20, 30, 40, 62, 63)
  f(data)
}
```

```r
statistic(mean)
```

```
[1] 37.5
```

```r
statistic(sum)
```

```
[1] 225
```

---

# Use of functionals: lapply

lapply: loop over a list and evaluate a function on each element.

```r
x <- list( a = 1:8, b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
#We are passing `mean` is an argument to lapply
lapply(x, mean)
```

```
$a
[1] 4.5

$b
[1] 4.1
```

```r
lapply(x, sum)
```

```
$a
[1] 36

$b
[1] 20.5
```

---
# Use of functionals: lapply (cont.)

```r
cv <- function(data){sd(data)/mean(data)}
lapply(x, cv)
```

```
$a
[1] 0.5443311

$b
[1] 0.3706996
```

---
# lapply is a for-loop replacement

```r
x <- list( a = 1:8, b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
result_x <- list()
result_x
```

```
list()
```

```r
for (i in 1:2){
 result_x[[i]] <- mean(x[[i]])
  
}
result_x
```

```
[[1]]
[1] 4.5

[[2]]
[1] 4.1
```

---
# Use of functionals: sapply

lapply: loop over a list and evaluate a function on each element.

```r
x <- list( a = 1:8, b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
#We are passing `mean` is an argument to lapply
sapply(x, mean)
```

```
  a   b 
4.5 4.1 
```

```r
sapply(x, sum)
```

```
   a    b 
36.0 20.5 
```

Same as `lapply` but the output is a vector.
---

# `map()` function in purrr

```r
library(purrr)
```

```r
x <- list( a = 1:8, b = c(2.1, 3.2, 4.2, 5, 6))
x
```

```
$a
[1] 1 2 3 4 5 6 7 8

$b
[1] 2.1 3.2 4.2 5.0 6.0
```

```r
map(x, mean)
```

```
$a
[1] 4.5

$b
[1] 4.1
```

- The base equivalent to `map()` is `lapply()`.

---
# `map` syntax

> map(.x, .f)

`.x` - The object we want to iterate over (a vector, list or dataframe)

`.f` - function (What are we going to do?)

For each element (vector/ list) or for each column in a data frame apply `.f` function.

```r
map(c(4, 9, 16), sqrt)
```

```
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4
```
---

## Demo: Visualization of `map`

---

# `map()` with data frames

```r
head(trees)
```

```
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7
```

```r
trees %>% 
  map(mean)
```

```
## $Girth
## [1] 13.24839
## 
## $Height
## [1] 76
## 
## $Volume
## [1] 30.17097
```

---
# `map()` with data frames

```r
iris %>% 
  dplyr::select_if(is.numeric) %>% 
  map(mean)
```

```
## $Sepal.Length
## [1] 5.843333
## 
## $Sepal.Width
## [1] 3.057333
## 
## $Petal.Length
## [1] 3.758
## 
## $Petal.Width
## [1] 1.199333
```

---

# `map` additional inputs to the function

Eg: `na.rm=TRUE`

### Method 1

```r
abc <- list(a = c(1, NA, 3), b = 4:6, c = 10:12)
map(abc, mean)
```

```
## $a
## [1] NA
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

```r
map(abc, mean, na.rm=TRUE)
```

```
## $a
## [1] 2
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

---

### Method 2

```r
map(abc, function(.x){
  mean(.x, na.rm=TRUE)
})
```

```
## $a
## [1] 2
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

---
### Method 3

```r
map(abc, ~mean(.x, na.rm=TRUE))
```

```
## $a
## [1] 2
## 
## $b
## [1] 5
## 
## $c
## [1] 11
```

---
## Your turn

**Question 1**

Identify the number of unique values in each column of the `gapminder` dataset

---
## Returning types

- `map`: list

- `map_chr` : charactor vector

- `map_dbl` : double vector

- `map_int` : integer vector

- `map_lgl` : logical vector

- `map_dfc` : data frame (by column)

- `map_dfr` : data frame (by row)

---

```r
abc <- list(a = c(1, NA, 3), b = 4:6, c = 10:12)
abc %>% map(is.numeric)
```

```
## $a
## [1] TRUE
## 
## $b
## [1] TRUE
## 
## $c
## [1] TRUE
```

```r
abc %>% map_lgl(is.numeric)
```

```
##    a    b    c 
## TRUE TRUE TRUE
```

```r
abc %>% map_chr(is.numeric)
```

```
##      a      b      c 
## "TRUE" "TRUE" "TRUE"
```

---

```r
map_output <- map(mtcars, function(x) length(unique(x)))
head(map_output, 3)
```

```
$mpg
[1] 25

$cyl
[1] 3

$disp
[1] 27
```

---

```r
set.seed(2020)
x <- list(a=rnorm(5), b=rnorm(6))
map(x, mean)
```

```
$a
[1] -0.8692886

$b
[1] 0.4089487
```

```r
map_df(x, mean)
```

```
# A tibble: 1 × 2
       a     b
   <dbl> <dbl>
1 -0.869 0.409
```

---

**Question 2**

Identify the number of unique values in each column of the `gapminder` dataset. The output of the map function should be a integer vector.

---
## Your turn

**Question 3**

Split the iris data set according to species type and fit simple linear regression model between sepal.length and sepal.width

<div class="countdown" id="timer_62367b57" style="right:0;bottom:0;" data-warnwhen="0">
<code class="countdown-time"><span class="countdown-digits minutes">05</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>
---
# map2(.x, .y, .f)

For each element of `.x` and `.y` do `f(.x, .y)`

```r
abc
```

```
## $a
## [1]  1 NA  3
## 
## $b
## [1] 4 5 6
## 
## $c
## [1] 10 11 12
```

```r
cde <- list(x = c(10, 20, 30), y= c(100, 200, 300), z=c(0, 0, 0))
cde
```

```
## $x
## [1] 10 20 30
## 
## $y
## [1] 100 200 300
## 
## $z
## [1] 0 0 0
```

---
# map2(.x, .y, .f)

```r
map2(abc, cde, sum)
```

```
## $a
## [1] NA
## 
## $b
## [1] 615
## 
## $c
## [1] 33
```

---

## Your turn

**Question 4**

Split the iris data set according to species type and fit simple linear regression model between sepal.length and sepal.width **and obtain the predictions.**

---

# `pmap`

take more than 2 lists or data frame with argument names

```r
pqr <- list(p=c(1, 1, 1), q=c(2, 2, 2), r =c(3, 3, 3))
l <- list(abc, cde, pqr)
pmap(l, sum)
```

```
## $a
## [1] NA
## 
## $b
## [1] 621
## 
## $c
## [1] 42
```

If you want to operate over more than 3 inputs you always need to first include the elements into a list

---

class: center, middle

Slides available at: hellor.netlify.app

Reference: Advanced R, Hadley Wickham