class: center, middle, inverse, title-slide

# STA 326 2.0 Programming and Data Analysis with R

## ⚒️ 🗜️ Writing functions in R

### Dr Thiyanga Talagala

---

<style type="text/css">
.remark-slide-content {
    font-size: 30px;
}
</style>

# Today's menu

.pull-left[
- User-written functions
]

.pull-right[
<center><img src="baking.jpeg" height="500px"/></center>
]

---

## Functions in R

👉🏻 Perform a specific task according to a set of instructions.

--

👉🏻 Some functions we have discussed so far:

> `c`, `matrix`, `array`, `list`, `data.frame`, `str`, `dim`, `length`, `nrow`, `plot`

--

👉🏻 In R, functions are **objects** of **class** *function*.

```r
class(length)
```

```
[1] "function"
```

---

## Functions in R (cont.)

👉🏻 There are basically two types of functions:

> 💻 Built-in functions

Already created or defined in the programming framework to make our work easier.

> 👨 User-defined functions

Sometimes we need to create our own functions for a specific purpose.

---

.pull-left[
## Syntax

```r
name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}
```
]

.pull-right[
### Example

```r
cal_power <- function(x){
  a <- x^2; b <- x^3
  out <- c(a, b)
  names(out) <- c("squared", "cubed")
  out # or return(out)
}
```

### Evaluation

```r
cal_power(2)
```

```
squared   cubed 
      4       8 
```
]

--

👉 Functions are created using `function()`.

---

class: inverse

## Basic components of a function

**1. Function name**

---

.pull-left[
### Syntax

```r
name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}
```
]

.pull-right[
### Example

```r
*cal_power <- function(x){
  a <- x^2
  b <- x^3
  out <- c(a, b)
  names(out) <- c("squared", "cubed")
  out # or return(out)
}
```
]

.content-box-yellow[Function name: **`cal_power`**]

---

.content-box-yellow[Function name: **`cal_power`**]

- use verbs, where possible
- should be meaningful
- use an underscore (_) to separate words
- avoid names of built-in functions
- start with lower-case letters.
  Note that R is a case-sensitive language.

---

class: inverse

## Basic components of a function

1. Function name

**2. Function arguments/inputs**

---

.pull-left[
### Syntax

```r
name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}
```
]

.pull-right[
### Example

```r
*cal_power <- function(x){
  a <- x^2
  b <- x^3
  out <- c(a, b)
  names(out) <- c("squared", "cubed")
  out # or return(out)
}
```
]

.content-box-yellow[Function arguments: **`x`**]

- the value passed to the function to obtain the function's result.

---

class: inverse

## Basic components of a function

1. Function name
2. Function arguments/inputs

**3. Function body**

---

.pull-left[
### Syntax

```r
name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}
```
]

.pull-right[
### Example

```r
cal_power <- function(x){
* a <- x^2
* b <- x^3
* out <- c(a, b)
* names(out) <- c("squared", "cubed")
* out # or return(out)
}
```
]

.content-box-yellow[Function body]

---

## Function with a single line

.pull-left[
### Method 1

```r
cal_square <- function(x){
  x^2
}
```
]

.pull-right[
### Method 2

```r
cal_square <- function(x) x^2
```
]

---

### Function body (Cont.)

- Place spaces around all operators such as `=`, `+`, `-`, `<-`, etc.

- Exception: do not place spaces around the operators `:`, `::` and `:::`

```r
1+2   # bad
1 + 2 # good
```

--

- Place a space before a left parenthesis, except in a function call

```r
if (a > 2) # good
if(a>2)    # bad

# Function call ----
rnorm(2)   # good
rnorm (2)  # bad
```

---

### Function body (Cont.)

- Use extra spacing to align multiple lines with `<-` or `=`

```r
# Bad ------
a = sum(c(1, 5, 8, 10))/2
sd = sd(c(1, 5, 8, 10))

# Good ------
a  = sum(c(1, 5, 8, 10))/2
sd = sd(c(1, 5, 8, 10))
```

---

### Function body (Cont.)

- Spacing inside parentheses or square brackets

```r
# Good ---
a[1, 2]
a[1, ]
if (x < 2)

# Bad ---
a[1,2]
a[1,]
if(x<2)
if( x<2 )
```

---

### Function body (Cont.)
- Do not put `{` and `}` on the same line: the opening and closing braces go on separate lines

```r
# Good ---
if (y == 2) {
  print("even")
}

# Bad ---
if (y == 2) { print("even")}
```

---

background-image: url('dengue.jpg')
background-position: center
background-size: contain

---

**Load the mozzie dataset**

```r
library(mozzie)
data(mozzie); head(mozzie, 2)
```

```
  ID Year Week Colombo Gampaha Kalutara Kandy Matale Nuwara Eliya Galle
1  1 2008   52      15       7        1    11      4            0     0
2  2 2009    1      44      23        5    16     21            2     0
  Hambantota Matara Jaffna Kilinochchi Mannar Vavuniya Mulative Batticalo
1          6     22      0           0      8        0        0         1
2          5     18      1           0      0        0        0         0
  Ampara Trincomalee Kurunagala Puttalam Anuradhapura Polonnaruwa Badulla
1      0           0          2        1            2           0       1
2      1           1         10        5            0           0       1
  Monaragala Ratnapura Kegalle
1          1         2      16
2          0         1      25
```

--

> Use the Min-Max transformation to rescale all the district variables onto the 0-1 range.

> The Min-Max transformation is `\(\frac{x_i-\min(x)}{\max(x)-\min(x)}\)` where `\(x=(x_1, x_2, \ldots, x_n)\)`.

---

**Min-Max transformation on `mozzie`**

```r
minmax.colombo <- (mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
  (max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))
head(minmax.colombo) # Colombo district
```

```
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
```

--

```r
minmax.gampaha <- (mozzie$Gampaha - min(mozzie$Gampaha, na.rm = TRUE)) /
  (max(mozzie$Gampaha, na.rm = TRUE) - min(mozzie$Gampaha, na.rm = TRUE))
head(minmax.gampaha) # Gampaha district
```

```
[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625
```

--

```r
minmax.kalutara <- (mozzie$Gampaha - min(mozzie$Kalutara, na.rm = TRUE)) /
  (max(mozzie$Kalutara, na.rm = TRUE) - min(mozzie$Kalutara, na.rm = TRUE))
head(minmax.kalutara) # Kalutara district
```

```
[1] 0.09333333 0.30666667 0.25333333 0.30666667 0.32000000 0.22666667
```

---

## Copying-and-pasting

> You could easily make errors.

> A mistake copied becomes a mistake repeated.

--

## When should you write a function?
- Whenever you need to copy and paste a block of code many times.
- A function is a reusable block of code designed to do a specific task.
- If you cannot find a suitable built-in function for your purpose, you can write your own function.
- To share your work with others.

---

# Writing a function

### Step 1: Function name

```r
rescale_minmax
```

--

### Step 2: Assign your function to the name

```r
rescale_minmax <-
```

--

### Step 3: Tell R that you are writing a function

```r
rescale_minmax <- function(x) # Arguments/inputs should be defined inside ()
```

---

### Step 4: Curly braces define the start and the end of your work

```r
rescale_minmax <- function(x){
  # Task
  # output
}
```

---

## Step 5: Function inputs, task and outputs

**Identify all the inputs needed to produce the function output**

```r
# Colombo district
(mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
  (max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))
```

--

**Rewrite the code with general names**

```r
x <- mozzie$Colombo
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
```

--

**Remove duplication / make your code efficient and readable**

```r
rng <- range(x, na.rm = TRUE)
rng
```

```
[1]   0 475
```

```r
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
```

---

# Step 6: Complete your function

.pull-left[
**Type A**

```r
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

**Type B**

```r
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  out.rescaled
}
```
]

--

.pull-right[
**Type C**

```r
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  return(out.rescaled)
}
```

> In this situation Type A is the best: it is the shortest, and an R function automatically returns the value of the last evaluated expression.
]

---

# Step 7: Check your function with a few different inputs

```r
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

--

```r
rescale_minmax(c(1, 200, 250, 80, NA))
```

```
[1] 0.0000000 0.7991968 1.0000000 0.3172691        NA
```

---

## Back to our original example

```r
minmax.colombo <- rescale_minmax(mozzie$Colombo)
head(minmax.colombo)
```

```
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
```

--

```r
minmax.gampaha <- rescale_minmax(mozzie$Gampaha)
head(minmax.gampaha)
```

```
[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625
```

```r
minmax.kalutara <- rescale_minmax(mozzie$Kalutara)
head(minmax.kalutara)
```

```
[1] 0.01333333 0.06666667 0.14666667 0.16000000 0.25333333 0.13333333
```

---

# Move forward: when the requirements change

```r
new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
```

```
[1]   0   0   0   0   0   0 NaN
```

--

## Fix the code

```r
rescale_minmax <- function(x){
* rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

```r
new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
```

```
[1] 0.77272727 1.00000000 0.65909091 0.43181818 0.00000000 0.02272727        Inf
```

---

class: duke-orange, center, middle

# Your turn

---

Rewrite `rescale_minmax` so that `-Inf` is mapped to 0, and `Inf` is mapped to 1.
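---

## A possible solution

A minimal sketch, not the only answer, assuming `NA` values should stay `NA`: compute the range of the finite values only, rescale, then send `-Inf` to 0 and `Inf` to 1.

```r
rescale_minmax <- function(x){
  # range of the finite values only (drops NA, NaN, Inf and -Inf)
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  out <- (x - rng[1]) / (rng[2] - rng[1])
  # map the infinite inputs to the ends of the 0-1 scale
  out[is.infinite(x) & x < 0] <- 0
  out[is.infinite(x) & x > 0] <- 1
  out
}

rescale_minmax(c(-Inf, 0, 5, 10, Inf))
# [1] 0.0 0.0 0.5 1.0 1.0
```

Testing `is.infinite(x)` rather than `x == Inf` keeps `NA` out of the logical index, so missing values pass through unchanged.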
---

class: duke-orange, center, middle

# Your turn

---

R for Data Science - Exercise 19.2.1, Question 3

<iframe src="https://r4ds.had.co.nz/functions.html" width="100%" height="400px"></iframe>
---

class: duke-orange, center, middle

# Your turn

---

R for Data Science - Exercise 19.2.1, Question 4

<iframe src="https://r4ds.had.co.nz/functions.html" width="100%" height="400px"></iframe>
---

background-image: url('laptop.jpg')
background-position: center
background-size: cover

.content-box-yellow[
# Functions are for humans and computers

- Use descriptive names for variables.
- Comment your code.
]

---

class: duke-orange, center, middle

# Your turn

---

Write your own function to calculate the parameter estimates of a simple linear regression model.

Help: `$$\hat{\beta}=(X^TX)^{-1}X^TY$$`

![](slr2.png)
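---

## A possible solution

A minimal sketch, assuming one predictor `x` and a response `y` of the same length; the name `cal_slr_coef` is only illustrative. It builds the design matrix `\(X\)` (a column of 1s for the intercept, then `x`) and applies `\(\hat{\beta}=(X^TX)^{-1}X^TY\)` literally.

```r
cal_slr_coef <- function(x, y){
  # design matrix: intercept column of 1s, then the predictor
  X <- cbind(1, x)
  # beta_hat = (X^T X)^{-1} X^T Y
  beta <- solve(t(X) %*% X) %*% t(X) %*% y
  beta <- as.vector(beta)
  names(beta) <- c("intercept", "slope")
  beta
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
cal_slr_coef(x, y)  # intercept 2.2, slope 0.6
coef(lm(y ~ x))     # should agree
```

Forming the inverse explicitly mirrors the formula; `solve(crossprod(X), crossprod(X, y))` gives the same estimates and is numerically preferable for larger problems.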
---

Write a function to calculate a confidence interval for the mean.

`$$\bar{x} \pm t_{\alpha/2, (n-1)}\frac{s}{\sqrt{n}}$$`
---

## Function arguments

```r
cal_mean_ci <- function(x, conf){
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
data <- c(165, 170, 175, 180, 185)
cal_mean_ci(data, 0.95)
```

```
[1] 165.1838 184.8162
```

---

## Function with default values

```r
cal_mean_ci <- function(x, conf = 0.95){
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
cal_mean_ci(data)
```

```
[1] 165.1838 184.8162
```

```r
cal_mean_ci(data, 0.99)
```

```
[1] 158.7221 191.2779
```

---

class: inverse, center, middle

## In-class questions

---

## Problem 1

Write a function to calculate the correlation coefficient

`$$r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2\sum_{i=1}^n(y_i-\bar{y})^2}}$$`

Do not use the function `cor` inside your function.
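---

## A possible solution

A minimal sketch of the Problem 1 formula, assuming `x` and `y` are numeric vectors of the same length with no missing values. The name `cal_cor` is only illustrative; `cor` appears only to check the answer, not inside the function.

```r
cal_cor <- function(x, y){
  # deviations from the means
  dx <- x - mean(x)
  dy <- y - mean(y)
  # sum of cross-products over the square root of the product of the sums of squares
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
cal_cor(x, y)  # 6 / sqrt(10 * 6), about 0.7746
cor(x, y)      # check only
```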
---

## Problem 2

Write a function to generate 100 random numbers from a normal distribution and plot the distribution of the random numbers. Your function should display both the generated random numbers and the corresponding plot.
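---

## A possible solution

A minimal sketch, assuming a standard normal distribution by default and a histogram as the plot. The name `gen_norm_plot` and its default arguments are illustrative choices, not part of the question.

```r
gen_norm_plot <- function(n = 100, mean = 0, sd = 1){
  # draw n random numbers from N(mean, sd^2)
  x <- rnorm(n, mean = mean, sd = sd)
  # plot their distribution
  hist(x, main = "Distribution of the random numbers", xlab = "x")
  # returning x makes R print the numbers when called at the console
  x
}

set.seed(2024)  # for reproducible draws
gen_norm_plot()
```

Returning `x` (rather than only drawing the plot) also lets the caller store the numbers for later use, e.g. `out <- gen_norm_plot()`.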
---

## Problem 3

Write a function to compute the z-score of an A/L Mathematics student given the student's marks. Assume mean(Mathematics) = 60, sd(Mathematics) = 10, mean(Chemistry) = 45, sd(Chemistry) = 20, mean(Physics) = 55, sd(Physics) = 5.
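---

## A possible solution

A minimal sketch, assuming the z-score of each subject is `\((\text{mark}-\text{mean})/\text{sd}\)` with the means and standard deviations given in Problem 3. Also reporting the average of the three subject z-scores as an overall value is an extra assumption here, not part of the question. The name `cal_zscore` is illustrative.

```r
cal_zscore <- function(maths, chemistry, physics){
  marks <- c(Mathematics = maths, Chemistry = chemistry, Physics = physics)
  # means and standard deviations given in Problem 3
  means <- c(60, 45, 55)
  sds <- c(10, 20, 5)
  # standardize each subject mark
  z <- (marks - means) / sds
  # also report the average z-score (an assumed overall summary)
  c(z, average = mean(z))
}

cal_zscore(maths = 70, chemistry = 65, physics = 60)  # all four values equal 1
```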
---

background-image: url('PAGE-05.jpeg')
background-position: center
background-size: contain

---

## Local variables vs global variables

In-class discussion using an R demo

---

## Problem 4

Write a function to calculate the median.

Help:

```r
5 %% 2
```

```
[1] 1
```

```r
4 %% 2
```

```
[1] 0
```

Note: Do not use the built-in function `median` inside your function.
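---

## A possible solution

A minimal sketch, assuming missing values should be dropped (`sort` removes `NA` by default). It uses `%%` from the hint to check whether the number of observations is odd or even; the name `cal_median` is illustrative.

```r
cal_median <- function(x){
  # sort the values; sort() drops NA by default
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    # odd n: the middle value
    x[(n + 1) / 2]
  } else {
    # even n: the average of the two middle values
    (x[n / 2] + x[n / 2 + 1]) / 2
  }
}

cal_median(c(7, 1, 5))     # 5
cal_median(c(4, 1, 3, 2))  # 2.5
```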
---

class: inverse, center, middle

# Next week: control structures

---

class: center, middle

## Thank you!

Slides available at: hellor.netlify.app

All rights reserved by [Thiyanga S. Talagala](https://thiyanga.netlify.app/)