STA 326 2.0 Programming and Data Analysis with R

⚒️ 🗜️ Writing functions in R

Dr Thiyanga Talagala

User-written functions

Functions in R

👉🏻 Perform a specific task according to a set of instructions.

👉🏻 Some functions we have discussed so far:

c, matrix, array, list, data.frame, str, dim, length, nrow, plot

👉🏻 In R, functions are objects of class function.

class(length)

[1] "function"
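
Because functions are ordinary objects, they can be stored under another name or handed to other functions. A small sketch of this idea follows; the names my_len and sizes are made up for illustration and do not appear in the slides.

```r
# Functions are objects, so they can be assigned to a new name ...
my_len <- length          # my_len is a hypothetical alias for length
my_len(c(10, 20, 30))     # 3

# ... and passed as an argument to another function.
sizes <- sapply(list(1:3, letters, NULL), length)
sizes                     # 3 26 0

# is.function() tests whether an object is a function.
is.function(my_len)       # TRUE
```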

Functions in R (cont.)

👉🏻 There are basically two types of functions:

💻 Built-in functions

  Already created or defined in the programming framework to make our work easier.

👨 User-defined functions

  Sometimes we need to create our own functions for a specific purpose.

Syntax


name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}

Example

cal_power <- function(x){
  a <- x^2; b <- x^3
  out <- c(a, b)
  names(out) <- c("squared", "cubed")
  out # or return(out)
}

Evaluation

cal_power(2)

squared   cubed 
      4       8

👉 Functions are created using function().

Basic components of a function

1. Function name

Function name: cal_power

- use verbs where possible
- should be meaningful
- use an underscore (_) to separate words
- avoid names of built-in functions
- start with a lowercase letter (note that R is case-sensitive)

2. Function arguments/ inputs

Function arguments: x

A value passed to the function, which the function uses to compute its result.
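
Arguments can be supplied by position or by name. The sketch below uses a hypothetical two-argument function, raise_to, which is not part of the slides, to show that both calling styles give the same result.

```r
# Hypothetical helper: raise x to a given power.
raise_to <- function(x, power) {
  x^power
}

raise_to(2, 3)              # matched by position: x = 2, power = 3
raise_to(power = 3, x = 2)  # matched by name: the order no longer matters
```

Both calls return 8.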

3. Function body

Function body: the code between the curly braces, which runs each time the function is called.

Function with single line

Method 1
cal_square <- function(x){
  x^2
}

Method 2
cal_square <- function(x) x^2

Function body (Cont.)

Place spaces around all operators such as =, +, -, <-, etc.
Exception: Do not place spaces around the operators :, :: and :::

1+2 # bad
1 + 2 # good

Place a space before a left parenthesis, except in a function call

if (a > 2) # good
if(a>2) # bad
# Function call ----
rnorm(2) # good
rnorm (2) # bad

Function body (Cont.)

Use extra spaces to align consecutive assignments on <- or =

# Bad ------
a = sum(c(1, 5, 8, 10)) / 2
sd = sd(c(1, 5, 8, 10))
# Good ------
a  = sum(c(1, 5, 8, 10)) / 2
sd = sd(c(1, 5, 8, 10))

Function body (Cont.)

Spacing inside parentheses or square brackets

# Good ---
a[1, 2]
a[1, ]
if (x < 2)
# Bad ---
a[1,2]
a[1,]
if(x<2)
if( x<2 )

Function body (Cont.)

Do not put { and } on the same line; the closing brace goes on its own line

# Good ---
if (y == 2) {
  print("even")
}
# Bad ---
if (y == 2) { print("even") }

Load the mozzie dataset

library(mozzie)
data(mozzie); head(mozzie, 2)

  ID Year Week Colombo Gampaha Kalutara Kandy Matale Nuwara Eliya Galle
1  1 2008   52      15       7        1    11      4            0     0
2  2 2009    1      44      23        5    16     21            2     0
  Hambantota Matara Jaffna Kilinochchi Mannar Vavuniya Mulative Batticalo
1          6     22      0           0      8        0        0         1
2          5     18      1           0      0        0        0         0
  Ampara Trincomalee Kurunagala Puttalam Anuradhapura Polonnaruwa Badulla
1      0           0          2        1            2           0       1
2      1           1         10        5            0           0       1
  Monaragala Ratnapura Kegalle
1          1         2      16
2          0         1      25

Use the Min-Max transformation to rescale all the district variables onto the 0-1 range.

Min-Max transformation is $\frac{x_{i} - \min(x)}{\max(x) - \min(x)}$ where $x = (x_{1}, x_{2}, \ldots, x_{n})$.
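
As a quick worked example of the formula, consider three toy values (made up here, not taken from mozzie). The minimum maps to 0, the maximum maps to 1, and everything else falls in between.

```r
x <- c(2, 4, 10)  # toy values for illustration only

# min(x) = 2 and max(x) = 10, so the denominator is 10 - 2 = 8
scaled <- (x - min(x)) / (max(x) - min(x))
scaled            # 0.00 0.25 1.00
```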

Min-Max transformation on mozzie

minmax.colombo <- (mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
  (max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))
head(minmax.colombo) # Colombo district

[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263

minmax.gampaha <- (mozzie$Gampaha - min(mozzie$Gampaha, na.rm = TRUE)) /
  (max(mozzie$Gampaha, na.rm = TRUE) - min(mozzie$Gampaha, na.rm = TRUE)) 
head(minmax.gampaha) # Gampaha district

[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625

minmax.kalutara <- (mozzie$Gampaha - min(mozzie$Kalutara, na.rm = TRUE)) /
  (max(mozzie$Kalutara, na.rm = TRUE) - min(mozzie$Kalutara, na.rm = TRUE))
head(minmax.kalutara) # Kalutara district

[1] 0.09333333 0.30666667 0.25333333 0.30666667 0.32000000 0.22666667

Copying-and-pasting

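You could easily make errors: in the Kalutara code above, mozzie$Gampaha was left in the numerator, so the rescaled values are wrong.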

A mistake copied becomes a mistake repeated.

When should you write a function?

Whenever you need to copy and paste a block of code many times
- A function is a reusable block of programming code designed to do a specific task.
If you don't find a suitable built-in function to serve your purpose, you can write your own function
To share your work with others

Writing a function

Step 1: Function name

rescale_minmax

Step 2: Assign your function to the name

rescale_minmax <-

Step 3: Tell R that you are writing a function

rescale_minmax <- function(x) # Arguments/inputs should be defined inside ()

Step 4: Curly braces define the start and the end of your work

rescale_minmax <- function(x){
# Task
# output
}

Step 5: Function inputs, task and outputs

Identify the inputs that the output depends on.

# Colombo district
(mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
  (max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))

Re-write the code with general names

x <- mozzie$Colombo
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

Remove duplication/ Make your code efficient and readable

rng <- range(x, na.rm = TRUE)
rng

[1]   0 475

rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])

Step 6: Complete your function

Type A

rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

Type B

rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  out.rescaled
}

Type C

rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  return(out.rescaled)
}

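In this situation Type A is the best: R returns the value of the last evaluated expression, so neither the intermediate variable nor an explicit return() is needed.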
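
An explicit return() is still handy when a function should stop early. The sketch below is a hypothetical variant (rescale_minmax_safe is not part of the slides) that returns zeros for constant input instead of dividing by zero. Mapping constant input to zeros is an assumption made purely for illustration.

```r
# Hypothetical variant: return() lets the function exit early.
rescale_minmax_safe <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))       # constant input: avoid dividing by zero
  }
  (x - rng[1]) / (rng[2] - rng[1])  # last expression is returned implicitly
}

rescale_minmax_safe(c(5, 5, 5))  # 0 0 0
rescale_minmax_safe(c(1, 2, 3))  # 0.0 0.5 1.0
```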

Step 7: Check your function with a few different inputs

rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale_minmax(c(1, 200, 250, 80, NA))

[1] 0.0000000 0.7991968 1.0000000 0.3172691        NA

Back to our original example

minmax.colombo <- rescale_minmax(mozzie$Colombo)
head(minmax.colombo)

[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263

minmax.gampaha <- rescale_minmax(mozzie$Gampaha)
head(minmax.gampaha)

[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625

minmax.kalutara <- rescale_minmax(mozzie$Kalutara)
head(minmax.kalutara)

[1] 0.01333333 0.06666667 0.14666667 0.16000000 0.25333333 0.13333333
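
Once the function exists, it can be applied to every column at once rather than one district at a time. The sketch below assumes a small made-up data frame, toy, standing in for the district columns of mozzie; the column names and counts are hypothetical.

```r
rescale_minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy stand-in for the district columns of mozzie (made-up counts).
toy <- data.frame(
  DistrictA = c(0, 5, 10),
  DistrictB = c(2, 4, 6)
)

# lapply() calls rescale_minmax once per column; assigning into toy[]
# keeps the data frame structure and the column names.
toy[] <- lapply(toy, rescale_minmax)
toy
```

With the real data, the same pattern would be applied only to the district columns, leaving ID, Year and Week untouched.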

Moving forward: when the requirements change

new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)

[1]   0   0   0   0   0   0 NaN

Fix the code

rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)

[1] 0.77272727 1.00000000 0.65909091 0.43181818 0.00000000 0.02272727        Inf
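
To see why the first version failed and why finite = TRUE helps, it may be useful to look at range() on its own. The short vector below is a made-up illustration.

```r
x <- c(400, 60, Inf)

range(x)                 # 60 Inf: Inf is treated as the maximum
range(x, finite = TRUE)  # 60 400: non-finite values are dropped first

# With an infinite maximum the denominator is Inf, so every finite value
# maps to 0 and Inf / Inf gives NaN.
(x - 60) / (Inf - 60)
```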

Your turn

Rewrite rescale_minmax so that -Inf is mapped to 0 and Inf is mapped to 1.

Your turn

R for Data Science - Exercise 19.2.1, Question 3

Your turn

R for Data Science - Exercise 19.2.1, Question 4

Functions are for humans and computers

Descriptive names for variables.
Comment your code.

Your turn

Write your own function to calculate the parameter estimates of a simple linear regression model.

Hint: $\hat{\beta} = (X^{T} X)^{-1} X^{T} Y$
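
One possible sketch for checking your own attempt afterwards, assuming a single predictor and an intercept term. The function name fit_slr and the toy data are hypothetical and are not taken from the slides.

```r
# Hypothetical function name; x and y are numeric vectors of equal length.
fit_slr <- function(x, y) {
  X <- cbind(1, x)                          # design matrix with an intercept column
  beta <- solve(t(X) %*% X) %*% t(X) %*% y  # (X'X)^{-1} X'Y
  out <- as.vector(beta)
  names(out) <- c("intercept", "slope")
  out
}

x <- c(1, 2, 3, 4)
y <- c(3, 5, 7, 9)  # toy data lying exactly on y = 1 + 2x
fit_slr(x, y)       # intercept 1, slope 2 (up to rounding error)
```

The result can be compared with coef(lm(y ~ x)) as a sanity check.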

Write a function to calculate a confidence interval for the mean: $\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

Function arguments

cal_mean_ci <- function(x, conf){
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
data <- c(165, 170, 175, 180, 185)
cal_mean_ci(data, 0.95)

[1] 165.1838 184.8162

Function with default values

cal_mean_ci <- function(x, conf = 0.95){
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
cal_mean_ci(data)

[1] 165.1838 184.8162

cal_mean_ci(data, 0.99)

[1] 158.7221 191.2779
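
A hand-written function can be checked against a built-in one. The sketch below compares cal_mean_ci with the interval reported by t.test(); the variable names ours and builtin are made up for illustration.

```r
cal_mean_ci <- function(x, conf = 0.95) {
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}

data <- c(165, 170, 175, 180, 185)

ours    <- cal_mean_ci(data, conf = 0.99)
# as.numeric() drops the conf.level attribute attached by t.test()
builtin <- as.numeric(t.test(data, conf.level = 0.99)$conf.int)

all.equal(ours, builtin)  # TRUE when the two intervals agree
```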

In-class questions

Problem 1

Write a function to calculate the correlation coefficient

$r = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}}$

Do not use the function cor inside your function.
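
One possible sketch for checking your own attempt afterwards. The name cal_cor and the toy data are hypothetical, and the function assumes two numeric vectors of equal length with no missing values.

```r
# Hypothetical name; implements the formula above term by term.
cal_cor <- function(x, y) {
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
cal_cor(x, y)  # 6 / sqrt(60), about 0.775
```

Outside the function, the built-in cor(x, y) can be used to confirm the result.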


Problem 2

Write a function to generate 100 random numbers from a normal distribution and plot the distribution of the random numbers. Your function should display the generated random numbers and the corresponding plot.


Problem 3

Write a function to compute the z-score of an A/L Mathematics student, given the student's marks. Assume

mean(Mathematics) = 60, sd(Mathematics) = 10,

mean(Chemistry) = 45, sd(Chemistry) = 20,

mean(Physics) = 55, sd(Physics) = 5.


Local variables vs Global variables

In-class discussion using an R demo.
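
A minimal sketch of the distinction, not the demo used in class: a variable created inside a function is local to that call and does not overwrite a global variable of the same name. The names total and add_ten are made up for illustration.

```r
total <- 100  # global variable

add_ten <- function(x) {
  total <- x + 10  # local variable: exists only while the function runs
  total
}

add_ten(5)  # 15, computed from the local total
total       # 100, the global total is unchanged
```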

Problem 4

Write a function to calculate the median.

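Hint: the modulo operator %% gives the remainder of a division, which tells you whether a number is odd or even.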

5%%2

[1] 1

4%%2

[1] 0

Note: Do not use the built-in function median inside your function.


Next week: control structures
