STA 326 2.0 Programming and Data Analysis with R

⚒️ 🗜️ Writing functions in R

Dr Thiyanga Talagala

Today's menu

  • User-written functions

Functions in R

👉🏻 Perform a specific task according to a set of instructions.

👉🏻 Some functions we have discussed so far:

c, matrix, array, list, data.frame, str, dim, length, nrow, plot

👉🏻 In R, functions are objects of class function.

class(length)
[1] "function"
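Because functions are ordinary objects, they can be assigned to new names, stored in lists and passed to other functions. A minimal sketch of this idea; the names my_len, add_one and summaries are made up for illustration and do not come from the slides:

```r
# Assign an existing function to a new name (my_len is a hypothetical name)
my_len <- length
my_len(c(10, 20, 30))

# A user-written function has the same class as a built-in one
add_one <- function(x) x + 1
class(add_one)

# Functions can be stored in a list and passed to other functions
summaries <- list(smallest = min, largest = max)
sapply(summaries, function(f) f(c(4, 8, 15)))
```

The last call applies each stored function to the same vector, which is only possible because functions behave like any other R object.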

Functions in R (cont.)

👉🏻 There are basically two types of functions:

💻 Built-in functions

Already created or defined in the programming framework to make our work easier.

👨 User-defined functions

Sometimes we need to create our own functions for a specific purpose.

Syntax

name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}

Example

cal_power <- function(x){
  a <- x^2
  b <- x^3
  out <- c(a, b)
  names(out) <- c("squared", "cubed")
  out # or return(out)
}

Evaluation

cal_power(2)
squared   cubed
      4       8

👉 Functions are created using function().

Basic components of a function

1. Function name

Function name: cal_power

  • use verbs, where possible

  • should be meaningful

  • use an underscore (_) to separate words

  • avoid names of built-in functions

  • start with lowercase letters. Note that R is a case-sensitive language.

Basic components of a function

2. Function arguments/ inputs

Function arguments: x

  • The value passed to the function, from which it computes its result.
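When a function has more than one argument, R matches the supplied values either by position or by name. A small sketch, assuming a hypothetical two-argument function raise_power() that is not part of the slides:

```r
# raise_power() is a hypothetical two-argument function for illustration
raise_power <- function(x, power){
  x^power
}

raise_power(2, 3)              # matched by position: x = 2, power = 3
raise_power(power = 3, x = 2)  # matched by name: the order does not matter
raise_power(3, x = 2)          # x is named, so 3 is matched to power
```

All three calls compute 2^3. Naming arguments tends to make calls easier to read once a function takes several inputs.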

Basic components of a function

3. Function body

Function body: the statements inside the curly braces { }, which do the work and produce the output.

Function with a single line

Method 1

cal_square <- function(x){
  x^2
}

Method 2

cal_square <- function(x) x^2

Function body (Cont.)

  • Place spaces around all operators such as =, +, -, <-, etc.

  • Exception: Do not place spaces around the operators :, :: and :::

1+2 # bad
1 + 2 # good
  • Place a space before a left parenthesis, except in a function call.
if (a > 2) # good
if(a>2) # bad
# Function call ----
rnorm(2) # good
rnorm (2) # bad

Function body (Cont.)

  • Use extra spacing to align multiple lines with <- or =
# Bad ------
a = sum(c(1, 5, 8, 10))/2
sd = sd(c(1, 5, 8, 10))
# Good ------
a  = sum(c(1, 5, 8, 10))/2
sd = sd(c(1, 5, 8, 10))

Function body (Cont.)

  • Spacing inside parentheses or square brackets
# Good ---
a[1, 2]
a[1, ]
if (x < 2)
# Bad ---
a[1,2]
a[1,]
if(x<2)
if( x<2 )

Function body (Cont.)

  • Do not put { and } on the same line; the closing } always goes on its own line.
# Good ---
if (y == 2) {
  print("even")
}
# Bad ---
if(y == 2){ print("even")}

Load the mozzie dataset

library(mozzie)
data(mozzie); head(mozzie, 2)
  ID Year Week Colombo Gampaha Kalutara Kandy Matale Nuwara Eliya Galle
1  1 2008   52      15       7        1    11      4            0     0
2  2 2009    1      44      23        5    16     21            2     0
  Hambantota Matara Jaffna Kilinochchi Mannar Vavuniya Mulative Batticalo
1          6     22      0           0      8        0        0         1
2          5     18      1           0      0        0        0         0
  Ampara Trincomalee Kurunagala Puttalam Anuradhapura Polonnaruwa Badulla
1      0           0          2        1            2           0       1
2      1           1         10        5            0           0       1
  Monaragala Ratnapura Kegalle
1          1         2      16
2          0         1      25

Use the min-max transformation to rescale all the district variables onto the 0-1 range.

The min-max transformation is $\frac{x_i - \min(x)}{\max(x) - \min(x)}$, where $x = (x_1, x_2, \ldots, x_n)$.
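To see the formula at work before applying it to real data, here is a small worked example. The vector x is made up for illustration and is not taken from the mozzie dataset:

```r
# A small made-up vector, not taken from the mozzie data
x <- c(2, 4, 10)

# min(x) = 2 and max(x) = 10, so each value becomes (x_i - 2) / (10 - 2)
x_rescaled <- (x - min(x)) / (max(x) - min(x))
x_rescaled
```

The smallest value maps to 0, the largest to 1, and 4 maps to (4 - 2) / 8 = 0.25.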

Min-Max transformation on mozzie

minmax.colombo <- (mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
(max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))
head(minmax.colombo) # Colombo district
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
minmax.gampaha <- (mozzie$Gampaha - min(mozzie$Gampaha, na.rm = TRUE)) /
(max(mozzie$Gampaha, na.rm = TRUE) - min(mozzie$Gampaha, na.rm = TRUE))
head(minmax.gampaha) # Gampaha district
[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625
minmax.kalutara <- (mozzie$Gampaha - min(mozzie$Kalutara, na.rm = TRUE)) /
(max(mozzie$Kalutara, na.rm = TRUE) - min(mozzie$Kalutara, na.rm = TRUE))
head(minmax.kalutara) # Kalutara district
[1] 0.09333333 0.30666667 0.25333333 0.30666667 0.32000000 0.22666667

Copying-and-pasting

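You could easily make errors. In the Kalutara code above, mozzie$Gampaha was left in the numerator after pasting, so the output is wrong.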

A mistake copied becomes a mistake repeated.

When should you write a function?

  • Whenever you need to copy and paste a block of code many times

    • A function is a reusable block of programming code designed to do a specific task.
  • If you don't find a suitable built-in function to serve your purpose, you can write your own function

  • To share your work with others

Writing a function

Step 1: Function name

rescale_minmax

Step 2: Assign your function to the name

rescale_minmax <-

Step 3: Tell R that you are writing a function

rescale_minmax <- function(x) # Arguments/inputs should be defined inside ()

Step 4: Curly braces define the start and the end of your work

rescale_minmax <- function(x){
# Task
# output
}

Step 5: Function inputs, task and outputs

Which inputs does the calculation need to produce its output?

# Colombo district
(mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
(max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))

Re-write the code with general names

x <- mozzie$Colombo
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

Remove duplication/ Make your code efficient and readable

rng <- range(x, na.rm = TRUE)
rng
[1] 0 475
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])

Step 6: Complete your function

Type A

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}

Type B

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
out.rescaled
}

Type C

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
return(out.rescaled)
}

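In this situation Type A is the best: R returns the value of the last evaluated expression, so the extra variable and the explicit return() add nothing here.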
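One way to convince yourself that the three types differ only in style is to run them side by side. The sketch below stores them under the hypothetical names type_a, type_b and type_c (so that one does not overwrite another) and compares their results on a made-up vector:

```r
# Hypothetical names type_a, type_b and type_c keep the three versions apart
type_a <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
type_b <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  out.rescaled
}
type_c <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  return(out.rescaled)
}

# Made-up test values, including a missing value
test_values <- c(3, 7, 11, NA)
identical(type_a(test_values), type_b(test_values))
identical(type_b(test_values), type_c(test_values))
```

Both comparisons return TRUE, so the choice between the types is about readability rather than results.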

Step 7: Check your function with a few different inputs

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
rescale_minmax(c(1, 200, 250, 80, NA))
[1] 0.0000000 0.7991968 1.0000000 0.3172691 NA
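A few more checks along the same lines, as a sketch. The input vectors are made up, and stopifnot() is just one possible way to turn a manual check into an automatic one:

```r
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Negative values: the smallest still maps to 0 and the largest to 1
rescale_minmax(c(-10, 0, 10))

# The order of the input is preserved
rescale_minmax(c(10, 0, 5))

# All values equal: max - min is 0, so 0 / 0 gives NaN
rescale_minmax(c(5, 5, 5))

# stopifnot() raises an error if a check fails and stays silent otherwise
stopifnot(all.equal(rescale_minmax(c(-10, 0, 10)), c(0, 0.5, 1)))
```

The constant vector is worth noticing: the function runs without an error but returns NaN, which may or may not be what you want.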

Back to our original example

minmax.colombo <- rescale_minmax(mozzie$Colombo)
head(minmax.colombo)
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
minmax.gampaha <- rescale_minmax(mozzie$Gampaha)
head(minmax.gampaha)
[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625
minmax.kalutara <- rescale_minmax(mozzie$Kalutara)
head(minmax.kalutara)
[1] 0.01333333 0.06666667 0.14666667 0.16000000 0.25333333 0.13333333
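With the function in place, rescaling every district no longer needs one line per column. The sketch below uses lapply() on toy_counts, a made-up data frame standing in for the district columns (the real mozzie data also contains ID, Year and Week, which you would leave out):

```r
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# toy_counts is a made-up data frame standing in for the district columns
toy_counts <- data.frame(
  district_a = c(15, 44, 39),
  district_b = c(7, 23, 19)
)

# lapply() calls rescale_minmax() on each column; assigning into toy_counts[]
# keeps the result as a data frame with the same column names
toy_counts[] <- lapply(toy_counts, rescale_minmax)
toy_counts
```

Every column now runs from 0 to 1, and there is no pasted code in which a district name could be left unchanged by mistake.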

Moving forward: when the requirements change

new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
[1] 0 0 0 0 0 0 NaN

Fix the code

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE, finite = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
[1] 0.77272727 1.00000000 0.65909091 0.43181818 0.00000000 0.02272727 Inf
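The fix works because of what range() returns. A short sketch comparing the two calls on the same vector:

```r
new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)

range(new.data.col)                 # Inf is treated as the maximum
range(new.data.col, finite = TRUE)  # Inf is left out of the calculation
```

Without finite = TRUE the denominator max - min is infinite, which is why every finite value was rescaled to 0. With finite = TRUE the range is 60 to 500, and non-finite values (including NA) are ignored when the range is computed.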

Your turn

Rewrite rescale_minmax so that -Inf is set to 0, and Inf is mapped to 1.

04:00

Your turn

R for Data Science - Exercise 19.2.1, Question 3

05:00

Your turn

R for Data Science - Exercise 19.2.1, Question 4

10:00

Functions are for humans and computers

  • Use descriptive names for variables.

  • Comment your code.
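The two points above can be illustrated by writing the same calculation twice. This is only a sketch: the short names f, a and b and the longer names values and value_range are invented for the comparison.

```r
# Hard to read: the names say nothing about the purpose
f <- function(a){
  b <- range(a, na.rm = TRUE)
  (a - b[1]) / (b[2] - b[1])
}

# Easier to read: the same calculation with descriptive names and comments
rescale_minmax <- function(values){
  # Smallest and largest observed values, ignoring missing values
  value_range <- range(values, na.rm = TRUE)
  # Shift so the minimum is 0, then divide by the width of the range
  (values - value_range[1]) / (value_range[2] - value_range[1])
}
```

R treats both versions identically; the difference only matters to the person reading the code later.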

Your turn

Write your own function to calculate the parameter estimates of a simple linear regression model.

Help: $\hat{\beta} = (X^T X)^{-1} X^T Y$

05:00

Write a function to calculate a confidence interval for the mean: $\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

10:00

Function arguments

cal_mean_ci <- function(x, conf){
len.x <- length(x)
se <- sd(x) / sqrt(len.x)
alpha <- 1-conf
mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x-1)
}
data <- c(165, 170, 175, 180, 185)
cal_mean_ci(data, 0.95)
[1] 165.1838 184.8162

Function with default values

cal_mean_ci <- function(x, conf = 0.95){
len.x <- length(x)
se <- sd(x) / sqrt(len.x)
alpha <- 1-conf
mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x-1)
}
cal_mean_ci(data)
[1] 165.1838 184.8162
cal_mean_ci(data, 0.99)
[1] 158.7221 191.2779
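A default value can also be overridden by name, and R can report the defaults a function was defined with. A brief sketch using the same function; the 0.90 level is chosen only as an example:

```r
cal_mean_ci <- function(x, conf = 0.95){
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
data <- c(165, 170, 175, 180, 185)

# Naming the argument makes the call self-explanatory
cal_mean_ci(data, conf = 0.90)

# formals() lists the arguments together with their default values
formals(cal_mean_ci)$conf
```

The 90% interval is narrower than the 95% one, and formals() confirms that 0.95 is used whenever conf is not supplied.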

In-class questions

Problem 1

Write a function to calculate the correlation coefficient

$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$

Do not use the function cor inside your function.

08:00

Problem 2

Write a function to generate 100 random numbers from a normal distribution and plot the distribution of the random numbers. Your function should display the generated random numbers and the corresponding plot.

10:00

Problem 3

Write a function to compute the z-score of an A/L Mathematics student, given the student's marks. Assume

mean(Mathematics) = 60, sd(Mathematics) = 10,

mean(Chemistry) = 45, sd(Chemistry) = 20,

mean(Physics) = 55, sd(Physics) = 5.

05:00

Local variables vs Global variables

In-class discussion using an R demo
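For reference, the distinction can be sketched in a few lines. This is not the demo used in class; the names total, add_to_total, show_local and inside_only are made up for illustration.

```r
total <- 100   # global variable, created outside any function

add_to_total <- function(x){
  total <- total + x   # reads the global value, then creates a local total
  total                # the local value is returned
}

add_to_total(5)   # 105
total             # still 100: the global variable was not changed

show_local <- function(){
  inside_only <- "local"   # exists only while the function runs
  inside_only
}
show_local()
exists("inside_only")   # FALSE
```

Variables created inside a function are local: they disappear when the function finishes and do not overwrite global variables of the same name.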

Problem 4

Write a function to calculate the median.

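Help: the %% operator returns the remainder after division, so x %% 2 distinguishes odd from even values.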

5%%2
[1] 1
4%%2
[1] 0

Note: Do not use the built-in function median inside your function.

08:00

Next week: control structures

Thank you!

Slides available at: hellor.netlify.app

All rights reserved by Thiyanga S. Talagala
