👉🏻 A function performs a specific task according to a set of instructions.
👉🏻 Some functions we have discussed so far: c, matrix, array, list, data.frame, str, dim, length, nrow, plot
👉🏻 In R, functions are objects of class function.
class(length)
[1] "function"
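Because functions are ordinary objects, they can be assigned to another name or passed to other functions. The sketch below illustrates this with `add_one`, a hypothetical helper invented only for this example.

```r
# add_one is a hypothetical helper, defined only for this illustration
add_one <- function(x) x + 1

class(add_one)                # user-defined functions also have class "function"

f <- add_one                  # a function can be assigned to another name
f(2)

sapply(c(1, 2, 3), add_one)   # and passed as an argument to another function
```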
👉🏻 There are basically two types of functions:
💻 Built-in functions
Already created or defined in the programming framework to make our work easier.
👨 User-defined functions
Sometimes we need to create our own functions for a specific purpose.
name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}

cal_power <- function(x){
  a <- x^2
  b <- x^3
  out <- c(a, b)
  names(out) <- c("squared", "cubed")
  out # or return(out)
}
cal_power(2)
squared   cubed
      4       8
👉 Functions are created using the function() keyword and assigned to a name with <-.
1. Function name
Function name: cal_power
use verbs, where possible
should be meaningful
use an underscore (_) to separate words
avoid names of built-in functions
start with lowercase letters (note that R is a case-sensitive language)
2. Function arguments/ inputs
Function arguments: x
3. Function body
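Function body: the code between { and } that carries out the task
cal_square <- function(x){x^2}
cal_square <- function(x) x^2 # braces may be omitted when the body is a single expression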
Place spaces around all operators such as =, +, -, <-, etc.
Exception: Do not place spaces around the operators :, :: and :::
1+2 # bad
1 + 2 # good
if (a > 2) # good
if(a>2) # bad
# Function call ----
rnorm(2) # good
rnorm (2) # bad
# Extra spaces are fine when they align assignments
# Bad ------
a = sum(c(1, 5, 8, 10))/2
sd = sd(c(1, 5, 8, 10))
# Good ------
a  = sum(c(1, 5, 8, 10))/2
sd = sd(c(1, 5, 8, 10))
# Good ---
a[1, 2]
a[1, ]
if (x < 2)
# Bad ---
a[1,2]
a[1,]
if(x<2)
if( x<2 )
# Good ---
if (y == 2) {
  print("even")
}
# Bad ---
if(y == 2){ print("even")}
Load the mozzie dataset
library(mozzie)
data(mozzie)
head(mozzie, 2)
  ID Year Week Colombo Gampaha Kalutara Kandy Matale Nuwara Eliya Galle
1  1 2008   52      15       7        1    11      4            0     0
2  2 2009    1      44      23        5    16     21            2     0
  Hambantota Matara Jaffna Kilinochchi Mannar Vavuniya Mulative Batticalo
1          6     22      0           0      8        0        0         1
2          5     18      1           0      0        0        0         0
  Ampara Trincomalee Kurunagala Puttalam Anuradhapura Polonnaruwa Badulla
1      0           0          2        1            2           0       1
2      1           1         10        5            0           0       1
  Monaragala Ratnapura Kegalle
1          1         2      16
2          0         1      25
Use the Min-Max transformation to rescale all the district variables onto the 0-1 range.
The Min-Max transformation is $\frac{x_i - \min(x)}{\max(x) - \min(x)}$, where $x = (x_1, x_2, \ldots, x_n)$.
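As a quick check of the formula, here is a minimal worked example on a small made-up vector (the values are hypothetical and not taken from mozzie). The smallest value maps to 0 and the largest to 1.

```r
# Hypothetical values, chosen so the arithmetic is easy to follow
x <- c(2, 4, 6, 10)

# min(x) = 2 and max(x) = 10, so each value becomes (x_i - 2) / 8
rescaled <- (x - min(x)) / (max(x) - min(x))
rescaled
# [1] 0.00 0.25 0.50 1.00
```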
Min-Max transformation on mozzie
minmax.colombo <- (mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
  (max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))
head(minmax.colombo) # Colombo district
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
minmax.gampaha <- (mozzie$Gampaha - min(mozzie$Gampaha, na.rm = TRUE)) /
  (max(mozzie$Gampaha, na.rm = TRUE) - min(mozzie$Gampaha, na.rm = TRUE))
head(minmax.gampaha) # Gampaha district
[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625
minmax.kalutara <- (mozzie$Gampaha - min(mozzie$Kalutara, na.rm = TRUE)) /
  (max(mozzie$Kalutara, na.rm = TRUE) - min(mozzie$Kalutara, na.rm = TRUE))
head(minmax.kalutara) # Kalutara district
[1] 0.09333333 0.30666667 0.25333333 0.30666667 0.32000000 0.22666667
You could easily make errors: the Kalutara code above uses mozzie$Gampaha in the numerator by mistake.
A mistake copied becomes a mistake repeated.
When should you write a function?
Whenever you need to copy and paste a block of code many times.
If you don't find a suitable built-in function to serve your purpose, you can write your own function.
To share your work with others.
rescale_minmax
rescale_minmax <-
rescale_minmax <- function(x) # Arguments/inputs should be defined inside ()
rescale_minmax <- function(x){
  # Task
  # output
}
Identify the inputs needed to produce the function output.
# Colombo district
(mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
  (max(mozzie$Colombo, na.rm = TRUE) - min(mozzie$Colombo, na.rm = TRUE))
Re-write the code with general names
x <- mozzie$Colombo
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
Remove duplication/ Make your code efficient and readable
rng <- range(x, na.rm = TRUE)
rng
[1] 0 475
(x - rng[1]) / (rng[2] - rng[1])
Type A
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
Type B
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  out.rescaled
}
Type C
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  return(out.rescaled)
}
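In this situation Type A is the best: the function is short, and R returns the value of the last evaluated expression, so neither the intermediate variable nor return() is needed.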
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
rescale_minmax(c(1, 200, 250, 80, NA))
[1] 0.0000000 0.7991968 1.0000000 0.3172691 NA
minmax.colombo <- rescale_minmax(mozzie$Colombo)
head(minmax.colombo)
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
minmax.gampaha <- rescale_minmax(mozzie$Gampaha)
head(minmax.gampaha)
[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625
minmax.kalutara <- rescale_minmax(mozzie$Kalutara)
head(minmax.kalutara)
[1] 0.01333333 0.06666667 0.14666667 0.16000000 0.25333333 0.13333333
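Once the rescaling lives in a function, the original task (rescaling all the district variables) no longer needs one call per district. The sketch below shows one possible way to do this with lapply. It uses `toy`, a small hypothetical data frame standing in for mozzie, and assumes the non-district columns are ID, Year and Week as in the head() output.

```r
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# toy is a hypothetical stand-in for mozzie: id columns followed by district counts
toy <- data.frame(ID = 1:4, Year = 2008, Week = 1:4,
                  Colombo = c(15, 44, 39, 57),
                  Gampaha = c(7, 23, 19, NA))

# Every column except the identifiers is treated as a district variable
district_cols <- setdiff(names(toy), c("ID", "Year", "Week"))

# Apply the same function to each district column and write the results back
toy_rescaled <- toy
toy_rescaled[district_cols] <- lapply(toy[district_cols], rescale_minmax)
toy_rescaled
```

With the real data, the same pattern would apply after replacing `toy` with `mozzie`.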
new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
[1] 0 0 0 0 0 0 NaN
rescale_minmax <- function(x){
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
[1] 0.77272727 1.00000000 0.65909091 0.43181818 0.00000000 0.02272727 Inf
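The difference between the two versions comes from what range() returns. A minimal sketch on a made-up vector (hypothetical values) shows that finite = TRUE drops non-finite elements before the minimum and maximum are taken, so the denominator is no longer infinite.

```r
# Hypothetical values containing one infinite element
x <- c(400, 500, 60, Inf)

rng_all    <- range(x, na.rm = TRUE)    # Inf is kept, so the maximum is Inf
rng_finite <- range(x, finite = TRUE)   # non-finite values are omitted

rng_all
rng_finite
```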
Rewrite rescale_minmax so that -Inf is mapped to 0, and Inf is mapped to 1.
R for Data Science - Exercise 19.2.1, Question 3
R for Data Science - Exercise 19.2.1, Question 4
Descriptive names for variables.
Comment your code.
Write your own function to calculate the parameter estimates of a simple linear regression model.
Help: $\hat{\beta} = (X^T X)^{-1} X^T Y$
Write a function to calculate a confidence interval for the mean: $\bar{x} \pm t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}}$
cal_mean_ci <- function(x, conf){
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
data <- c(165, 170, 175, 180, 185)
cal_mean_ci(data, 0.95)
[1] 165.1838 184.8162
# Same function, with a default value for conf
cal_mean_ci <- function(x, conf = 0.95){
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
cal_mean_ci(data)
[1] 165.1838 184.8162
cal_mean_ci(data, 0.99)
[1] 158.7221 191.2779
Write a function to calculate the correlation coefficient
$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
Do not use the function cor inside your function.
Write a function to generate 100 random numbers from a normal distribution and plot the distribution of the random numbers. Your function should display the generated random numbers and the corresponding plot.
Write a function to compute the z-score values of an A/L Mathematics student, given the marks of the student. Assume
mean(Mathematics) = 60, sd(Mathematics) = 10,
mean(Chemistry) = 45, sd(Chemistry) = 20,
mean(Physics) = 55, sd(Physics) = 5.
in-class discussion using R Demo
Write a function to calculate the median.
Help: the modulo operator %% returns the remainder of a division.
5 %% 2
[1] 1
4 %% 2
[1] 0
Note: Do not use the built-in function median inside your function.
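As a hint rather than a solution, the remainder can be used to test whether a count is odd or even. The sketch below uses `is_odd`, a hypothetical helper name introduced only for this illustration, and a made-up vector.

```r
# is_odd is a hypothetical helper: TRUE when n leaves remainder 1 on division by 2
is_odd <- function(n) n %% 2 == 1

is_odd(5)          # TRUE
is_odd(4)          # FALSE

# The same test applied to the number of observations in a vector
x <- c(3, 1, 2, 7)
is_odd(length(x))  # FALSE, because x has four elements
```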