How to call a built-in function
Arguments matching
Basic functions
Test and type conversion functions
Probability distribution functions
Reproducibility of scientific results
Data visualization: qplot
👉🏻 A function performs a specific task according to a set of instructions.
👉🏻 Some functions we have discussed so far: c, matrix, array, list, data.frame, str, dim, length, nrow, which.max, diag, summary
👉🏻 In R, functions are objects of class function.
class(length)
[1] "function"
👉🏻 There are basically two types of functions:
💻 Built-in functions
Already created or defined in the programming framework to make our work easier.
👨 User-defined functions
Sometimes we need to create our own functions for a specific purpose.
function_name(arg1 = 1, arg2 = 3)
Argument matching
The following calls to mean are all equivalent:
mydata <- c(rnorm(20), 100000)
mean(mydata) # matched by position
mean(x = mydata) # matched by name
mean(mydata, na.rm = FALSE)
mean(x = mydata, na.rm = FALSE)
mean(na.rm = FALSE, x = mydata)
mean(na.rm = FALSE, mydata)
[1] 4761.661
⚠️ Even though it works, do not change the order of the arguments too much.
mean(mydata, trim=0)
[1] 4761.661
mean(mydata) # Default value for trim is 0
[1] 4761.661
mean(mydata, trim=0.1)
[1] -0.1271709
mean(mydata, tr=0.1) # Partial Matching
[1] -0.1271709
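The same matching rules apply to other built-in functions. As a small sketch (sd() and the vector v are chosen here for illustration and are not part of the slides), the calls below check that positional, named, reordered, and partially matched arguments all give the same result.

```r
v <- c(2, 4, 4, NA)   # made-up data with one missing value

by_position <- sd(v, TRUE)               # matched by position: x, then na.rm
by_name     <- sd(x = v, na.rm = TRUE)   # matched by name
reordered   <- sd(na.rm = TRUE, x = v)   # named arguments in a different order
partial     <- sd(v, na = TRUE)          # "na" partially matches na.rm

identical(by_position, by_name)
identical(by_name, reordered)
identical(by_name, partial)
```

Full argument names are usually the safer choice in scripts, because a partial name can become ambiguous if a function later gains another argument with the same prefix.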
Function | Description |
---|---|
abs(x) | absolute value of x |
log(x, base = y) | logarithm of x with base y; if base is not specified, returns the natural logarithm |
exp(x) | exponential of x |
sqrt(x) | square root of x |
factorial(x) | factorial of x |
Function | Description |
---|---|
mean(x) | mean of x |
median(x) | median of x |
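mode(x) | storage mode (type) of x, e.g. "numeric"; note that this is not the statistical mode, for which base R has no built-in function |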
var(x) | variance of x |
sd(x) | standard deviation of x |
scale(x) | z-score of x |
quantile(x) | quantiles of x |
summary(x) | summary of x: mean, minimum, maximum, etc. |
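In base R, mode() reports how an object is stored rather than its most frequent value. A minimal sketch of one way to get the statistical mode is shown below; stat_mode is a hypothetical helper written for this example, not a base R function.

```r
# Hypothetical helper (not part of base R): most frequent value(s) of x
stat_mode <- function(x) {
  counts <- table(x)                     # frequency of each distinct value
  names(counts)[counts == max(counts)]   # labels of the most frequent value(s)
}

x <- c(2, 3, 3, 5, 3, 2)
mode(x)        # "numeric": the storage mode of x
stat_mode(x)   # "3": returned as a character label
```

The helper returns character labels and keeps ties; wrap the result in as.numeric() if a number is needed.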
Test | Convert |
---|---|
is.numeric() | as.numeric() |
is.character() | as.character() |
is.vector() | as.vector() |
is.matrix() | as.matrix() |
is.data.frame() | as.data.frame() |
is.factor() | as.factor() |
is.logical() | as.logical() |
is.na() |
a <- c(1, 2, 3); a
[1] 1 2 3
is.numeric(a)
[1] TRUE
is.vector(a)
[1] TRUE
b <- as.character(a); b
[1] "1" "2" "3"
is.vector(b)
[1] TRUE
is.character(b)
[1] TRUE
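Conversions do not always succeed. The sketch below, using made-up character vectors, shows that values which cannot be converted become NA, and that is.na() can then be used to detect them.

```r
x <- c("1", "2.5", "three")

# "three" cannot be read as a number, so it becomes NA (R also gives a warning,
# silenced here with suppressWarnings())
y <- suppressWarnings(as.numeric(x))

is.na(y)        # FALSE FALSE TRUE
y[!is.na(y)]    # keeps only the values that converted: 1.0 2.5

as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
```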
Remove missing values in the following vector
a
 [1]  0.61940020 -0.93808729  0.95518590 -0.22663938  0.29591186          NA
 [7]  0.36788089  0.71791098  0.71202022  0.22765782          NA          NA
[13] -0.74024324  0.02081516 -0.14979979 -0.22351308  0.98729725          NA
[19]          NA          NA          NA          NA          NA          NA
[25]          NA          NA          NA -1.50016003  0.18682734  0.20808590
[31]  0.70102264 -0.10633074 -1.18460046  0.06475501  0.11568817 -0.04333140
[37] -0.22020064  0.02764713  0.10165760 -0.18234246  1.32914659 -1.29704248
[43]  1.05317749 -0.70109051  0.09798707  0.10457263 -0.21449845
Each probability distribution in R is associated with four functions.
Naming convention for the four functions:
For each distribution there is a root name. For example, the root name for the normal distribution is norm. This root is prefixed by one of the letters d, p, q, r.
d prefix for the probability density function (probability mass function for discrete distributions)
p prefix for the cumulative probability
q prefix for the quantile
r prefix for the random number generator
Example: dnorm, pnorm, qnorm, rnorm
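The same four prefixes work with any root name. As an illustrative sketch, here they are applied to the uniform distribution on [0, 1] (root name unif); the value 0.3 and the seed are arbitrary choices for this example.

```r
dunif(0.3)   # density at 0.3: 1 everywhere on [0, 1]
punif(0.3)   # cumulative probability P(X <= 0.3): 0.3
qunif(0.3)   # 30th percentile: 0.3, the inverse of punif

set.seed(123)         # arbitrary seed so the draws can be repeated
draws <- runif(3)     # three random numbers between 0 and 1
draws
```

The q and p functions undo each other: qunif(punif(0.3)) gives back 0.3.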
The general formula for the probability density function of the normal distribution with mean μ and standard deviation σ (variance σ²) is given by
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$$
If we let the mean μ=0 and the standard deviation σ=1, we get the probability density function for the standard normal distribution.
$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
dnorm(0)
[1] 0.3989423
Standard normal probability density function: dnorm(0)
pnorm(0)
[1] 0.5
Standard normal probability density function: pnorm(0)
qnorm(0.5)
[1] 0
Standard normal probability density function: qnorm(0.5)
norm
pnorm(3)
[1] 0.9986501
pnorm(3, sd=1, mean=0)
[1] 0.9986501
pnorm(3, sd=2, mean=1)
[1] 0.8413447
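The last result can be checked by standardizing by hand. The sketch below (variable names are chosen for this example) converts x = 3 to a z-score with mean 1 and standard deviation 2, then compares the two ways of computing the probability.

```r
x     <- 3
mu    <- 1
sigma <- 2

z <- (x - mu) / sigma                           # z-score: (3 - 1) / 2 = 1

p_direct   <- pnorm(x, mean = mu, sd = sigma)   # general normal
p_standard <- pnorm(z)                          # standard normal at z = 1

all.equal(p_direct, p_standard)                 # TRUE
```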
dbinom(2, size=10, prob=0.2)
[1] 0.3019899
a <- dbinom(0:10, size=10, prob=0.2)
a
 [1] 0.1073741824 0.2684354560 0.3019898880 0.2013265920 0.0880803840
 [6] 0.0264241152 0.0055050240 0.0007864320 0.0000737280 0.0000040960
[11] 0.0000001024
cumsum(a)
 [1] 0.1073742 0.3758096 0.6777995 0.8791261 0.9672065 0.9936306 0.9991356
 [8] 0.9999221 0.9999958 0.9999999 1.0000000
pbinom(0:10, size=10, prob=0.2)
 [1] 0.1073742 0.3758096 0.6777995 0.8791261 0.9672065 0.9936306 0.9991356
 [8] 0.9999221 0.9999958 0.9999999 1.0000000
qbinom(0.4, size=10, prob=0.2)
[1] 2
set.seed(262020)
random_numbers <- rnorm(5)
random_numbers
[1] 0.2007818 0.9587335 1.1836906 1.4951375 1.1810922
sort(random_numbers) ## sort the numbers then it is easy to map with the graph
[1] 0.2007818 0.9587335 1.1810922 1.1836906 1.4951375
beta: beta distribution
binom: binomial distribution
cauchy: Cauchy distribution
chisq: chi-squared distribution
exp: exponential distribution
f: F distribution
gamma: gamma distribution
geom: geometric distribution
hyper: hypergeometric distribution
lnorm: log-normal distribution
multinom: multinomial distribution
nbinom: negative binomial distribution
norm: normal distribution
pois: Poisson distribution
t: Student's t distribution
unif: uniform distribution
weibull: Weibull distribution
🙋 Getting help with R: ?Distributions
Q1 Suppose Z∼N(0,1). Calculate the following standard normal probabilities.
P(Z≤1.25),
P(Z>1.25),
P(Z≤−1.25),
P(−.38≤Z≤1.25).
Q2 Find the following percentiles for the standard normal distribution.
90th,
95th,
97.5th,
Q3 Determine the Zα for the following
α=0.1
α=0.95
Q4 Suppose X∼N(15,9). Calculate the following probabilities
P(X≤15),
P(X<15),
P(X≥10).
Q5 A particular mobile phone number is used to receive both voice messages and text messages. Suppose 20% of the messages are text messages, and consider a sample of 15 messages. What is the probability that
at most 8 of the messages are text messages?
exactly 8 of the messages are text messages?
Q6 Generate 20 random values from a Poisson distribution with mean 10 and calculate the mean. Compare your answer with others.
rnorm(10) # first attempt
 [1]  1.6582609 -1.8912734 -2.8471112 -2.1617741  0.6401224 -0.4295948
 [7] -0.3122580 -1.0267992  1.4231150  0.8661058
rnorm(10) # second attempt
 [1] -0.91879540 -0.06053766 -0.20263170 -0.26301690  0.97964620 -0.46034817
 [7]  0.81826880 -0.60935778  1.71086661  0.49294451
As you can see above you will get different results.
set.seed(1)
rnorm(10) # First attempt with set.seed
 [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
 [7]  0.4874291  0.7383247  0.5757814 -0.3053884
set.seed(1)
rnorm(10) # Second attempt with set.seed
 [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
 [7]  0.4874291  0.7383247  0.5757814 -0.3053884
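Rather than comparing the printed numbers by eye, the two runs can be compared in code. This is a small sketch; the seed 2024 and the object names are arbitrary choices for the example.

```r
set.seed(2024)           # arbitrary seed chosen for this sketch
first <- rnorm(5)

set.seed(2024)           # resetting the same seed restarts the random stream
second <- rnorm(5)

third <- rnorm(5)        # no reseeding: the stream simply continues

identical(first, second) # TRUE: same seed, same numbers
identical(first, third)  # FALSE: different part of the stream
```

Setting the seed once at the top of a script is usually enough to make the whole analysis repeatable.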
apply() function
marks <- data.frame(maths=c(10, 20, 30), chemistry=c(100, NA, 60)); marks
  maths chemistry
1    10       100
2    20        NA
3    30        60
apply(marks, 1, mean)
[1] 55 NA 45
apply(marks, 2, mean)
    maths chemistry
       20        NA
apply(marks, 1, mean, na.rm=TRUE)
[1] 55 20 45
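In these calls the second argument is the margin: 1 applies the function to each row, 2 to each column, and any further arguments (such as na.rm=TRUE) are passed on to the function. The sketch below illustrates this on a small made-up matrix m and then repeats the column-wise mean for marks without the missing value.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column

row_sums <- apply(m, 1, sum) # margin 1: one result per row    -> 9 12
col_sums <- apply(m, 2, sum) # margin 2: one result per column -> 3 7 11

marks <- data.frame(maths = c(10, 20, 30), chemistry = c(100, NA, 60))

# na.rm = TRUE is handed to mean() for every column
col_means <- apply(marks, 2, mean, na.rm = TRUE)
col_means                    # maths 20, chemistry 80
```

For plain sums and means, base R also provides rowSums(), colSums(), rowMeans() and colMeans(), which give the same results here.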
Calculate the row-wise and column-wise standard deviation of the following matrix
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
Your turn
Find out about the following variants of the apply family of functions in R: lapply(), sapply(), vapply(), mapply(), rapply(), and tapply().
Resources: You can follow the DataCamp tutorial here.
You should clearly explain:
the syntax of each function
the function inputs
how each function works (the task of the function)
the output of the function
the differences between the functions (apply vs lapply, apply vs sapply, etc.)
Provide your own example for each function.
?qplot
install.packages("ggplot2")
library(ggplot2)
Now search ?qplot
Note: You shouldn't have to re-install packages each time you open R. However, you do need to load the packages you want to use in that session via library().
library(mozzie)
data(mozzie)
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Quantity of Cylinders", ylab = "Miles Per Gallon",
        main = "Boxplot Example", notch = TRUE, varwidth = TRUE,
        col = c("green", "yellow", "red"),
        names = c("High", "Medium", "Low"))
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution", xlab="Number of Gears")
Default R installation: graphics package
 [1] "abline"          "arrows"          "assocplot"       "axis"
 [5] "Axis"            "axis.Date"       "axis.POSIXct"    "axTicks"
 [9] "barplot"         "barplot.default" "box"             "boxplot"
[13] "boxplot.default" "boxplot.matrix"  "bxp"             "cdplot"
[17] "clip"            "close.screen"    "co.intervals"    "contour"
[21] "contour.default" "coplot"          "curve"           "dotchart"
[25] "erase.screen"    "filled.contour"  "fourfoldplot"    "frame"
[29] "grconvertX"      "grconvertY"      "grid"            "hist"
[33] "hist.default"    "identify"        "image"           "image.default"
[37] "layout"          "layout.show"     "lcm"             "legend"
[41] "lines"           "lines.default"   "locator"         "matlines"
[45] "matplot"         "matpoints"       "mosaicplot"      "mtext"
[49] "pairs"           "pairs.default"   "panel.smooth"    "par"
[53] "persp"           "pie"             "plot"            "plot.default"
[57] "plot.design"     "plot.function"   "plot.new"        "plot.window"
[61] "plot.xy"         "points"          "points.default"  "polygon"
[65] "polypath"        "rasterImage"     "rect"            "rug"
[69] "screen"          "segments"        "smoothScatter"   "spineplot"
[73] "split.screen"    "stars"           "stem"            "strheight"
[77] "stripchart"      "strwidth"        "sunflowerplot"   "symbols"
[81] "text"            "text.default"    "title"           "xinch"
[85] "xspline"         "xyinch"          "yinch"
head(mozzie)
# A tibble: 6 x 28
     ID  Year  Week Colombo Gampaha Kalutara Kandy Matale `Nuwara Eliya` Galle
  <int> <int> <int>   <int>   <int>    <int> <int>  <int>          <int> <int>
1     1  2008    52      15       7        1    11      4              0     0
2     2  2009     1      44      23        5    16     21              2     0
3     3  2009     2      39      19       11    42      9              1     2
4     4  2009     3      57      23       12    28      3              2     1
5     5  2009     4      53      24       19    32     20              2     2
6     6  2009     5      29      17       10    21      6              0     3
# … with 18 more variables: Hambantota <int>, Matara <int>, Jaffna <int>,
#   Kilinochchi <int>, Mannar <int>, Vavuniya <int>, Mulative <int>,
#   Batticalo <int>, Ampara <int>, Trincomalee <int>, Kurunagala <int>,
#   Puttalam <int>, Anuradhapura <int>, Polonnaruwa <int>, Badulla <int>,
#   Monaragala <int>, Ratnapura <int>, Kegalle <int>
qplot
plot(mozzie$Colombo, mozzie$Gampaha)
qplot(Colombo, Gampaha, data=mozzie)
qplot
qplot(Colombo, Gampaha, data=mozzie)
qplot(Colombo, Gampaha, data=mozzie, colour=Year)
qplot
qplot(Colombo, Gampaha, data=mozzie)
qplot(Colombo, Gampaha, data=mozzie, size=Year)
qplot
qplot(Colombo, Gampaha, data=mozzie)
qplot(Colombo, Gampaha, data=mozzie, size=Year, alpha=0.5)
qplot
qplot(Colombo, Gampaha, data=mozzie)
qplot(Colombo, Gampaha, data=mozzie, geom="point")
qplot
qplot(ID, Gampaha, data=mozzie)
qplot(ID, Gampaha, data=mozzie, geom="line")
qplot
qplot(ID, Gampaha, data=mozzie)
qplot(ID, Gampaha, data=mozzie, geom="path")
qplot
qplot(Colombo, Gampaha, data=mozzie, geom="line")
qplot(Colombo, Gampaha, data=mozzie, geom="path")
qplot
qplot(Colombo, Gampaha, data=mozzie, geom=c("line", "point"))
qplot(Colombo, Gampaha, data=mozzie, geom=c("path", "point"))
qplot
boxplot(Colombo~Year, data=mozzie)
qplot(factor(Year), Colombo, data=mozzie, geom="boxplot")
qplot
qplot(factor(Year), Colombo, data=mozzie, geom="boxplot")
qplot(factor(Year), Colombo, data=mozzie) # default: geom="point"
qplot
qplot(factor(Year), Colombo, data=mozzie, geom="point")
qplot(factor(Year), Colombo, data=mozzie, geom="jitter")
qplot
qplot(factor(Year), Colombo, data=mozzie, geom="jitter")
qplot(factor(Year), Colombo, data=mozzie, geom=c("jitter", "boxplot"))
qplot(factor(Year), Colombo, data=mozzie, geom=c("jitter", "boxplot"), outlier.shape = NA) # hide the boxplot's own outlier points
qplot
qplot(Colombo, data=mozzie)
qplot(Colombo, data=mozzie, geom="density")
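Recent ggplot2 releases (3.4.0 and later) mark qplot() as deprecated, so it may print a warning. The sketch below shows roughly equivalent ggplot() calls for a scatter plot and a density plot. It uses the built-in mtcars data instead of mozzie so that it runs without the mozzie package, and the object names p_scatter and p_density are made up for this example.

```r
# Only run if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Similar in spirit to qplot(wt, mpg, data = mtcars)
  p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()

  # Similar in spirit to qplot(mpg, data = mtcars, geom = "density")
  p_density <- ggplot(mtcars, aes(x = mpg)) +
    geom_density()
}
```

Typing p_scatter or p_density at the console (or calling print() on them) draws the plot. Aesthetics such as colour or size go inside aes(), for example aes(x = wt, y = mpg, colour = cyl).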
Explore the iris dataset with suitable graphics.
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
Slides available at: hellor.netlify.app
All rights reserved by Thiyanga S. Talagala