class: center, middle, inverse, title-slide

.title[
# STA 326 2.0 Programming and Data Analysis with R
]
.subtitle[
## 🎉 Introduction to R, RStudio and R Programming Basics
]
.author[
### Dr Thiyanga Talagala
]

---

<style type="text/css">
.remark-slide-content {
    font-size: 25px;
}
</style>

# Today's menu

.pull-left[

- What is R?

- Why R?

- R and RStudio

- Installing R and RStudio

- Getting familiar with the RStudio interface

- RStudio Cloud

- Using R as a calculator

- Basic vector operations

]

.pull-right[

<center><img src="salad.jpeg" height="500px"/></center>

]

---

## What statistical software packages are you familiar with?
---
### R Programming Language

- R is a software environment for statistical computing and graphics.

- Language designers: **R**oss Ihaka and **R**obert Gentleman at the University of Auckland, New Zealand.

- Parent language: S

- R is a **functional programming language**. However, the R system also includes some support for object-oriented programming.

<center><img src="RobertandRoss.jpg" height="400px" /></center>

---

<!--You have already mastered Minitab. -->

.pull-left[

### Minitab

- Menu-driven software

<center><img src="minitab.png" height="400px"/></center>

<!--Minitab. Commonly used in: social science, marketing, education, sociology, ... Menu-driven statistical software-->

]

.pull-right[

### R

- Scripting language

<center><img src="rstudiocode.png" height="400px"/></center>

]

---

.pull-left[

### Menu-driven software

<center><img src="minitab.png" height="400px"/></center>

<!--Minitab. Commonly used in: social science, marketing, education, sociology, ... Menu-driven statistical software-->

]

.pull-right[

## Pros

- You do not have to remember commands.

- User friendly.

## Cons

- Irritating when there are too many levels of menus to navigate.

- Difficult to reproduce results.

]

---

<!--You have already mastered Minitab. -->

.pull-left[

### Scripting language

<center><img src="rstudiocode.png" height="400px"/></center>

]

.pull-right[

## Pros

- Useful for collaborative research.

- Ideal for reproducible research.

## Cons

- The learning curve can be steep at the start.

]

---
## Why learn R?

- Free and open-source software package

- A large online community that makes it fun to learn

- Latest cutting-edge technology

- Easier to update analysis

- Easier to reproduce analysis

- Easier to collaborate with others

- Easier to automate analysis

---
background-image: url('ricon.png')
background-position: right
background-size: contain

## R

---
background-image: url('rmac2.png')
background-position: right
background-size: contain

## R environment - macOS

---
background-image: url('rmac1.png')
background-position: right
background-size: contain

## R environment - macOS

---
background-image: url('renv.png')
background-position: right
background-size: contain

## R environment

---
background-image: url('rstudio1.png')
background-position: right
background-size: contain

## The RStudio IDE

---
background-image: url('rstudio2.png')
background-position: right
background-size: contain

## The RStudio IDE

---
background-image: url('airport.jpg')
background-position: right
background-size: contain

## R and RStudio

---
## R and RStudio

.pull-left[

![](airport.jpg)

]

.pull-right[

> If R were **an airplane**, RStudio would be **the airport**, providing many, many supporting services that make it easier for you, the pilot, to take off and go to awesome places. Sure, you can fly an airplane without an airport, but having those runways and supporting infrastructure is a game-changer.
> Julie Lowndes

]

---
class: inverse, center, middle

# Create a new project

---
background-image: url('project1.png')
background-position: center
background-size: contain

---
background-image: url('project2.png')
background-position: center
background-size: contain

---
background-image: url('project3.png')
background-position: center
background-size: contain

---
background-image: url('project4.png')
background-position: center
background-size: contain

---
background-image: url('project5.png')
background-position: center
background-size: contain

---
background-image: url('project6.png')
background-position: center
background-size: contain

---
## Get Working Directory

Display the current working directory.

```r
getwd()
```

## Set Working Directory

```r
setwd("include_path_here")  # replace with the path to your folder, in quotes
```

---
class: inverse, center, middle

# Basics of R programming

---
## R Console

```r
7+1
```

```
[1] 8
```

```r
rnorm(10)
```

```
 [1] -0.52734986 -1.00880796 -0.51885818 -1.25822706 -2.03810299  0.86445152
 [7] -0.32695804 -0.27729466 -0.08115242  2.29875891
```

---
## Variable assignment

```r
a <- rnorm(10)
a
```

```
 [1] -0.17080946 -1.62197463 -0.19588570 -0.49785571 -0.01480748  1.49439259
 [7]  1.11982158 -0.16474605  0.44264127 -0.18153232
```

--

```r
b <- a*100
b
```

```
 [1]  -17.080946 -162.197463  -19.588570  -49.785571   -1.480748  149.439259
 [7]  111.982158  -16.474605   44.264127  -18.153232
```

--

```r
c <- "corona"
c
```

```
[1] "corona"
```

---
## Case sensitivity

R is case sensitive. The following names all refer to different objects.

```r
corona
Corona
CORONA
```

---
# Working directory

To check your current working directory:

```r
getwd()
```

To change the working directory:

```r
setwd("/Users/thiyanga/sta326")
```

---
# Data permanency

- All variables are kept in the workspace.

- `ls()` can be used to display the names of the objects which are currently stored within R.

- The collection of objects currently stored is called the **workspace**.

```r
ls()
```

```
[1] "a" "b" "c"
```

---
## Remove objects

- Use the function `rm()` to remove objects.

### Remove specific objects: `rm(x, y, z)`

```r
rm(a)
ls()
```

```
[1] "b" "c"
```

### Remove all objects: `rm(list=ls())`

```r
rm(list=ls())
ls()
```

```
character(0)
```

---
class: inverse, center, middle

# Close the project

---
background-image: url('project7.png')
background-position: center
background-size: contain

---

![](project7.png)

At the end of an R session, if you click **save**, the objects are written to a file called **.RData** in the current directory, and the command lines used in the session are saved to a file called **.Rhistory**.

---
class: inverse, center, middle

# When R is started at a later time **from the same directory**

---
background-image: url('p81.png')
background-position: center
background-size: cover

---
background-image: url('p82.png')
background-position: center
background-size: cover

.pull-left[.full-width[.content-box-yellow[**When R is started at a later time `from the same directory`, it reloads the `associated workspace` and `command history`.**]]]

---
background-image: url('project9.png')
background-position: center
background-size: cover

--

.pull-left[.full-width[.content-box-yellow[**When R is started at a later time `from the same directory`, it reloads the `associated workspace` and `command history`.**]]]

---
## Comment your code

- Each line of a comment should begin with the comment symbol `#` followed by a single space.

```r
rnorm(10) # This is a comment
```

```
 [1]  1.1100805 -0.2128508 -0.3968625  1.4603542  1.4615554  0.2802397
 [7]  0.7230058 -0.4670261 -0.0622365  0.9424283
```

```r
sum(1:10) # Summation of numbers from 1 to 10.
```

```
[1] 55
```

---
## Style Guide

- Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.
-- Hadley Wickham

```r
sum(1:10)#Bad commenting style
```

```
[1] 55
```

```r
sum(1:10) # Good commenting style
```

```
[1] 55
```

- Also, use commented lines of `-` and `=` to break up your file into easily readable sub-sections.

```r
# Read data ----------------
# Plot data ----------------
```

To learn more, read Hadley Wickham's [Style guide](http://adv-r.had.co.nz/Style.html).

---
## Objects in R

1. .red[Data structures] are the ways of arranging data.

    - You can create objects using the left-pointing arrow `<-`.

--

2. .red[Functions] tell R to do something.

    - A function may be applied to an object.

    - The result of applying a function is usually an object too.

    - To call a function, follow its name with parentheses.

```r
a <- 1:20 # data structure
sum(a) # sum is a function applied to a
```

```
[1] 210
```

```r
help.start() # Some functions work on their own.
```

---
### Getting help with functions and features

- R has a built-in help facility.

**Method 1**

```r
help(rnorm)
```

- For a feature specified by special characters or reserved words, such as `for`, `if`, `[[`, put the name in quotes.

```r
help("[[")
```

- Search the help files for a word or phrase.

```r
help.search("weighted mean")
```

**Method 2**

```r
?rnorm
```

```r
??rnorm
```

---
background-image: url('dataStructures.png')
background-position: center
background-size: contain

## Data structures

.footer-note[.tiny[.green[Image Credit: ][venus.ifca.unican.es](http://venus.ifca.unican.es/Rintro/dataStruct.html)]]

---
## Data structures

.pull-left[

![](dataStructures.png)

]

.pull-right[

Data structures differ in terms of:

- Type of data they can hold

- How they are created

- Structural complexity

- Notation to identify and access individual elements

]

.footer-note[.tiny[.green[Image Credit: ][venus.ifca.unican.es](http://venus.ifca.unican.es/Rintro/dataStruct.html)]]

---
class: inverse, center, middle

# 1. Vectors

---
### Vectors

- Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data.

- The function `c` is used to form vectors. `c` stands for concatenate.

- All data in a vector must be of a single type or mode (numeric, character, or logical). You can’t mix modes in the same vector.

**Vector assignment**

```r
vector_name <- c(element1, element2, element3) # syntax
```

```r
x <- c(5, 6, 3, 1 , 100) # example
```

- `<-` is the assignment operator; `=` can be used as an alternative.

- The `c()` function is used to create a vector.

---
## Your turn

.red[What will be the output of the following code?]

```r
x <- c(5, 6, 3, 1 , 100)
x
y <- c(x, 500, 600)
y
```
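---
## Combining vectors with `c()`: a sketch

A minimal sketch of how `c()` concatenates single values and existing vectors, as described on the Vectors slide. The names `marks`, `all_marks` and `nested`, and their values, are hypothetical and chosen only for illustration.

```r
# Hypothetical marks, chosen only for illustration
marks <- c(70, 85, 90)

# c() accepts existing vectors as well as single values
all_marks <- c(marks, 65, 100)
all_marks
length(all_marks)

# The result is always flat: c() does not create nested vectors
nested <- c(c(1, 2), c(3, 4))
nested
```

Because `c()` returns a flat vector, the length of the result is the total number of elements supplied.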
---
# Types and tests with vectors

```r
first_vec <- c(10, 20, 50, 70)
second_vec <- c("Jan", "Feb", "March", "April")
third_vec <- c(TRUE, FALSE, TRUE, TRUE)
fourth_vec <- c(10L, 20L, 50L, 70L)
```

To check if it is a

- vector: `is.vector()`

```r
is.vector(first_vec)
```

```
[1] TRUE
```

- character vector: `is.character()`

```r
is.character(first_vec)
```

```
[1] FALSE
```

---

.pull-left[

- double: `is.double()`

```r
is.double(first_vec)
```

```
[1] TRUE
```

- integer: `is.integer()`

```r
is.integer(first_vec)
```

```
[1] FALSE
```

- logical: `is.logical()`

```r
is.logical(first_vec)
```

```
[1] FALSE
```

]

.pull-right[

- length

```r
length(first_vec)
```

```
[1] 4
```

> **Compare `first_vec <- c(10, 20, 50, 70)` and `fourth_vec <- c(10L, 20L, 50L, 70L)`**

```r
is.double(fourth_vec)
```

```
[1] FALSE
```

```r
is.integer(fourth_vec)
```

```
[1] TRUE
```

]

---
## Mathematical operations

```r
sum(first_vec)
```

```
[1] 150
```

```r
mean(first_vec)
```

```
[1] 37.5
```

```r
summary(first_vec)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   10.0    17.5    35.0    37.5    55.0    70.0 
```

**More about `functions`: week 3.**

---
# Coercion

Vectors must be homogeneous. When you attempt to combine different types, they will be coerced to the most flexible type so that every element in the vector is of the same type.

Order from least to most flexible:

`logical` --> `integer` --> `double` --> `character`

```r
a <- c(3.1, 2L, 3, 4, "GPA")
typeof(a)
```

```
[1] "character"
```

```r
anew <- c(3.1, 2L, 3, 4)
typeof(anew)
```

```
[1] "double"
```

---
# Explicit coercion

Vectors can be explicitly coerced from one class to another using the `as.*` functions, if available. For example, `as.character`, `as.numeric`, `as.integer`, and `as.logical`.

```r
vec1 <- c(TRUE, FALSE, TRUE, TRUE)
typeof(vec1)
```

```
[1] "logical"
```

```r
vec2 <- as.integer(vec1)
typeof(vec2)
```

```
[1] "integer"
```

```r
vec2
```

```
[1] 1 0 1 1
```

---
# Your turn

.red[Why does the code below output NAs?]

```r
x <- c("a", "b", "c")
as.numeric(x)
```

```
[1] NA NA NA
```
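---
## Explicit coercion and `NA`: a sketch

A small sketch of what `as.numeric()` does with character input, building on the explicit coercion slides. The vector `mixed` and its values are hypothetical. `suppressWarnings()` is used here only to hide the "NAs introduced by coercion" warning that R would otherwise print.

```r
# Hypothetical input: two strings look like numbers, one does not
mixed <- c("1", "2.5", "abc")

# Strings that cannot be read as numbers become NA
nums <- suppressWarnings(as.numeric(mixed))
nums

# Count how many elements failed to convert
sum(is.na(nums))
```

Checking the result with `is.na()` is one way to find the elements that could not be converted.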
---
# Coercion (cont.)

```r
x1 <- 1:3
x2 <- c(10, 20, 30)
combinedx1x2 <- c(x1, x2)
combinedx1x2
```

```
[1]  1  2  3 10 20 30
```

```r
typeof(x1)
```

```
[1] "integer"
```

```r
typeof(x2)
```

```
[1] "double"
```

```r
typeof(combinedx1x2)
```

```
[1] "double"
```

---
# Coercion (cont.)

.pull-left[

```r
x1 <- 1:3
x2 <- c(10, 20, 30)
combinedx1x2 <- c(x1, x2)
combinedx1x2
```

```
[1]  1  2  3 10 20 30
```

```r
class(x1)
```

```
[1] "integer"
```

```r
class(x2)
```

```
[1] "numeric"
```

```r
class(combinedx1x2)
```

```
[1] "numeric"
```

]

.pull-right[

- If you combine a numeric vector and a character vector, the result is a character vector.

```r
y1 <- c(1, 2, 3)
y2 <- c("a", "b", "c")
c(y1, y2)
```

```
[1] "1" "2" "3" "a" "b" "c"
```

]

---
# Name elements in a vector

You can name elements in a vector in different ways. We will learn two of them.

1. When creating it

```r
x1 <- c(a=1991, b=1992, c=1993)
x1
```

```
   a    b    c 
1991 1992 1993 
```

2. Modifying the names of an existing vector

```r
x2 <- c(1, 5, 10)
names(x2) <- c("a", "b", "b")
x2
```

```
 a  b  b 
 1  5 10 
```

Note that the names do not have to be unique.

---
# Removing names from a vector

Method 1

```r
unname(x1); x1 # unname() returns a copy without names; x1 itself is unchanged
```

```
[1] 1991 1992 1993
```

```
   a    b    c 
1991 1992 1993 
```

Method 2

```r
names(x2) <- NULL; x2
```

```
[1]  1  5 10
```

---
# Your turn

.red[What will be the output of the following code?]

```r
v <- c(1, 2, 3)
names(v) <- c("a")
v
```
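---
## Using names: a sketch

A minimal sketch of why names are useful and of how `unname()` behaves, following the two naming slides. The vector `marks`, its names and its values are hypothetical.

```r
# Hypothetical named vector of marks
marks <- c(maths = 80, stats = 75, physics = 60)

# Select an element by its name
marks["stats"]

# unname() returns a copy without names; assign it to keep the result
plain_marks <- unname(marks)
names(plain_marks)   # NULL
names(marks)         # the original vector still has its names
```

Assigning the result of `unname()` to a new object keeps both versions available.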
---
# Simplifying vector creation: `:`

1. The colon operator `:` produces regularly spaced ascending or descending sequences.

.pull-left[

```r
10:16
```

```
[1] 10 11 12 13 14 15 16
```

```r
-0.5:7.5
```

```
[1] -0.5  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5
```

```r
7.5: -0.5
```

```
[1]  7.5  6.5  5.5  4.5  3.5  2.5  1.5  0.5 -0.5
```

```r
-0.5:7.3
```

```
[1] -0.5  0.5  1.5  2.5  3.5  4.5  5.5  6.5
```

]

.pull-right[

```r
class(10:16)
```

```
[1] "integer"
```

```r
class(-0.5:7.5)
```

```
[1] "numeric"
```

```r
class(7.5:-0.5)
```

```
[1] "numeric"
```

```r
class(-0.5:7.3)
```

```
[1] "numeric"
```

]

---
# Simplifying vector creation: `seq`

2. sequence: `seq(initial_value, final_value, increment)`

```r
seq(0.5, 8)
```

```
[1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5
```

```r
seq(1,11)
```

```
 [1]  1  2  3  4  5  6  7  8  9 10 11
```

```r
seq(1, 11, length.out=5)
```

```
[1]  1.0  3.5  6.0  8.5 11.0
```

```r
seq(0, 11, by=2)
```

```
[1]  0  2  4  6  8 10
```

---
# Simplifying vector creation: `rep`

3. repeats: `rep()`

```r
rep(9, 5)
```

```
[1] 9 9 9 9 9
```

```r
rep(1:4, 2)
```

```
[1] 1 2 3 4 1 2 3 4
```

```r
rep(1:4, each=2) # each element is repeated twice
```

```
[1] 1 1 2 2 3 3 4 4
```

```r
rep(1:4, times=2) # whole sequence is repeated twice
```

```
[1] 1 2 3 4 1 2 3 4
```

---
# Simplifying vector creation: `rep`

```r
virus <- rep(c("delta", "gamma"), times=3)
virus
```

```
[1] "delta" "gamma" "delta" "gamma" "delta" "gamma"
```

```r
virus <- rep(c("delta", "gamma"), times=3, length.out=5)
virus
```

```
[1] "delta" "gamma" "delta" "gamma" "delta"
```

---
# Simplifying vector creation: `rep` (cont.)

**Your turn:** write the output of the following code.

```r
rep(1:4, each=2, times=3)
rep(1:4, 1:4)
rep(1:4, c(4, 1, 4, 2))
```
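---
## Combining `each` and `times` in `rep()`: a sketch

A short sketch of how the `each` and `times` arguments of `rep()` interact, using hypothetical character vectors rather than the exercise values. The object names `both` and `counts` are illustrative only.

```r
# each is applied first, then times repeats the whole result
both <- rep(c("a", "b"), each = 2, times = 2)
both

# A vector passed to times gives one repeat count per element
counts <- rep(c("x", "y", "z"), times = c(3, 1, 2))
counts
```

When `times` is a vector, it must have the same length as the vector being repeated.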
---
# Logical operators

.pull-left[

- `<=` less than or equal to

- `>=` greater than or equal to

- `|` or

- `&` and

- `<` less than

- `>` greater than

- `==` equal

- `!=` not equal

]

.pull-right[

```r
c(1, 2, 3) == c(10, 20, 3)
```

```
[1] FALSE FALSE  TRUE
```

```r
c(1, 2, 3) != c(10, 20, 3)
```

```
[1]  TRUE  TRUE FALSE
```

```r
1:5 > 3
```

```
[1] FALSE FALSE FALSE  TRUE  TRUE
```

```r
1:5 < 3
```

```
[1]  TRUE  TRUE FALSE FALSE FALSE
```

]

---
## Operators: `%in%` - in the set

```r
a <- c(1, 2, 3)
b <- c(1, 10, 3)
a %in% b
```

```
[1]  TRUE FALSE  TRUE
```

```r
x <- 1:10; y <- 1:3
x; y
```

```
 [1]  1  2  3  4  5  6  7  8  9 10
```

```
[1] 1 2 3
```

```r
x %in% y
```

```
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```

```r
y %in% x
```

```
[1] TRUE TRUE TRUE
```

---
### Vector arithmetic

- Operations are performed element by element.

```r
c(10, 100, 100) + 2 # two is added to every element in the vector
```

```
[1]  12 102 102
```

--

- Operations between two vectors

```r
v1 <- c(1, 2, 3); v2 <- c(10, 100, 1000)
v1 + v2
```

```
[1]   11  102 1003
```

--

- Add two vectors of unequal length (the length of the longer vector is an integer multiple of the length of the shorter vector)

```r
longvec <- seq(10, 100, length.out=10); shortvec <- c(1, 2, 3, 4, 5)
longvec + shortvec
```

```
 [1]  11  22  33  44  55  61  72  83  94 105
```

---

- Add two vectors of unequal length (the length of the longer vector is not an integer multiple of the length of the shorter vector)

```r
# Gives a warning message when the length of the longer vector is not
# an integer multiple of the length of the shorter vector.
svec <- c(1, 2, 3)
longvec + svec
```

```
 [1]  11  22  33  41  52  63  71  82  93 101
```

![](console.png)

---
# Your turn

.red[What will be the output of the following code?]

```r
first <- c(1, 2, 3, 4); second <- c(10, 100)
first * second
```
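---
## Recycling written out: a sketch

A minimal sketch of the recycling rule from the vector arithmetic slides, with the repetition of the shorter vector made explicit through `rep()`. The vectors `long_vec` and `short_vec` are hypothetical and use addition rather than the exercise's multiplication.

```r
long_vec  <- c(1, 2, 3, 4, 5, 6)
short_vec <- c(10, 20)

# short_vec is recycled as 10, 20, 10, 20, 10, 20
recycled <- long_vec + short_vec
recycled

# The same result, with the recycling written out using rep()
explicit <- long_vec + rep(short_vec, length.out = length(long_vec))
identical(recycled, explicit)
```

Writing the recycling out with `rep()` can make the intent clearer when vector lengths differ.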
---
# Other vector operations

- Please see the [cheatsheet](/pdf/baser.pdf).

---
# Missing values

Use `NA` to mark a missing value in a vector. `NaN` ("not a number") marks an undefined numeric result. `is.na()` returns `TRUE` for both.

```r
z <- c(10, 101, 2, 3, NA)
is.na(z)
```

```
[1] FALSE FALSE FALSE FALSE  TRUE
```

```r
y <- c(10, 101, 2, 3, NaN)
is.na(y)
```

```
[1] FALSE FALSE FALSE FALSE  TRUE
```

---
class: center, middle

## Thank you!

Slides available at: hellor.netlify.app

All rights reserved by [Thiyanga S. Talagala](https://thiyanga.netlify.com/)