class: center, middle, inverse, title-slide

# STA 326 2.0 Programming and Data Analysis with R
## R Data Import and Export
### Dr Thiyanga Talagala
### Online distance learning/teaching materials during the COVID-19 outbreak.

---
# Data import with readr

## R package `readr`: part of the core tidyverse.

```r
library(tidyverse)
```

```
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
```

```
✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.2     ✓ dplyr   1.0.7
✓ tidyr   1.1.3     ✓ stringr 1.4.0
✓ readr   1.4.0     ✓ forcats 0.5.1
```

```
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
```

## `readr` data import functions

- `read_csv`: reads comma-delimited files.

- `read_csv2`: reads semicolon-separated files.

- `read_tsv`: reads tab-delimited files.

---
# 🛠 Import data from a .csv file

## Syntax

```r
datasetname <- read_csv("include_file_path")
```

When you run `read_csv`, it prints out the name and type of each column.

.full-width[.content-box-yellow[Switch to R]]

---
# If the file is saved inside the project folder

.full-width[.content-box-green[Demo: Go to google classroom and watch the video]]

# If the file is saved outside the project folder

.full-width[.content-box-green[Demo: Go to google classroom and watch the video]]

---
# 🛠 Importing a csv file from a website

## Syntax

```r
datasetname <- read_csv("include url here")
```

## Example

```r
url <- ""
foodlabel <- read_csv(url)
```

```
Warning: Missing column names filled in: 'X43' [43]
```

```
Parsed with column specification:
cols(
  .default = col_double()
)
```

```
See spec(...) for full column specifications.
```

```r
head(foodlabel, 1)
```

```
# A tibble: 1 x 80
  Gender   Age Education Employment Income Housesize children marital fshopper
   <dbl> <dbl>     <dbl>      <dbl>  <dbl>     <dbl>    <dbl>   <dbl>    <dbl>
1      1    22         5          4      3         5        2       0        0
# … with 71 more variables: mplanner <dbl>, place <dbl>, FA <dbl>,
#   Diabetes <dbl>, Metabolic cyndrents <dbl>, Other <dbl>, specific <dbl>,
#   job1 <dbl>, job2 <dbl>, Exercise <dbl>, Health <dbl>, taste <dbl>,
#   easy <dbl>, familiarity <dbl>, friends <dbl>, Useful <dbl>, Easiness <dbl>,
#   Sufficient <dbl>, Trusfulness <dbl>, Clear <dbl>, attractive pack <dbl>,
#   hc/nutriclaims <dbl>, graphical <dbl>, Free/prize <dbl>, source <dbl>,
#   netquan <dbl>, low in fat <dbl>, low in cho <dbl>, sodium <dbl>,
#   e labels <dbl>, place2 <dbl>, fa2 <dbl>, Health_1 <dbl>, X43 <dbl>,
#   f1 <dbl>, f2 <dbl>, f3 <dbl>, f4 <dbl>, f5 <dbl>, f6 <dbl>, f7 <dbl>,
#   f8 <dbl>, f9 <dbl>, f10 <dbl>, f11 <dbl>, f12 <dbl>, f13 <dbl>, f14 <dbl>,
#   f15 <dbl>, f16 <dbl>, f17 <dbl>, f18 <dbl>, i1 <dbl>, i2 <dbl>, i3 <dbl>,
#   i4 <dbl>, i5 <dbl>, i6 <dbl>, i7 <dbl>, i8 <dbl>, i9 <dbl>, i10 <dbl>,
#   i11 <dbl>, i12 <dbl>, i13 <dbl>, i14 <dbl>, i15 <dbl>, i16 <dbl>,
#   i17 <dbl>, i18 <dbl>, cluster <dbl>
```

---
# `read.csv` and `read_csv`

* `read.csv` is in base R.

* `read_csv` is in the `readr` package, which is part of the tidyverse.

* `read.csv()` performs a similar job to `read_csv()`.

* `read_csv()` returns a tibble, so it works well with other parts of the tidyverse.

* `read_csv()` is typically faster than `read.csv()`.

* `read_csv()` always reads variables containing text as character variables. In contrast, in R versions before 4.0.0 the base R function `read.csv()` converted character variables to factors by default; since R 4.0.0 its default is `stringsAsFactors = FALSE`.

<!--This is often not what you want, and can be overridden by passing the option stringsAsFactors = FALSE to read.csv().-->

---
# 🛠 Writing to a File

- We can save a tibble (or data frame) to a csv file using `write_csv()`.

- `write_csv()` is in the `readr` package.
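
A minimal round-trip sketch, assuming `readr` is installed. The temporary file path is hypothetical and used only for illustration. It writes the built-in `iris` data with `write_csv()` and reads it back with both `read_csv()` and `read.csv()` to compare how the text column is typed:

```r
library(readr)

# Hypothetical temporary path, used only for this illustration
path <- tempfile(fileext = ".csv")

# Write the built-in iris data frame to a csv file
write_csv(iris, path)

# Read it back with readr; col_types = cols() lets readr guess the
# column types without printing the column specification
iris_readr <- read_csv(path, col_types = cols())
class(iris_readr$Species)   # text column is read as character

# Read it back with base R, explicitly asking for factors
iris_base <- read.csv(path, stringsAsFactors = TRUE)
class(iris_base$Species)    # text column is read as factor
```

Setting `stringsAsFactors` explicitly keeps the base R result the same across R versions.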
## Syntax

```r
write_csv(name_of_the_data_set_you_want_to_save, "path_to_write_to")
```

## Example

```r
data(iris)

# This will save inside your project folder
write_csv(iris, "iris.csv")

# This will save inside the data folder which is inside your project folder
write_csv(iris, "data/iris.csv")
```

.full-width[.content-box-yellow[Switch to R]]

.full-width[.content-box-green[Demo: Go to google classroom and watch the video]]

---
# 🛠 Importing Excel .xlsx files

## Syntax

```r
library(readxl)
mydata <- read_xlsx("file_path")
```

.full-width[.content-box-yellow[Switch to R]]

.full-width[.content-box-green[Demo: Go to google classroom and watch the video]]

---
# Importing SAS, SPSS and Stata files

## SAS

```r
library(haven) # the read_* and write_* functions on this slide come from haven
read_sas("mtcars.sas7bdat")
write_sas(mtcars, "mtcars.sas7bdat")
```

## SPSS

```r
read_sav("mtcars.sav")
write_sav(mtcars, "mtcars.sav")
```

## Stata

```r
read_dta("mtcars.dta")
write_dta(mtcars, "mtcars.dta")
```

---
# Importing other types of data

- `feather`: for sharing with Python and other languages

- `httr`: for web APIs

- `jsonlite`: for JSON

- `rvest`: for web scraping

- `xml2`: for XML

.full-width[.content-box-blue[Working with feather, httr, jsonlite, rvest and xml2 is beyond the scope of the course.]]

---
class: center, middle

Slides available at:

All rights reserved by [Thiyanga S. Talagala](