class: center, middle, inverse, title-slide # STA 326 2.0 Programming and Data Analysis with R ## 🌍 Data Import and Export ### ### Dr Thiyanga Talagala --- <style type="text/css"> .remark-slide-content { font-size: 35px; } </style> <style type="text/css"> h1, #TOC>ul>li { color: #66a61e; font-weight: bold; } h2, #TOC>ul>ul>li { color: #e7298a; #font-family: "Times"; font-weight: bold; } h3, #TOC>ul>ul>li { color: #7570b3; #font-family: "Times"; font-weight: bold; } </style> # Today's menu .pull-left[ - Data import - Data export ] .pull-right[ <center><img src="importcover.jpeg" height="500px"/></center> ] --- background-image: url(readr.png) background-size: 100px background-position: 98% 6% # Data Science Workflow: Import <img src="workflowdsreadr.png" width="100%" /> --- ## Data import with readr `readr`: part of the core tidyverse. ```r library(tidyverse) ``` -- ## `readr` data import functions - `read_csv`: reads comma-delimited files. - `read_csv2`: reads semicolon-separated files - `read_tsv`: reads tab-delimited files --- ## 🛠 Import data from a .csv file (local machine) ### Syntax ```r datasetname <- read_csv("include_file_path") ``` When you run `read_csv`, it prints out the names and type of each column. .full-width[.content-box-yellow[Switch to R]] --- ## If the file is saved inside the project folder: part 1 <iframe width="1080" height="600" src="https://www.youtube.com/embed/i-tshXv6lTg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- ## If the file is saved outside the project folder: part 2 <iframe width="1080" height="600" src="https://www.youtube.com/embed/i-tshXv6lTg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- ## 🛠 Importing csv file from a website ### Syntax ```r datasetname <- read_csv("include url here") ``` ### Example ```r url <- "https://thiyanga.netlify.app/project/datasets/foodlabel.csv" foodlabel <- read_csv(url) ``` ```r head(foodlabel, 1) ``` ``` # A tibble: 1 x 80 Gender Age Education Employment Income Housesize children marital fshopper <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 1 22 5 4 3 5 2 0 0 # … with 71 more variables: mplanner <dbl>, place <dbl>, FA <dbl>, # Diabetes <dbl>, Metabolic cyndrents <dbl>, Other <dbl>, specific <dbl>, # job1 <dbl>, job2 <dbl>, Exercise <dbl>, Health <dbl>, taste <dbl>, # easy <dbl>, familiarity <dbl>, friends <dbl>, Useful <dbl>, Easiness <dbl>, # Sufficient <dbl>, Trusfulness <dbl>, Clear <dbl>, attractive pack <dbl>, # hc/nutriclaims <dbl>, graphical <dbl>, Free/prize <dbl>, source <dbl>, # netquan <dbl>, low in fat <dbl>, low in cho <dbl>, sodium <dbl>, # e labels <dbl>, place2 <dbl>, fa2 <dbl>, Health_1 <dbl>, X43 <dbl>, # f1 <dbl>, f2 <dbl>, f3 <dbl>, f4 <dbl>, f5 <dbl>, f6 <dbl>, f7 <dbl>, # f8 <dbl>, f9 <dbl>, f10 <dbl>, f11 <dbl>, f12 <dbl>, f13 <dbl>, f14 <dbl>, # f15 <dbl>, f16 <dbl>, f17 <dbl>, f18 <dbl>, i1 <dbl>, i2 <dbl>, i3 <dbl>, # i4 <dbl>, i5 <dbl>, i6 <dbl>, i7 <dbl>, i8 <dbl>, i9 <dbl>, i10 <dbl>, # i11 <dbl>, i12 <dbl>, i13 <dbl>, i14 <dbl>, i15 <dbl>, i16 <dbl>, # i17 <dbl>, i18 <dbl>, cluster <dbl> ``` --- ## `read.csv` and `read_csv` * `read.csv` is in base R. * `read_csv` is in tidyverse. * `read.csv()` performs a similar job to `read_csv()`. * `read_csv()` works well with other parts of the tidyverse. * `read_csv()` is faster than `read.csv()`. * `read_csv()` will always read variables containing text as character variable. In contrast, the base R function `read.csv()` will, by default, convert any character variable to a factor. <!--This is often not what you want, and can be overridden by passing the option stringsAsFactors = FALSE to read.csv().--> --- ## 🛠 Writing data to a .csv file - We can save tibble (or dataframe) to a csv file, using `write_csv()`. - `write_csv()` is in the `readr` package. --- ## Syntax ```r write_csv(name_of_the_data_set_you_want_to_save, "path_to_write_to") ``` ## Example ```r data(iris) # This will save inside your project folder write_csv(iris, "iris.csv") # This will save inside the data folder which is inside your project folder write_csv(iris, "data/iris.csv") ``` Swtich to R --- <iframe width="1000" height="600" src="https://www.youtube.com/embed/3pW3wZ-Dprg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- # 🛠 Importing data from .xlsx files ## Syntax ```r library(readxl) mydata <- read_xlsx("file_path") ``` .full-width[.content-box-yellow[Switch to R]] --- <iframe width="1000" height="600" src="https://www.youtube.com/embed/K58J7EvGXDA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- # Importing SAS, SPSS and STATA files ## SAS ```r read_sas("mtcars.sas7bdat") write_sas(mtcars, "mtcars.sas7bdat") ``` ## SPSS ```r read_sav("mtcars.sav") write_sav(mtcars, "mtcars.sav") ``` ## Stata ```r read_dta("mtcars.dta") write_dta(mtcars, "mtcars.dta") ``` --- # Importing other types of data - `feather`: for sharing with Python and other languages - `httr`: for web apis - `jsonlite`: for JSON - `rvest`: for web scraping - `xml2`: for XML .full-width[.content-box-blue[Working with feather, httr, jsonlite, rvest and xml2 is beyond the scope of the course.]] --- class: center, middle Slides available at: hellor.netlify.app All rights reserved by [Thiyanga S. Talagala](https://thiyanga.netlify.com/)