1. Data

Download the data file from the course website here (under week 12).

Use the readRDS function to read the file.


facebookdata_marketing <- readRDS("_GIVE_FILE PATH_/facebookdata_marketing.rds")

Your turn (optional)

Explore the here function in the here package.

Read here
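As a starting point for this optional exercise, the sketch below shows one way here() can build the file path relative to the project root instead of a hard-coded absolute path. It assumes the here package is installed and that the .rds file has been saved in a folder named data at the project root; that folder name is a hypothetical choice, not something the course specifies.

```r
# Assumes the here package is installed: install.packages("here")
library(here)

# Hypothetical layout: the .rds file sits in a "data" folder at the project root.
data_path <- here("data", "facebookdata_marketing.rds")

# Read the file only if it is actually there, so the sketch does not error out.
if (file.exists(data_path)) {
  facebookdata_marketing <- readRDS(data_path)
}
```

Because here() resolves the path from the project root, the same line should work for anyone who opens the project, whatever their working directory is.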

2. Variable description

A manager of a retail company wants to develop a regression model to identify the effect of the following variables (see below) on the total number of likes, comments, and shares on Facebook posts.

Dependent variable:

3. Training set and test set

smp_size <- 400

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(facebookdata_marketing)), size = smp_size)
train <- facebookdata_marketing[train_ind, ]
test <- facebookdata_marketing[-train_ind, ]
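A quick sanity check on a split like this is to confirm that the two pieces are disjoint and together cover every row. The sketch below repeats the same steps on a toy data frame; the name toy, its 500 rows, and its columns are hypothetical stand-ins, not the course data.

```r
set.seed(123)

# Toy stand-in for facebookdata_marketing (hypothetical: 500 rows, two columns).
toy <- data.frame(id = 1:500, y = rnorm(500))

# Same splitting steps as above, applied to the toy data.
smp_size <- 400
train_ind <- sample(seq_len(nrow(toy)), size = smp_size)
train <- toy[train_ind, ]
test <- toy[-train_ind, ]

# The training set has smp_size rows and the test set has the remainder.
nrow(train)
nrow(test)

# No row should appear in both sets.
length(intersect(train$id, test$id))
```

Negative indexing with -train_ind drops exactly the sampled rows, which is why the test set holds every row that was not drawn for training.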

4. Questions

  1. Perform a thorough Exploratory Data Analysis on facebookdata_marketing.rds.

  2. Develop a suitable regression model to predict total interactions (the sum of “likes,” “comments,” and “shares” of the post).

  3. Test for significance of regression. What conclusions can you draw?

  4. Using t tests, determine the contribution of the regressors in your final model. Discuss your findings.

  5. Plot 95% confidence intervals for the regression coefficients of the model in part 2.

  6. Is multicollinearity a potential concern in the model identified in part 2?

  7. Use the model in part 2 to predict each observation in the test set and calculate the out-of-sample accuracy.

  8. Prepare a brief report presenting your EDA and regression analysis.
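The mechanics behind questions 3 to 7 can be sketched with base R on a stand-in dataset. The example below uses mtcars, which ships with R, so the variables, the 24-row training size, and the chosen predictors are illustrative assumptions and not the answer for the Facebook data. Variance inflation factors are computed by hand from their definition, and root mean squared error is used as one possible measure of out-of-sample accuracy.

```r
# Stand-in data: mtcars ships with R; its variables are NOT those of the assignment.
set.seed(123)
train_ind <- sample(seq_len(nrow(mtcars)), size = 24)
train <- mtcars[train_ind, ]
test <- mtcars[-train_ind, ]

# Fit a multiple linear regression on the training set.
fit <- lm(mpg ~ wt + hp + disp, data = train)

# summary() holds the overall F statistic and the per-coefficient t tests.
fit_summary <- summary(fit)
fit_summary$fstatistic
fit_summary$coefficients

# 95% confidence intervals for the regression coefficients.
ci <- confint(fit, level = 0.95)

# Variance inflation factors by hand: VIF_j = 1 / (1 - R_j^2),
# where R_j^2 comes from regressing predictor j on the other predictors.
predictors <- c("wt", "hp", "disp")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = train))$r.squared
  1 / (1 - r2)
})

# Out-of-sample error: root mean squared error on the held-out rows.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
```

The rows of ci can be drawn as horizontal segments around the point estimates from coef(fit) to obtain the interval plot, and a large value in vif would suggest that the corresponding predictor is strongly explained by the others.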

Acknowledgement

Moro, S., Rita, P., & Vala, B. (2016). Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach. Journal of Business Research, 69(9), 3341-3351.