1. Data

Download the data file from the course website (under week 12).

Use the readRDS function to read the file.


facebookdata_marketing <- readRDS("_GIVE_FILE PATH_/facebookdata_marketing.rds")
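
As a quick illustration of how readRDS behaves, the sketch below saves a small object with saveRDS and reads it back. The data frame `toy` and the file name `toy_example.rds` are made up for this example and are not part of the course data.

```r
# Toy stand-in object (hypothetical); the real data comes from the course file.
toy <- data.frame(id = 1:3, likes = c(10, 25, 7))

# Write the object to a temporary .rds file, then read it back.
rds_path <- file.path(tempdir(), "toy_example.rds")
saveRDS(toy, rds_path)
toy_back <- readRDS(rds_path)

# readRDS returns the stored object, so the result must be assigned to a name.
print(all.equal(toy, toy_back))
str(toy_back)
```

Unlike load, readRDS does not restore the original object name, which is why the assignment with `<-` is needed.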

Your turn (optional)

Explore the here function in the here package.

Read here
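
A minimal sketch of how here::here can replace a hard-coded file path. It assumes the file sits in a `data` folder at the project root, which is a hypothetical location and not something this lab specifies; adjust it to wherever you saved the download.

```r
# Build a path relative to the project root instead of typing a full path.
# The "data" subfolder is an assumed location, not given in the lab.
if (requireNamespace("here", quietly = TRUE)) {
  rds_file <- here::here("data", "facebookdata_marketing.rds")
} else {
  # Fallback when the here package is not installed.
  rds_file <- file.path(getwd(), "data", "facebookdata_marketing.rds")
}
print(rds_file)

# Read the file only if it actually exists at that location.
if (file.exists(rds_file)) {
  facebookdata_marketing <- readRDS(rds_file)
}
```

Because the path is built from the project root, the same script should run unchanged on another computer as long as the project folder layout is the same.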

2. Variable description

A manager of a retail company wants to develop a regression model to identify the effect of the following variables (see below) on the total number of likes, comments, and shares on Facebook posts.

Dependent variable:

3. Training set and test set

smp_size <- 400

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(facebookdata_marketing)), size = smp_size)
train <- facebookdata_marketing[train_ind, ]
test <- facebookdata_marketing[-train_ind, ]
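
It can be worth confirming that the split behaves as intended. The sketch below repeats the same steps on a made-up data frame `toy` with 500 rows (the row count and column names are assumptions for illustration only) and checks that the two sets do not overlap.

```r
# Hypothetical stand-in data; replace toy with facebookdata_marketing.
toy <- data.frame(id = seq_len(500), y = seq_len(500) * 2)

smp_size <- 400
set.seed(123)
train_ind <- sample(seq_len(nrow(toy)), size = smp_size)
train <- toy[train_ind, ]
test <- toy[-train_ind, ]

# Sizes of the two sets.
print(c(train = nrow(train), test = nrow(test)))

# No row should appear in both sets, and together they should cover all rows.
print(length(intersect(train$id, test$id)))
print(setequal(c(train$id, test$id), toy$id))
```

Negative indexing with `-train_ind` drops exactly the sampled rows, so the test set is whatever remains after the training rows are taken out.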

4. Questions

  1. Perform a thorough Exploratory Data Analysis on facebookdata_marketing.rds.

  2. Develop a suitable regression model to predict total interactions (the sum of “likes,” “comments,” and “shares” of the post).

  3. Test for significance of regression. What conclusions can you draw?

  4. Using \(t\) tests, determine the contribution of the regressors in your final model. Discuss your findings.

  5. Plot 95% confidence intervals for the regression coefficients of the model in part 2.

  6. Is multicollinearity a potential concern in the model identified in part 2?

  7. Use the model in part 2 to predict each observation in the test set and calculate the out-of-sample accuracy.

  8. Prepare a brief report presenting your EDA and regression analysis.
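
The mechanics behind questions 3 to 7 can be sketched in base R on simulated data. Everything in the block below is illustrative: the variables `y`, `x1`, `x2`, `x3` and the data frame `toy` are made up, the 150/50 split is arbitrary, and RMSE and MAE are just one possible choice of accuracy measure. It is not a solution for the Facebook data.

```r
set.seed(1)

# Simulated stand-in data; all variable names here are hypothetical.
n <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)  # deliberately correlated with x1
x3 <- rnorm(n)
toy <- data.frame(y = 5 + 2 * x1 - 1.5 * x3 + rnorm(n), x1, x2, x3)

# Hold out the last 50 rows as a test set.
toy_train <- toy[1:150, ]
toy_test <- toy[151:200, ]

fit <- lm(y ~ x1 + x2 + x3, data = toy_train)
fit_sum <- summary(fit)

# Significance of regression: p-value of the overall F test.
f <- fit_sum$fstatistic
f_pvalue <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
print(f_pvalue)

# t tests for the individual regressors (estimate, SE, t value, p-value).
print(fit_sum$coefficients)

# 95% confidence intervals, drawn as horizontal segments around the estimates.
ci <- confint(fit, level = 0.95)
est <- coef(fit)
k <- length(est)
plot(est, seq_len(k), xlim = range(ci), yaxt = "n", pch = 19,
     xlab = "Estimate with 95% CI", ylab = "")
segments(ci[, 1], seq_len(k), ci[, 2], seq_len(k))
axis(2, at = seq_len(k), labels = names(est), las = 1)
abline(v = 0, lty = 2)

# Variance inflation factors by hand: VIF_j = 1 / (1 - R_j^2), where R_j^2
# comes from regressing x_j on the remaining regressors.
regressors <- c("x1", "x2", "x3")
vif <- sapply(regressors, function(v) {
  others <- setdiff(regressors, v)
  aux <- lm(reformulate(others, response = v), data = toy_train)
  1 / (1 - summary(aux)$r.squared)
})
print(vif)

# Out-of-sample accuracy on the held-out rows.
pred <- predict(fit, newdata = toy_test)
rmse <- sqrt(mean((toy_test$y - pred)^2))
mae <- mean(abs(toy_test$y - pred))
print(c(RMSE = rmse, MAE = mae))
```

In this simulation `x2` is built from `x1`, so their variance inflation factors should come out larger than that of `x3`. A confidence interval that crosses the dashed line at zero corresponds to a regressor whose t test is not significant at the 5% level.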

Acknowledgement

Moro, S., Rita, P., & Vala, B. (2016). Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach. Journal of Business Research, 69(9), 3341-3351.