library(tidyverse)
library(magrittr)
library(ggplot2)
library(plotly)
library(tidyverse)
library(coronavirus)
data("coronavirus")
italy_corona <- coronavirus %>% filter(country == "Italy")
This data analysis was carried out to identify the corona virus situation in the Italy.This italy_corona data set contain daily summary of corona virus data in Italy for 336 days. There are 3 variables
Variable | Description |
---|---|
date | Date in YYYY-MM-DD format |
type | An indicator for the type of cases (confirmed, death, recovered) |
cases | Number of cases on given date |
Quantitative : cases
Qualitative : date, type
library(maptools)
library(tibble)
library(ggrepel)
library(png)
library(grid)
library(sp)
data(wrld_simpl)
p <- ggplot() +
geom_polygon(
data = wrld_simpl,
aes(x = long, y = lat, group = group), fill = "light blue", colour = "white"
) +
coord_cartesian(xlim = c(-180, 180), ylim = c(-90, 90)) +
scale_x_continuous(breaks = seq(-180, 180, 120)) +
scale_y_continuous(breaks = seq(-90, 90, 100))
p +
geom_point(
data = italy_corona, aes(x = long, y = lat), color = "red", size
= 1
)
Italy is located in south-central Europe, and it is also considered a part of western Europe.
Italy has a predominantly Mediterranean climate with mild, sometimes rainy winters and sunny, hot, and usually dry summers.Italy are cool and humid in the north and the mountainous zone.The summer can be quite hot in Italy, mainly in the south of the peninsula, with high nocturnal temperatures of usually 28-33°C, but sometimes even 40°C.
Italy is one of the most corona positive cases reported country. According to Wikipedia , "On 9 March 2020, the government of Italy under Prime Minister Giuseppe Conte imposed a national quarantine, restricting the movement of the population except for necessity, work, and health circumstances, in response to the growing pandemic of COVID-19 in the country."
AS well as government take following methods to reduce risk of Corona virus.
First we look at the summary of the corona virus cases in Italy
summary(italy_corona)
date province country lat
Min. :2020-01-22 Length:336 Length:336 Min. :43
1st Qu.:2020-02-18 Class :character Class :character 1st Qu.:43
Median :2020-03-17 Mode :character Mode :character Median :43
Mean :2020-03-17 Mean :43
3rd Qu.:2020-04-14 3rd Qu.:43
Max. :2020-05-12 Max. :43
long type cases
Min. :12 Length:336 Min. : -1
1st Qu.:12 Class :character 1st Qu.: 0
Median :12 Mode :character Median : 404
Mean :12 Mean :1075
3rd Qu.:12 3rd Qu.:1640
Max. :12 Max. :8014
italy_corona$Month <- months(as.Date(italy_corona$date))
recovered_italy_corona <- italy_corona %>% filter(type=="recovered")
confirmed_italy_corona <- italy_corona %>% filter(type=="confirmed")
death_italy_corona <- italy_corona %>% filter(type=="death")
The data was collected from 2020-01-22 to 2020-05-12. In here number of minimum recovered cases reported as -1.This gives the incorrect information so this can be a outlier for our data. Therefore for we can replace it as missing value.
recovered_italy_corona <- recovered_italy_corona %>% mutate(cases = replace(cases, which(cases < 0), NA))
library(tidyverse)
library(data.table)
italy_corona_new <- tibble(Month = c("January", "February", "March", "April", "May","Total"),
confirmedTotal = c(2, 1126, 104664, 99671, 15753, 221216),
confirmedPercentage = c(0.0009904, 0.509, 47.31, 45.05, 7.12, 100),
RecoveredTotal = c(0, 47, 15383, 64871, 33094, 103395),
RecoveredPercentage = c (0, 0.045, 14.87, 62.74, 32.007, 100),
DeathTotal = c(0, 29, 12399, 13956, 2944,29328),
DeathPercentage = c (0, 0.098, 42.27, 47.73, 10.03, 100))
p <- setDT(italy_corona_new)
p
Month confirmedTotal confirmedPercentage RecoveredTotal
1: January 2 9.904e-04 0
2: February 1126 5.090e-01 47
3: March 104664 4.731e+01 15383
4: April 99671 4.505e+01 64871
5: May 15753 7.120e+00 33094
6: Total 221216 1.000e+02 103395
RecoveredPercentage DeathTotal DeathPercentage
1: 0.000 0 0.000
2: 0.045 29 0.098
3: 14.870 12399 42.270
4: 62.740 13956 47.730
5: 32.007 2944 10.030
6: 100.000 29328 100.000
According to table 1 March reported the most number of corona virus cases.It was more than 47% from the total.The most of recovered cases and death cases were reported in April with 62.7% and 47.3% percentage respectively.
library(patchwork)
p1 <- ggplot(italy_corona, aes(x = italy_corona$type, y = italy_corona$cases, color=type)) +
geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) +
geom_violin(alpha = 0.2, fill = "blue", width = 1) +
xlab("Type") +
ylab("number of Corona cases") +
ggtitle("Distribution of corona cases by type")
p2 <- ggplot(italy_corona, aes(x =cases, fill=type)) +
geom_density(alpha=0.5) +
xlab("Type") +
ylab("Corona cases") +
ggtitle("Distribution of corona cases by type")
p1|p2
figure 1 shows that all the type of corona cases positively distributed.As well as we can see deaths has bi modal distribution. To visualize this distribution we can plot density plot. This figure also show that all the types are positively distributed and there is a bi model distribution for deaths.
p1 <- ggplot(italy_corona,aes(date, cases, color = type), is.na=FALSE) +
geom_point() +
geom_line() +
ggtitle("Time series analysis-Italy")
ggplotly(p1)
By figure 2, We can see number of corona virus patients started increase after March but in the beginning of the April it is getting lower. In May we can see that number of recovered cases showed sudden peaks.AS well as number of confirmed cases and number of deaths getting lower in may. After the middle of may number of confirmed cases shows some repeated pattern.The highest confirmed cases reported in 31st of March with 6557 cases and highest numnber of death reported in 27th of march with 919 deaths.
Since Italy is one of the most corona cases reported country I choose India, USA and Saudi Arabia to comparison.
us_corona <- coronavirus %>% filter(country == "US")
India_corona <- coronavirus %>% filter(country == "India")
saudiArabia_corona <- coronavirus %>% filter(country == "Saudi Arabia")
p11 <- ggplot(us_corona,aes(date, cases, color = type)) +
geom_point() +
geom_line()+
ggtitle("Time series analysis-US")
p1 / p11
While looking at both graphs we can see that corona virus outbreak in USA happened later than Italy.Both number of confirmed cases and number of death cases are higher in USA. AS well as number of confirmed cases getting lesser slowly.
p12 <- ggplot(India_corona,aes(date, cases, color = type)) +
geom_point() +
geom_line()+
ggtitle("Time series analysis-India")
p1 / p12
Figure 4 shows that Italy started to control the corona confirmed cases ,India began to confirm more corona cases.But number of death cases are lower than Italy.
p13 <- ggplot(saudiArabia_corona,aes(date, cases, color = type)) +
geom_point() +
geom_line()+
ggtitle("Time series- Saudi Arabia")
p1 / p13
According to figure 5, Saudi Arabia started corona outbreak after April while Italy started after March.
Since coronavirus package contain data about all the countries first we need to filter out Italy from the data set.
In Italy there is a recovered case that reported as -1. It is an Outlier for our analysis and also It is a incorrect information.This might be happened because there may be some changes in the counting methodology or data resources, Errors in raw data, updating new cases not on the day that they were counting.Since data are important in analysis we cannot remove the entire row that containing the negative value.We cannot replace it as zero because of zero is also a value in this analysis.Hence we changed it as missing value. After summarizing the data we can see that March reported most confirmed cases and April reported most number of deaths and recovered cases.Their lock down policies were introduced after 9th of March. So this might be the reason for increase of confirmed corona cases.Since there were no any social distancing measures, people may be interacted with each other without using proper health practices.
In the distributions of each type they all are positively skewed but in death there is bi modal distribution.Lets think lock down as a indicator variable , like lockdwon is open or not. So both of them have 2 modes. I think this might be the reason for this bi modal.
When its come to the time series analysis number of confirmed cases and death cases decrease over the time. In may, recovered cases shows large spikes.The worst situation of corona in Italy came to under the control in May.Since there were travel ban , lock down and social distancing, they reduced corona virus spread. In the beginning of middle of may we can see there is a repeated pattern in confirmed cases.We cannot say this as a seasonal pattern, because length of the seasonal pattern should be at least 1 year.In here we don't know the length of the pattern.
When it comes to the comparison with other countries, since Italy is one of the country that suffered major damage due to corona virus I choose USA, India and Saudi Arabia.Those 3 countries also suffered major damages.
USA shows the worst scene out of Italy, India Saudi Arabia and USA. At the nd of the may, there were morethan 20000 cases. Italy and USA keep decreasing after May but both India and Saudi Arabila still keep increasing corona virus confirmed cases.I think among those 4 countries Italy is in a best situation after April.