Since the advent of the ‘sickness’, many organizations are releasing their data, leading to data scientists plotting all sorts of data in all sorts of ways. Herein I add to the noise by plotting a mashup of two data sets provided by Mozilla and Apple.
Mozilla describes their data in their blog.
The data we focus on here is their metric desktop_dau, which is the number of active daily users of Firefox. Actually, we look at the relative increase over the forecasted users, where the forecast was performed with FaceBook’s excellent Prophet package. In summary, a larger value means more desktop users than what was predicted.
The Apple data is described here and is the relative volume of directions requests per country/region compared to a baseline volume on January 13th, 2020. Here a smaller value means fewer directions requests. This data was already the subject of this post.
First we load our needed libraries.
library(readr) library(tidyverse) library(janitor) library(lubridate)
We get the Mozilla data first, and clean it up a bit.
dfm<-read_csv("https://docs.google.com/spreadsheets/d/1jHWW9QYAOCNTVwyWF29YiVGDf4uX3TcLgREVrQ1bkHI/export?format=csv")%>% select(-ci_deviation) %>% pivot_wider( names_from=metric, values_from=deviation ) %>% mutate(code = str_sub(geography,1,2))
Next we get two-letter country codes to join, and use the terrific janitor package to clean the field names.
dfc <- read_csv("https://pkgstore.datahub.io/JohnSnowLabs/country-and-continent-codes-list/country-and-continent-codes-list-csv_csv/data/b7876b7f496677669644f3d1069d3121/country-and-continent-codes-list-csv_csv.csv") %>% clean_names()
Then we join the two letter codes data back to the Mozilla data
dfm <- dfm %>% left_join(dfc, by = c("code"="two_letter_country_code"))
Now, get the Apple data, make it long, and parse the date out.
dfa <- read_csv("https://covid19-static.cdn-apple.com/covid19-mobility-data/2005HotfixDev14/v1/en-us/applemobilitytrends-2020-04-14.csv") %>% pivot_longer( cols = starts_with('2020'), names_to = 'dt', values_to = 'val') %>% mutate( dt = parse_date(dt) )
Now we have the one ugly part of this code, matching different ways to spell a country’s name. I did not find a source all ready, so, we brute force it.
dft <- tribble( ~name, ~code, 'Albania','AL', 'Argentina','AR', 'Australia','AU', 'Austria','AT', 'Belgium','BE', 'Brazil','BR', 'Bulgaria','BG', 'Cambodia','KH', 'Canada','CA', 'Chile','CL', 'Colombia','CO', 'Croatia','HR', 'Czech Republic','CZ', 'Denmark','DK', 'Egypt','EG', 'Estonia','EE', 'Finland','FI', 'France','FR', 'Germany','DE', 'Greece','GR', 'Hong Kong','HK', 'Hungary','HU', 'India','IN', 'Indonesia','ID', 'Ireland','IE', 'Israel','IL', 'Italy','IT', 'Japan','JP', 'Latvia','LV', 'Lithuania','LT', 'Luxembourg','LU', 'Malaysia','MY', 'Mexico','MX', 'Morocco','MA', 'Netherlands','NL', 'New Zealand','NZ', 'Norway','NO', 'Philippines','PH', 'Poland','PL', 'Portugal','PT', 'Republic of Korea','KR', 'Romania','RO', 'Russia','RU', 'Saudi Arabia','SA', 'Serbia','RS', 'Singapore','SG', 'Slovakia','SK', 'Slovenia','SI', 'South Africa','ZA', 'Spain','ES', 'Sweden','SE', 'Switzerland','CH', 'Taiwan','TW', 'Thailand','TH', 'Turkey','TR', 'UK','GB', 'United States','US', 'Ukraine','UA', 'United Arab Emirates','AE', 'Uruguay','UY', 'Vietnam','VN' )
We tack these codes on to the Apple data
dfa<-dfa %>% left_join(dft, by = c("region"="name"))
and now we can build our final data frame
df <- dfm %>% filter(str_length(geography)==2)%>% left_join(dfa, by = c("code"="code", "date"="dt")) %>% filter( geo_type=='country/region')
A quick plot shows something interesting…the data seems to be in two pieces.
df%>% filter(continent_name == 'Europe') %>% mutate(transportation_type = str_to_title(transportation_type))%>% filter(transportation_type=='Driving')%>% ggplot(aes( x = mean_active_hours_per_client, y = val))+ geom_point(alpha=0.5)+ theme_minimal()+ theme(legend.position="none")
Colouring the points according to whether they were before or after St. Patrick’s day provides insight
df%>% filter(continent_name == 'Europe') %>% mutate(transportation_type = str_to_title(transportation_type))%>% filter(transportation_type=='Driving')%>% mutate(period = if_else(date<= make_date(2020,3,17), "Before St Patrick's Day", "After St Patrick's Day"))%>% ggplot(aes( x = mean_active_hours_per_client, y = val, colour = period))+ geom_point(alpha=0.5)+ annotate(geom="text", x=0.17, y=125, label="Before St Patrick's Day", color="#00BFC4")+ annotate(geom="text", x=-0.09, y=35, label="After St Patrick's Day", color="#F8766D")+ coord_cartesian(xlim=c(-.25,0.25), ylim=c(0, 150))+ #facet_wrap(~transportation_type)+ labs( title = "Firefox Desktop Usage vs Apple Driving Directions Requests", subtitle = "Europe, Jan 31 - April 14, 2020\nTons of fine print apply", y = 'Apple directions requests (Jan 13, 2020 = 100)', x = 'Deviations in Firefox desktop active hours from forecast', caption = "Each point is one day in one country\nMozilla data from https://blog.mozilla.org/data/2020/03/30/opening-data-to-understand-social-distancing/\nApple data from https://www.apple.com/covid19/mobility" )+ theme_minimal()+ theme(legend.position="none")
After the shutdowns, more Firefox desktop usage is associated with decreased direction requests. Of course, this as a good illustration of ‘correlation is not causation’.