Estimate impact of Facebook's article context feature
Closed, Resolved (Public)

Description

Cf. https://lists.wikimedia.org/pipermail/wikitech-l/2018-April/089741.html and http://www.niemanlab.org/2018/04/facebook-is-adding-a-button-to-let-users-get-more-background-information-aka-information-from-wikipedia-pages-on-publishers/

  • Calculate the daily number of enwiki pageviews in the US with a referrer from Facebook, before and after the beginning of the feature's full rollout on April 3, 2018
  • Estimate the number of additional daily pageviews resulting from the feature
  • Publish a list of the top N pages receiving the most referrals, for some timespans before and after the rollout

Event Timeline

Those are not insignificant traffic increases. Eyeballing just the sites Andrew included, that's something like 10k pageviews per day, if it keeps up. For no effort on our part, that is a lot of mindshare.

To clarify, the spike around February 3 that Andrew pointed out there is likely unrelated: it obviously predates Facebook's rollout, and it is much larger than the ~500 views per day that their earlier tests reportedly generated. The linked post instead conjectures that it might have to do with YouTube, but there too the timing doesn't match their recent SXSW announcement. We should look into that in a separate task.

Megan and I discussed this task a bit further today; she is going to tackle it.

Regarding the third item (list of top referred pages), I already ran a quick-and-dirty query for yesterday, April 4 (excuse the formatting). The list does not appear to be dominated by articles about news media.

url  requests
https://en.m.wikipedia.org/wiki/List_of_stations_owned_or_operated_by_Sinclair_Broadcast_Group  13162
https://en.m.wikipedia.org/wiki/Nazi_gun_control_argument  8245
https://en.m.wikipedia.org/wiki/Posse_Comitatus_Act  7562
https://en.m.wikipedia.org/wiki/List_of_Nestl%C3%A9_brands  5713
https://en.m.wikipedia.org/wiki/Rusty_trombone  5358
https://en.m.wikipedia.org/wiki/Success_Kid  5354
https://pl.m.wikipedia.org/wiki/Micha%C5%82_Dworczyk  2898
https://ar.m.wikipedia.org/wiki/%D8%A7%D9%84%D8%AD%D9%88%D8%AA_%D8%A7%D9%84%D8%A3%D8%B2%D8%B1%D9%82_(%D9%84%D8%B9%D8%A8%D8%A9)  2897
https://de.m.wikipedia.org/wiki/Pizzagate  2477
https://es.m.wikipedia.org/wiki/Hiperrealismo  2025
https://nl.m.wikipedia.org/wiki/Hoog_Catharijne  1953
https://en.m.wikipedia.org/wiki/Internet_meme  1764
https://en.wikipedia.org/wiki/List_of_best-selling_girl_groups  1673
https://pt.m.wikipedia.org/wiki/Trof%C3%A9u_Imprensa_de_melhor_programa_de_audit%C3%B3rio  1624
https://th.m.wikipedia.org/wiki/%E0%B8%9E%E0%B8%A3%E0%B8%B0%E0%B8%9E%E0%B8%B8%E0%B8%97%E0%B8%98%E0%B8%A3%E0%B8%B9%E0%B8%9B  1616
https://ar.m.wikipedia.org/wiki/%D8%A3%D8%AD%D9%85%D8%AF_%D8%AE%D8%A7%D9%84%D8%AF_%D8%AA%D9%88%D9%81%D9%8A%D9%82  1608
https://en.m.wikipedia.org/wiki/Age_of_marriage_in_the_United_States  1594
https://en.wikipedia.org/wiki/Facebook  1591
https://fr.m.wikipedia.org/wiki/Harry%27s_New_York_Bar  1573
https://en.m.wikipedia.org/wiki/Operation_Northwoods  1533
https://en.m.wikipedia.org/wiki/Solipsism  1418
https://en.m.wikipedia.org/wiki/1974_Super_Outbreak  1401
https://en.m.wikipedia.org/wiki/Ptaquiloside  1396
https://en.m.wikipedia.org/wiki/Folklore  1388
https://als.m.wikipedia.org/wiki/Konstanz  1305
https://he.m.wikipedia.org/wiki/%D7%9B%D7%A4%D7%A8  1263
https://en.wikipedia.org/wiki/Success_Kid  1221
https://th.m.wikipedia.org/wiki/%E0%B8%9E%E0%B8%A3%E0%B8%B0%E0%B8%9E%E0%B8%B8%E0%B8%97%E0%B8%98%E0%B8%A3%E0%B8%B9%E0%B8%9B  1141
https://it.m.wikipedia.org/wiki/Salvatore_Venuta  1089
https://en.m.wikipedia.org/wiki/Williams_Stadium  1088

Data via

SELECT CONCAT('https://',uri_host,uri_path,uri_query) AS url, SUM(1) AS requests
FROM wmf.webrequest 
WHERE year = 2018 AND month = 4 AND day = 4 
AND is_pageview
AND referer LIKE '%facebook%' -- should still check for capitalization and other possible issues
GROUP BY uri_host, uri_path, uri_query
ORDER BY requests DESC LIMIT 30;
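
The comment in the query above notes that capitalization and other matching issues still need checking. As a rough illustration only (Python, not part of the original analysis), a stricter check could parse the referrer and compare its host instead of doing a substring match. The function name is hypothetical, and treating every facebook.com subdomain as "from Facebook" is an assumption.

```python
from urllib.parse import urlsplit


def is_facebook_referer(referer: str) -> bool:
    """Return True if the referrer's host is facebook.com or a subdomain of it.

    Hypothetical helper: unlike LIKE '%facebook%', it ignores case and does
    not match URLs that merely mention "facebook" in the path or query.
    """
    if not referer:
        return False
    try:
        host = urlsplit(referer.strip()).hostname or ""
    except ValueError:
        # Malformed referrer (e.g. a broken IPv6 literal): treat as non-Facebook.
        return False
    host = host.lower()
    return host == "facebook.com" or host.endswith(".facebook.com")
```

With this check, a referrer such as https://example.org/?ref=facebook would no longer be counted, while mixed-case hosts such as L.Facebook.COM would be.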

Initial results of the daily number of enwiki pageviews in the US with a referrer from Facebook.

daily_pageviews_fb.png (1×1 px, 149 KB)

There do not appear to be any significant changes in daily pageviews to English Wikipedia from Facebook immediately following the full rollout of the article context feature on April 3, 2018. There is an increase in pageviews with a Facebook referrer prior to the rollout, between March 31 and April 2. This may be due to higher Facebook traffic over the Easter holiday weekend.

Pending potentially obtaining a full list of articles on which the feature is deployed, I did a quick breakdown of pageviews for some popular US news sources that I confirmed to have the Facebook article context feature. The following graph shows a sudden increase of pageviews to these pages from Facebook on April 3rd, with a maximum of 425 pageviews to the English Wikipedia Washington Post page on April 4th. Further monitoring of pageviews over the coming weeks will help determine if there are any sustained overall increases in daily pageviews with a Facebook referrer.

I'll also take a look at top N pages (item 3) and work with @Tbayer to try to estimate the number of additional pageviews.

daily_pageviews_bynews.png (1×1 px, 137 KB)

Data via

daily_pageviews <- do.call(rbind, lapply(seq(start_date, end_date, "day"), function(date) {
  cat("Fetching webrequest data from ", as.character(date), "\n")
  clause_data <- wmf::date_clause(date)

  query <- paste("
SELECT '", date, "' AS date, COUNT(1) as requests
FROM wmf.webrequest",
clause_data$date_clause,
" AND referer LIKE '%facebook.com%'
AND is_pageview = TRUE
-- all referers from facebook should be classified as external
AND referer_class = 'external'
AND http_status IN ('200', '304')
-- remove bots
AND agent_type = 'user'
-- only look at English Wikipedia pageviews in the US
AND normalized_host.project = 'en'
AND normalized_host.project_class = 'wikipedia'

https://github.com/wikimedia-research/facebook-article-context-impacts

I took a look at the top Facebook-referred pages on English Wikipedia before and after the rollout date.

In the week prior to the feature rollout on April 3rd (March 26th through April 2nd), there are no news media pages among the top 30 Facebook-referred English Wikipedia pages in the U.S. On April 4th (one day after the rollout), there are 5 news media pages in the top 30. For April 3rd through April 9th, this number declines to only 2 news media articles in the top 30. Results below:

Summary of Top Facebook Referred News Media Articles
Top Facebook-referred news media articles on April 4th (day after rollout)

No.  Page Title  Requests
13.  "The Daily Wire" (Mobile Web)  718
14.  "Breitbart News" (Desktop)  679
22.  "Vox (website)" (Mobile Web)  466
23.  "Breitbart News" (Mobile Web)  462
29.  "The Washington Post" (Mobile Web)  425

Top Facebook-referred news media articles from April 3rd through April 9th (week after rollout)

No.  Page Title  Requests
16.  "The Daily Wire" (Mobile Web)  2556
28.  "Breitbart News" (Mobile Web)  2021

I confirmed all of the above news media sources included the article context feature on Facebook.

Full List of Top 30 Facebook Referred Pages
Top 30 Facebook Referred Pages Between March 26th through April 2nd (1 week prior to rollout)

No.  Url  Requests
1  https://en.m.wikipedia.org/wiki/List_of_stations_owned_or_operated_by_Sinclair_Broadcast_Group  62627
2  https://en.m.wikipedia.org/wiki/Nazi_gun_control_argument  10868
3  https://en.m.wikipedia.org/wiki/David_Hogg_(activist)  8127
4  https://en.m.wikipedia.org/wiki/List_of_school_shootings_in_the_United_States  6777
5  https://en.wikipedia.org/wiki/List_of_stations_owned_or_operated_by_Sinclair_Broadcast_Group  6276
6  https://en.wikipedia.org/wiki/List_of_best-selling_girl_groups  6016
7  https://en.m.wikipedia.org/wiki/April_Fools'_Day  5928
8  https://en.m.wikipedia.org/wiki/Affective_spectrum  5595
9  https://en.m.wikipedia.org/wiki/%C5%9Amigus-dyngus  5511
10  https://en.m.wikipedia.org/wiki/1993_Scotts_Mills_earthquake  5191
11  https://en.m.wikipedia.org/wiki/%C4%92ostre  4913
12  https://en.m.wikipedia.org/wiki/Jenna_Fischer  4249
13  https://en.m.wikipedia.org/wiki/Charlie_Day  4003
14  https://en.m.wikipedia.org/wiki/Strait_of_Messina_Bridge  3535
15  https://en.m.wikipedia.org/wiki/Steve_Carell  3510
16  https://en.m.wikipedia.org/wiki/Lori_Erica_Ruff  3443
17  https://en.m.wikipedia.org/wiki/Battle_of_Barawala_Kalay_Valley  3176
18  https://en.m.wikipedia.org/wiki/Funeral_potatoes  3166
19  https://en.m.wikipedia.org/wiki/ArmaLite_AR-15  3156
20  https://en.m.wikipedia.org/wiki/Stormy_Daniels  2925
21  https://en.m.wikipedia.org/wiki/Death_of_Jason_Callahan  2869
22  https://en.m.wikipedia.org/wiki/Federal_Assault_Weapons_Ban  2811
23  https://en.m.wikipedia.org/wiki/Fordyce_spots  2704
24  https://en.m.wikipedia.org/wiki/Whataboutism  2698
25  https://en.m.wikipedia.org/wiki/Puckle_gun  2657
26  https://en.m.wikipedia.org/wiki/Easter  2614
27  https://en.m.wikipedia.org/wiki/The_dress  2610
28  https://en.m.wikipedia.org/wiki/Assault_rifle  2534
29  https://en.m.wikipedia.org/wiki/Bath_School_disaster  2277
30  https://en.m.wikipedia.org/wiki/Maundy_Thursday  2166

Top 30 Facebook Referred Pages Between April 3rd through April 9th (1 week after rollout)

No.  Url  Requests
1  https://en.m.wikipedia.org/wiki/List_of_stations_owned_or_operated_by_Sinclair_Broadcast_Group  66980
2  https://en.m.wikipedia.org/wiki/List_of_Nestl%C3%A9_brands  32147
3  https://en.m.wikipedia.org/wiki/Nazi_gun_control_argument  25637
4  https://en.m.wikipedia.org/wiki/Posse_Comitatus_Act  18729
5  https://en.m.wikipedia.org/wiki/Rusty_trombone  9195
6  https://en.wikipedia.org/wiki/List_of_best-selling_girl_groups  6916
7  https://en.wikipedia.org/wiki/List_of_stations_owned_or_operated_by_Sinclair_Broadcast_Group  4843
8  https://en.m.wikipedia.org/wiki/National_Beer_Day_(United_States)  4709
9  https://en.m.wikipedia.org/wiki/Joyce_Vincent  4563
10  https://en.m.wikipedia.org/wiki/1974_Super_Outbreak  4130
11  https://en.m.wikipedia.org/wiki/Ralph_%22Bucky%22_Phillips  3814
12  https://en.m.wikipedia.org/wiki/%C5%9Amigus-dyngus  3259
13  https://en.m.wikipedia.org/wiki/Scheana_Marie  3088
14  https://en.m.wikipedia.org/wiki/List_of_U.S._state_budgets  2698
15  https://en.m.wikipedia.org/wiki/Operation_Northwoods  2632
16  https://en.m.wikipedia.org/wiki/The_Daily_Wire  2556
17  https://en.m.wikipedia.org/wiki/Judith_Barsi  2504
18  https://en.m.wikipedia.org/wiki/Niagara_Scenic_Parkway  2395
19  https://en.m.wikipedia.org/wiki/Williams_Stadium  2356
20  https://en.m.wikipedia.org/wiki/1832_Rothschild_loan_to_the_Holy_See  2243
21  https://en.m.wikipedia.org/wiki/Murder_of_Maria_Lauterbach  2222
22  https://en.m.wikipedia.org/wiki/Year_Without_a_Summer  2160
23  https://en.m.wikipedia.org/wiki/1996_Croatia_USAF_CT-43_crash  2119
24  https://en.wikipedia.org/wiki/Posse_Comitatus_Act  2104
25  https://en.m.wikipedia.org/wiki/Battle_of_Abu_Ghraib  2095
26  https://en.m.wikipedia.org/wiki/Tilapia  2078
27  https://en.m.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect  2035
28  https://en.m.wikipedia.org/wiki/Breitbart_News  2021
29  https://en.m.wikipedia.org/wiki/Age_of_marriage_in_the_United_States  2005
30  https://en.m.wikipedia.org/wiki/Tartan_Day  1879

Query

SELECT CONCAT('https://',uri_host,uri_path,uri_query) AS url, SUM(1) AS requests
FROM wmf.webrequest ",
clause_data$date_clause,
" AND is_pageview
AND referer LIKE '%facebook.com%' 
AND referer_class = 'external'
AND http_status IN ('200', '304')
AND agent_type = 'user'
AND normalized_host.project = 'en'  
AND normalized_host.project_family = 'wikipedia'
AND geocoded_data['country'] = 'United States'
GROUP BY uri_host, uri_path, uri_query
ORDER BY requests DESC LIMIT 30
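
The query excerpt above splices in clause_data$date_clause to restrict the scan to one day's partitions. As a hedged sketch of that idea (Python, not the R helper itself), the functions below build the same kind of predicate. It assumes the clause has the form WHERE year = ... AND month = ... AND day = ..., as in the first query in this task; the function names are hypothetical and the actual output of wmf::date_clause may differ.

```python
from datetime import date, timedelta


def date_clause(day: date) -> str:
    """Build a partition predicate for a single day (assumed clause format)."""
    return f"WHERE year = {day.year} AND month = {day.month} AND day = {day.day}"


def daily_clauses(start: date, end: date) -> list[str]:
    """Return one predicate per day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    n_days = (end - start).days
    return [date_clause(start + timedelta(days=i)) for i in range(n_days + 1)]
```

Running one query per day, as the R loop does, keeps each Hive job limited to a single day's partitions.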

Here are the updated daily facebook referred pageviews based on data through April 25th.

daily_pageviews_fb_v2.png (1×1 px, 168 KB)

There are no significant changes following the full rollout of the article context feature on April 3, 2018. In addition, any potential effects from Facebook's article context feature appear to be too small to determine from the overall number of Facebook referrals.

Looking just at the daily pageviews to a set of news-media-related articles more clearly displays the effect of the article context feature. The plot below includes a selection of 9 news-media-related articles, including all those found in the top 30 Facebook-referred pages in the week following April 4th.

daily_pageviews_bynews_v2.png (1×1 px, 197 KB)

A week prior to the feature rollout (March 26 - April 3rd), there was an average of about 3 daily pageviews to these pages with a Facebook referrer. There is a significant increase of daily pageviews on April 4th; however, pageviews quickly decline after April 4th to an average of around 40 pageviews between April 11 and April 18th. Some of the higher pageviews around April 4th seen for Breitbart and potentially other sources may be the result of Facebook posts linking to the article directly in the context of news related coverage released that day.

Potential next steps, if needed, could include expanding the set of news media articles, pending a complete list from Facebook or by using Wikipedia categories, but overall it appears that the article context feature had a very small effect on Facebook-referred pageviews. The excerpt displayed by Facebook in the article context feature is large, so it's possible that many Facebook users who access this feature do not click through to the Wikipedia articles.

Let me know if you have any questions or comments,

Codebase

Great work, @MNeisler! Some additional remarks inline.

A week prior to the feature rollout (March 26 - April 3rd), there was an average of about 3 daily pageviews to these pages with a Facebook referrer. There is a significant increase of daily pageviews on April 4th; however, pageviews quickly decline after April 4th to an average of around 40 pageviews between April 11 and April 18th. Some of the higher pageviews around April 4th seen for Breitbart and potentially other sources may be the result of Facebook posts linking to the article directly in the context of news related coverage released that day.

To add, for the record: The Breitbart-related coverage and social media attention which likely is a confounding factor here is summarized e.g. in https://www.haaretz.com/us-news/.premium-breitbart-declares-war-on-wikipedia-in-facebook-s-fight-against-fake-news-1.5991915 (paywalled, but may be accessible via Google).

Potential next steps if needed could include trying to expand the set of news media articles pending a complete list from Facebook or by using Wikipedia categories

For the record: we have been thinking about using https://en.wikipedia.org/wiki/Category:Media_in_the_United_States

but overall it appears that the article context feature had a very small effect on Facebook-referred pageviews. The excerpt displayed by Facebook in the article context feature is large, so it's possible that many Facebook users who access this feature do not click through to the Wikipedia articles.

To clarify: by now we can safely assert that the effect on overall Facebook-referred pageviews is very small. That said, we still have options for exploring question 2 ("Estimate the number of additional daily pageviews resulting from the feature") further. From F17344233, it appears that we could state a lower bound of about 400 daily pageviews for this just based on this fairly small sample of 9 articles, which we might be able to increase a lot when including the long tail of all articles in the aforementioned category. Focusing on the top-referred news media articles first was a great initial approach, but (besides the shape of the chart) the fact that the article about a comparatively small website like the Daily Wire surpassed those for e.g. the NYT or Fox News is another indicator that this was indeed dominated by the controversies and attention generated by Facebook's announcement itself, rather than the feature per se.
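
One simple way to turn such a chart into a lower-bound estimate is to compare the settled post-rollout level with the pre-rollout baseline, skipping the announcement spike. The Python sketch below illustrates that arithmetic only. The function name, the settle_days parameter, and the example series are made up for illustration and are not the data behind F17344233.

```python
from statistics import mean


def estimated_daily_uplift(daily_views, rollout_index, settle_days):
    """Estimate additional daily pageviews for a sample of articles.

    daily_views: total Facebook-referred views per day for the sample.
    rollout_index: index of the first day of the rollout.
    settle_days: days after the rollout to skip (announcement spike).
    """
    baseline = daily_views[:rollout_index]
    settled = daily_views[rollout_index + settle_days:]
    if not baseline or not settled:
        raise ValueError("need data both before the rollout and after the spike")
    return mean(settled) - mean(baseline)


# Made-up example: 4 baseline days, a 3-day spike, then 3 settled days.
example = [20, 30, 25, 25, 900, 2500, 1200, 420, 400, 380]
uplift = estimated_daily_uplift(example, rollout_index=4, settle_days=3)
```

Because the sample covers only some of the articles carrying the feature, a figure obtained this way is a lower bound on the feature's total effect.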

Some minor technical notes about the queries used (which should not affect the validity of the data):

  • Instead of the regexes for uri_path, it's probably preferable to use the already parsed field pageview_info['page_title'].
  • As discussed earlier, the condition http_status IN ('200', '304') is redundant to is_pageview (it's part of the pageview definition)
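
For rows that have already been exported as URL and request count (as in the top-30 lists above), a similar effect to grouping on the parsed page title can be approximated client-side. The Python sketch below merges mobile and desktop rows for the same article. The function name is hypothetical, and treating the .m. host as the mobile variant of the same project is an assumption.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit


def views_by_title(rows):
    """Sum request counts per (project host, page title).

    rows: iterable of (url, requests) pairs. Mobile hosts such as
    en.m.wikipedia.org are folded into en.wikipedia.org (assumption).
    """
    totals = Counter()
    for url, requests in rows:
        parts = urlsplit(url)
        if not parts.path.startswith("/wiki/"):
            continue  # skip anything that is not a /wiki/<title> path
        host = (parts.hostname or "").replace(".m.", ".", 1)
        title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
        totals[(host, title)] += requests
    return totals


# Rows taken from the April 3rd through April 9th list above.
rows = [
    ("https://en.m.wikipedia.org/wiki/Posse_Comitatus_Act", 18729),
    ("https://en.wikipedia.org/wiki/Posse_Comitatus_Act", 2104),
    ("https://en.m.wikipedia.org/wiki/List_of_Nestl%C3%A9_brands", 32147),
]
totals = views_by_title(rows)
```

In Hive itself, grouping on pageview_info['page_title'] as suggested in the note above would avoid the URL parsing altogether.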

kzimmerman subscribed.

It looks like this task answered the original question (article context feature had a minimal impact on Facebook-referred pageviews), so I'm going to mark it as done. Let me know if we need to reopen!