Do a traffic analysis for the Catalan, Chinese, English, Hebrew, Italian, and Uyghur versions of Wikipedia. Lila is asking for a traffic report. The Chinese and Uyghur Wikipedia transition was on Tuesday; Catalan, Hebrew, and Italian followed yesterday; English at 2AM today. We also need to know the traffic impact on English Wikipedia for requests geolocated to China.
Timeline provided by @BBlack:
The Chinese languages are Chinese (zh) and Uyghur (ug)
The HTTPS-Beta languages are Catalan (ca), Hebrew (he), Greek (el), Italian (it).
2015-06-09 21:23 UTC: transitioned Chinese languages, all projects
2015-06-11 14:00 UTC: transitioned HTTPS-Beta language Wikipedias
2015-06-11 14:30 UTC: transitioned HTTPS-Beta language Mobile Wikipedias
2015-06-12 08:43 UTC: Routing Incident: First evidence (others noticing, not us)
2015-06-12 09:00 UTC: Routing Incident: Level3 fallout in full effect, many notice
2015-06-12 09:00 UTC: Start transition of English Wikipedia, including Mobile
During this 40-minute window for English, we first redirected 10% of clients, then 50%, then 100%
2015-06-12 09:40 UTC: End transition of English Wikipedia, including Mobile
2015-06-12 10:40 UTC: Routing Incident: Largely resolved, some smaller trailing effects
2015-06-12 13:00 UTC: Public blog announcement
2015-06-12 13:30 UTC: transitioned All other projects (e.g. wikiversity, wikibooks, etc) for English + Beta languages
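To line pageview data up against this timeline, each hourly bucket can be tagged as before, during, or after the transition for its language. The following is a minimal sketch, not code from the analysis notebook: the names `WINDOWS_UTC` and `phase` are hypothetical, the timestamps are copied from the timeline above, and only the desktop Wikipedia times are included (the HTTPS-Beta mobile sites followed 30 minutes later).

```python
from datetime import datetime, timezone


def utc(day, hour, minute):
    """Build a UTC timestamp in June 2015."""
    return datetime(2015, 6, day, hour, minute, tzinfo=timezone.utc)


# (start, end) of the HTTPS transition per language, taken from the timeline.
# Only English had a staged rollout, so the other windows have zero length.
# The dict itself is an illustrative assumption.
WINDOWS_UTC = {
    "zh": (utc(9, 21, 23), utc(9, 21, 23)),
    "ug": (utc(9, 21, 23), utc(9, 21, 23)),
    "ca": (utc(11, 14, 0), utc(11, 14, 0)),
    "he": (utc(11, 14, 0), utc(11, 14, 0)),
    "el": (utc(11, 14, 0), utc(11, 14, 0)),
    "it": (utc(11, 14, 0), utc(11, 14, 0)),
    "en": (utc(12, 9, 0), utc(12, 9, 40)),
}


def phase(lang, ts):
    """Return 'before', 'during', or 'after' for a timestamp and language."""
    start, end = WINDOWS_UTC[lang]
    if ts < start:
        return "before"
    if ts < end:
        return "during"
    return "after"


if __name__ == "__main__":
    print(phase("en", utc(12, 9, 20)))
    print(phase("zh", utc(9, 21, 0)))
```

Any enwiki bucket between 08:43 and 10:40 UTC on 2015-06-12 also overlaps the routing incident, so a drop in that window cannot be attributed to the HTTPS transition alone.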
@ellery depending on how you need to handle the Chinese languages, you should go for either:
- project pre-aggregated table (no dialect/language_variant, preproduction mode, very small data)
  - hive: joal.pageview_hourly
  - hdfs parquet files: /user/joal/pageview/hourly/year=2015/month=6/day=X/hour=X
- pageview pre-aggregated table (dialect/language_variant available, production mode, medium-small data)
  - hive: wmf.pageview_hourly
  - hdfs parquet files: /wmf/data/wmf/pageview/hourly/year=2015/month=6/day=X/hour=X
Let me know if you want me to spend some time with you :)
Ok, I just fired off a query to get HTTPS status as well. I am running it over the logs for June (sampling 1 out of 64 buckets). The query is estimated to finish in 3 days.
-- non-aggregated columns match the GROUP BY clause below
SELECT
uri_host,
http_status,
access_method,
agent_type,
year, month, day, hour,
geocoded_data['country'] as country,
x_analytics_map['https'] as https,
count(*) as n
FROM wmf.webrequest TABLESAMPLE(BUCKET 1 OUT OF 64 ON rand())
WHERE year = 2015
AND month = 6
AND webrequest_source in ('mobile', 'text')
AND is_pageview = 1
AND uri_host RLIKE '(ca|en|zh|it|ug|he)\\.(m\\.)?wikipedia'
GROUP BY uri_host, geocoded_data['country'], x_analytics_map['https'], http_status,
access_method, agent_type, year, month, day, hour;
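Because the query reads only 1 of 64 buckets, the counts it returns need to be multiplied by 64 to estimate full traffic, while ratios such as the HTTPS share need no scaling. A minimal sketch of that post-processing follows. The function name and the row layout `(country, https, n)` are hypothetical, and treating the value `"1"` as the HTTPS marker is an assumption about how `x_analytics_map['https']` is populated.

```python
from collections import defaultdict

SAMPLE_RATE = 64  # the query samples BUCKET 1 OUT OF 64


def https_share_by_country(rows):
    """Aggregate sampled (country, https_flag, n) rows per country.

    Returns, for each country, the estimated unsampled request count and
    the fraction of sampled requests flagged as HTTPS.
    """
    totals = defaultdict(int)
    https = defaultdict(int)
    for country, https_flag, n in rows:
        totals[country] += n
        # Assumption: HTTPS requests carry the string "1", others carry None.
        if https_flag == "1":
            https[country] += n
    return {
        country: {
            "estimated_requests": totals[country] * SAMPLE_RATE,
            "https_share": https[country] / totals[country],
        }
        for country in totals
        if totals[country] > 0
    }


if __name__ == "__main__":
    # Made-up rows for illustration only, not real results.
    example_rows = [("A", "1", 30), ("A", None, 10), ("B", None, 5)]
    print(https_share_by_country(example_rows))
```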
As mentioned above, the query will still run for a few days. But here are some preliminary results from a query I kicked off on Friday.
Summary: We see a drop in zhwiki pageviews from Chinese desktop users and from US bots. All beta language projects seem unaffected. The data ends too soon after the enwiki transition to evaluate the change there.
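One simple way to quantify a drop like this is the relative change in mean hourly pageviews before and after the transition, declining to report a number when too few post-transition hours exist. This is a rough sketch rather than the method used in the notebook: the function name and the 48-hour default threshold are arbitrary assumptions.

```python
from statistics import mean


def relative_change(hourly_counts, transition_index, min_post_hours=48):
    """Relative change in mean hourly pageviews across a transition.

    hourly_counts: pageview counts ordered by hour.
    transition_index: index of the first post-transition hour.
    Returns None when there is no pre-transition data, when fewer than
    min_post_hours hours follow the transition, or when the baseline is 0.
    """
    before = hourly_counts[:transition_index]
    after = hourly_counts[transition_index:]
    if not before or len(after) < min_post_hours:
        return None
    baseline = mean(before)
    if baseline == 0:
        return None
    return (mean(after) - baseline) / baseline


if __name__ == "__main__":
    # Synthetic series: 48 hours at 100 views, then 48 hours at 50 views.
    print(relative_change([100] * 48 + [50] * 48, 48))
```

Pageviews follow strong daily and weekly cycles, so comparing whole days (or the same weekdays) on each side of the transition gives a fairer baseline than arbitrary hour ranges.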
The transition is still ongoing. New timeline events from today (assume for any language mentioned, it's for all projects that language has):
All times UTC, and +/- 5 mins:
2015-06-15 20:15 - Wikidata and Roots/www (see below)
2015-06-15 20:25 - de
2015-06-15 21:00 - Commons
2015-06-15 21:25 - fr, ja
2015-06-15 23:15 - Reverted Commons (for now, due to T102566)
2015-06-16 00:00 - bg, cs, eo, fi, id, nl, no, pl, pt, sv, th, tr
Roots/www means any of our primary domains without a language prefix, including mobile, as well as www in place of the language prefix. e.g. http://wikipedia.org, http://www.wikiversity.org, http://m.wikibooks.org, etc. Mostly these are language-selector pages or language-detecting redirects.
I have updated the graphs in https://github.com/ewulczyn/wmf/blob/master/https_transition/https_transition.ipynb.
Iran shows a severe persistent drop in pageview rates for enwiki. China has a less severe but still persistent drop.
@ellery thanks for the great work. Can you add a short section with the main takeaways at the top of the nb? Other than country-specific data, given that bot traffic historically accounted for up to half of PVs from the US, this will result in a major drop in the legacy PVs and we'll need to communicate this clearly.
@kevinator is Yana responsible for presenting the results internally?
This task has had "Unbreak Now!" priority for three months now, which means it "needs to be fixed immediately, setting anything else aside."
@ellery: What is the status of this task? Is the priority still correct?
@ellery: Is there a reason to not change the task status to "Resolved" via the "Action > Change Status" dropdown above the "Comments" box, so the task won't show up under the list of open tasks anymore?
I'll just do this and resolve this task, hoping I understood your previous comment correctly. Please reopen if I'm wrong.