[REQUEST] Data about top articles for Wikipedia's 20th birthday
Closed, Resolved (Public)

Description

Request from Communications department via @hdothiduc

What's requested:

The data we would like to request:

(1) for each year the most viewed article for ~10 language Wikipedias – with the English description, an article image and the daily pageviews in that year;
(2) for each year the most edited article for ~10 language Wikipedias – with the English description, an article image and the daily edits in that year.

Here is a spreadsheet to show this dream data in a better way: https://docs.google.com/spreadsheets/d/1eLEuwGapXS_5CIpDHbmd9vhTWRkp5TKu7NDEAnMzFJM/edit#gid=0.
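As a rough illustration of the aggregation behind items (1) and (2), here is a minimal sketch, assuming the daily counts are available as rows of (wiki, date, article, count). The function name `top_article_per_year`, the row layout and the sample rows are hypothetical and made up for illustration; they are not taken from any actual data source. The same logic would apply to daily edits as to daily pageviews.

```python
from collections import defaultdict


def top_article_per_year(daily_counts):
    """Return {(wiki, year): (article, yearly_total)} for the top article.

    daily_counts: iterable of (wiki, 'YYYY-MM-DD', article, count) rows.
    This row layout is an assumption made for the sketch.
    """
    # Sum the daily counts into yearly totals per wiki and article.
    totals = defaultdict(int)
    for wiki, day, article, count in daily_counts:
        year = int(day[:4])
        totals[(wiki, year, article)] += count

    # Keep the article with the highest yearly total per (wiki, year).
    # On a tie, the article seen first is kept.
    best = {}
    for (wiki, year, article), total in totals.items():
        key = (wiki, year)
        if key not in best or total > best[key][1]:
            best[key] = (article, total)
    return best


# Made-up sample rows, for illustration only.
sample = [
    ("enwiki", "2019-01-01", "Article A", 100),
    ("enwiki", "2019-01-02", "Article A", 50),
    ("enwiki", "2019-01-01", "Article B", 120),
    ("dewiki", "2019-03-05", "Artikel C", 30),
]
result = top_article_per_year(sample)
```

The daily series for each winning article could then be pulled from the same rows by filtering on the returned wiki, year and article.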

Why it's requested:

For Wikipedia's 20th birthday (https://groups.google.com/a/wikimedia.org/g/foundation-optional/c/x5FVKSKpw-o/m/qQCyANE3CwAJ) we will be creating pages on wikimediafoundation.org to celebrate this milestone. We want to feature data about top articles (the most viewed and most edited) from different language Wikipedias from the past 20 years.

When it's requested:

We would like to work with you to determine what is feasible in what time frame.

In October we have to present the final material to Heather Walls and Katherine Maher for approval, after which we can proceed to have the content translated into 6 languages. We launch in January.

Other helpful information:

The birthday as an Annual Plan OKR (Brand Awareness in MTP) is of high priority for the Communications department. We will use the data to design and develop an engaging, interactive page on the foundation website that will be visited by hundreds of thousands of readers, and utilized internationally by press and media. We expect these materials to receive wide distribution.

Event Timeline

We will review and triage this at the next board refinement meeting (Tuesday, September 1st) and follow up re: time frames.

LGoto triaged this task as High priority.
LGoto moved this task from Triage to Needs Investigation on the Product-Analytics board.
mpopov edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

Checked with Connie and she has the capacity to work on this in the next couple of weeks; moving to Kanban.

@mpopov, great news, thank you!

@cchen let me know if it makes sense to have a short kickoff call, then I can schedule that. Looking forward to working together!

Something to also cover in our meeting tomorrow (@cchen):

(1) for each year the most viewed article for ~10 language Wikipedias – with the English description, an article image and the daily pageviews in that year;
(2) for each year the most edited article for ~10 language Wikipedias – with the English description, an article image and the daily edits in that year.

Of these 10 language Wikipedias, 7 are definite: English, Spanish, French, German, Russian, Chinese and Arabic. We have not identified the rest yet. Depending on how complicated it would be to scale beyond these 7 language Wikipedias, we would be interested in learning whether you could help us determine a few more language Wikipedias based on a list of emerging/priority countries from Comms (countries that we will focus on in press outreach and social media outreach). So, for example, if this list includes India, Indonesia and Nigeria, which language Wikipedias would it make sense to highlight?

@cchen Thank you for the great, productive meeting!

I found the blog post about the most edited articles that Comms published for the 15th Birthday of Wikipedia.
It was originally published on blog.wikimedia.org, but lives on our new community blog now: https://diff.wikimedia.org/2016/01/14/most-edited-articles/
The article states the source to be http://alpha.hatnote.com/15/. At the bottom of that page you can find more links to the source code.

@hdothiduc Thanks for the meeting and a great introduction to the project yesterday. And the blog post for the 15th Birthday of Wikipedia is a helpful reference!

Per our conversation yesterday, here are a few additional language Wikipedias we could include.

Based on the list of emerging/priority countries, the Hindi, Indonesian and Igbo Wikipedias are the most viewed and most used Wikipedias in India, Indonesia and Nigeria, respectively, other than the 7 language Wikipedias you already included. Hindi and Igbo are relatively small Wikipedias compared to Indonesian.

Besides the language Wikipedias above, Portuguese, Japanese, Italian, Turkish and Swedish are other top language Wikipedias, based on articles, traffic and users, that we could include.

Thank you very much @cchen!
I will bring this information back to the team and let you know as soon as possible!
This should not be a blocker for you, since English and the other 6 defined language Wikipedias are the most important.

Let me know if you have any more questions!

Top pages with pageviews and edits by wiki were updated in this folder. We reviewed text and notes for the data this week.

HUGE thank you to you, Connie, for being very patient with this Comms request and doing great work here!
I processed the data further (as discussed) and have documented my process as well as I can in this spreadsheet, which also contains the final data sets as sheets.