
"Page views by edition of Wikipedia" for each country
Open, Low, Public, Feature

Description

Hi,

The English Wikipedia (and other editions) has articles about the Russian Wikipedia, Arabic Wikipedia, English Wikipedia, German Wikipedia, and French Wikipedia (and probably many others) that all contain a map, such as this one, showing the countries where that edition of Wikipedia is the most popular. Those maps are based on this Wikimedia Traffic Analysis Report. I think they're really interesting. However, they're not up to date: the last Wikimedia Traffic Analysis Report is from September 2018.

I would like to update those maps and the different articles. Wikimedia Statistics only offers Page views by country for each edition of Wikipedia, but not the most popular edition of Wikipedia by country.

Would it be possible to add "Page views by edition of Wikipedia" for each country (and for the whole world)?

(I first asked this question there then discovered the feature request...)

Thanks for any help you can provide.

Antoine (a455bcd9)

Event Timeline

Hi @A455bcd9, thanks for taking the time to report this and welcome to Wikimedia Phabricator! For future reference, please always follow https://www.mediawiki.org/wiki/How_to_report_a_bug when creating a task. Thank you!

Hi @Aklapper ,

Thanks. FYI on Wikimedia Statistics, "New feature" directly points to Wikimedia Phabricator. It may be better to replace both the links "Report a bug" and "New feature" by https://www.mediawiki.org/wiki/How_to_report_a_bug

Best,

Antoine

@Aklapper , sorry I've just read T188859 and I don't see how they're related. T188859 is about "the geographical origin of the contributors for a given wiki" whereas this request is about the total page views for each edition for a given country.

Could you please explain how those two feature requests are related?

Antoine

@A455bcd9: Please edit the task summary to summarize which "new feature" this task is about, otherwise we have dozens of tasks which all only say "New Feature" in their summaries - see https://www.mediawiki.org/wiki/How_to_report_a_bug. If you think it's not a duplicate, then please change the task status to open. Thanks!

> FYI on Wikimedia Statistics, "New feature" directly points to Wikimedia Phabricator. It may be better to replace both the links "Report a bug" and "New feature" by https://www.mediawiki.org/wiki/How_to_report_a_bug

That is not needed, as the "New Feature" link shows a "Well-written tasks with enough information get fixed faster: How to write a good bug report." header already.

Aklapper renamed this task from Wikistats New Feature to "Page views by edition of Wikipedia" for each country. Jul 8 2020, 10:05 AM
Aklapper reopened this task as Open.
Aklapper changed the subtype of this task from "Task" to "Feature Request".

@Aklapper Thanks a lot for reopening the task, I didn't know I could do it by myself. Thanks as well for changing the title. (I guess that's what you meant by "summary"?).

Regarding the header, it is quite small and it says "How to write a good bug report". As I wanted to write a feature request, I didn't click on it. So maybe the header could be made bigger and changed to "How to write a good bug report or a feature request". Also, the title is automatically set to "Wikistats New Feature"; it would be better to leave it blank to avoid having "dozens of tasks which all only say "New Feature" in their summaries", as you said.

Milimetric triaged this task as Medium priority. Jul 20 2020, 3:53 PM
Milimetric moved this task from Incoming to Datasets on the Analytics board.
Milimetric subscribed.

This requires a new dataset to be loaded into the API. We have this data, but it is not packaged to respond to these queries. It would work well served from Druid instead of Cassandra.

Hi,

Actually, there may be something simpler and just as useful: instead of displaying the Wikipedias read by country, one could easily calculate the percentage of views that one Wikipedia represents in a country among the views on all Wikipedias in that same country.

To do that, one would only need this: https://stats.wikimedia.org/#/all-wikipedia-projects/reading/page-views-by-country/normal|table|last-month|~total|monthly

However, "The Page views by country metric is not available for all-wikipedia-projects. Select a specific wiki", whereas this KPI is available for "All wikis": https://stats.wikimedia.org/#/all-projects/reading/page-views-by-country/normal|table|last-month|~total|monthly

As Wikipedia represents more than 90% of the visits to All wikis, using "All wikis" is already an excellent proxy, but the exact metric for all Wikipedias would be even better. For instance, there were 53M page views in Norway on All wikis in June 2020, including 26M on en.wikipedia.org, which makes English the most used language for reading Wikipedia in Norway.

Could it be easier to implement that?
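To make the proposed calculation concrete, here is a minimal sketch of the share computation using the two Norway figures quoted above. The function name `edition_share` is hypothetical, and the denominator is the "All wikis" total used as a proxy, so the result is a lower bound on the share among Wikipedias only.

```python
def edition_share(edition_views: int, country_total_views: int) -> float:
    """Fraction of a country's page views that went to one edition."""
    if country_total_views <= 0:
        raise ValueError("country_total_views must be positive")
    return edition_views / country_total_views


# Figures from the comment above: Norway, June 2020.
# 26M views on en.wikipedia.org out of 53M views on All wikis (proxy total).
norway_share = edition_share(26_000_000, 53_000_000)
print(f"{norway_share:.1%}")  # prints 49.1%
```

Because the total for all Wikipedias cannot exceed the total for All wikis, the exact share would be at least this value.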

@A455bcd9 Will pageviews split by country give you what you need? While it doesn't include all possible countries (see privacy considerations), my take is that it can give you good base data to work with.

Thanks to Nuria for this API reminder.

@leila No, as I said in my initial message, this is not enough. With this query alone it is, for instance, impossible to quickly answer "What's the most used language in Morocco?" (French? English? Spanish? Arabic? Moroccan Arabic? Amazigh? etc.).

> Hi,
>
> Actually, there may be something simpler and just as useful: instead of displaying the Wikipedias read by country, one could easily calculate the percentage of views that one Wikipedia represents in a country among the views on all Wikipedias in that same country.
>
> To do that, one would only need this: https://stats.wikimedia.org/#/all-wikipedia-projects/reading/page-views-by-country/normal|table|last-month|~total|monthly
>
> However, "The Page views by country metric is not available for all-wikipedia-projects. Select a specific wiki", whereas this KPI is available for "All wikis": https://stats.wikimedia.org/#/all-projects/reading/page-views-by-country/normal|table|last-month|~total|monthly
>
> As Wikipedia represents more than 90% of the visits to All wikis, using "All wikis" is already an excellent proxy, but the exact metric for all Wikipedias would be even better. For instance, there were 53M page views in Norway on All wikis in June 2020, including 26M on en.wikipedia.org, which makes English the most used language for reading Wikipedia in Norway.
>
> Could it be easier to implement that?

I see what you mean, but no. The data flows in a pipeline that would have to be updated. It's not that this is hard, it's just time-consuming and we have too many other priorities.

Thanks for your answer. Too bad :(

If anyone arrives on this page looking for the same thing, I wrote a script to get what I wanted: https://github.com/a455bcd9/wikilangtrends

For instance, I generated this map with it: https://commons.wikimedia.org/wiki/File:Most_popular_edition_of_Wikipedia_by_country_Jan_2021.svg
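For readers who want the general idea without opening the repository, the sketch below shows one way to pick the most popular edition per country once per-edition, per-country view counts are available. It is not the linked script: the function name `most_popular_edition`, the input shape, the country codes "XA" and "XB", and all the numbers are made-up placeholders for illustration.

```python
from collections import defaultdict


def most_popular_edition(views_by_edition):
    """Map each country to (top edition, share of that country's views).

    views_by_edition: {edition: {country_code: views}}, an assumed input shape.
    On a tie, the edition seen first wins.
    """
    totals = defaultdict(int)  # total views per country across editions
    best = {}                  # country -> (edition, views) with the most views
    for edition, by_country in views_by_edition.items():
        for country, views in by_country.items():
            totals[country] += views
            if country not in best or views > best[country][1]:
                best[country] = (edition, views)
    return {
        country: (edition, views / totals[country])
        for country, (edition, views) in best.items()
        if totals[country] > 0
    }


# Placeholder data, not real traffic figures.
sample = {
    "fr.wikipedia": {"XA": 60, "XB": 1},
    "ar.wikipedia": {"XA": 30},
    "en.wikipedia": {"XA": 10, "XB": 9},
}
result = most_popular_edition(sample)
print(result["XA"][0], result["XB"][0])  # prints fr.wikipedia en.wikipedia
```

Countries that are omitted from the source data for privacy reasons simply do not appear in the result, so a map built this way can have gaps.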

Thanks very much for following through with that. Seeing your prototype makes it very clear what you need and why. I think ideally we would create a better pipeline from community-requested statistics to on-wiki infographics. This is something that's been hard for WMF to prioritize, but something I care about, and will continue to think about.

Thanks.

By the way, after I made this prototype and wrote this article about its results, I was asked to do a similar analysis by subdivisions of countries instead of just countries (for instance using ISO 3166-2 codes). This would be especially interesting for big countries with several languages, at the level of US states, Canadian provinces, Russian republics, or Indian states. (And also for Spain, Belgium, Ukraine, Switzerland, Cameroon, Cyprus, Romania, Indonesia, Iraq, Iran, Kazakhstan, Lebanon, Latvia, Lithuania, Estonia, Morocco, Nigeria, Uzbekistan, Malaysia, Bangladesh, South Africa, etc.)

So on this map ( https://stats.wikimedia.org/#/en.wikipedia.org/reading/page-views-by-country/normal|map|2021-04-01~2021-06-01|(access)~desktop*mobile-app*mobile-web|monthly ) for example we could click on a country (maybe only on "big countries") to get the results at the lower level.

Should I create another feature request for that? Or is this idea too far-fetched?

> Should I create another feature request for that? Or is this idea too far-fetched?

I don't think it's too far-fetched, and I think you should make a task. It's tricky because our geolocation library doesn't provide this level of information, so we'd have to map to it somehow. I only just learned about ISO 3166-2 (thanks! :)), so I don't know if there are good standard libraries for translating lat/long to subdivisions.

Perfect, I've just created this feature request ( T284294 ).

odimitrijevic lowered the priority of this task from Medium to Low. Jan 6 2022, 3:12 AM
odimitrijevic moved this task from Datasets to Wikistats on the Analytics board.