Data Request: Aggregated-to-country-code traffic data for different language versions (no IP addresses needed)
Open, Medium, Public

Description

Author: hanteng

Description:
As advised by Erik Zachte, two researchers from the Oxford Internet Institute request traffic data aggregated to country code (and, if possible, more finely to longitude/latitude points) for all available language versions. The data will be used to improve the mapping in the Wikimedia Traffic Analysis Report, which should benefit public understanding of Wikimedia's multilingual development. The resulting maps will be released under a copyleft license.

  • Existing tables provided by Erik Zachte's Wikimedia Traffic Analysis Report

A very interesting data presentation showing how different languages are accessed across different regions (based on the geoIP categorization):
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm

  • Proposed enhancement by our pilot mapping

The aim is to show on a geographic map how each language version is accessed across the world. This needs more detailed data, especially for those languages with sizable traffic from the "other" categories in the table at the link above.

  • Two researchers

Dr. Mark Graham works on analysing patterns in Wikipedia (e.g. http://www.oii.ox.ac.uk/research/projects/?id=66). Mr. Han-Teng Liao has been using the MaxMind geoIP database to map the proportional difference between Baidu Baike and Chinese Wikipedia's external/citation links: http://people.oii.ox.ac.uk/hanteng/2011/09/04/difference-in-proportional-emphasis-baidu-baike-and-chinese-wikipedia-comparison/

  • Expected outcome

Published maps released under a copyleft license and stored at Wikimedia Commons. Potential academic articles and blog posts on the language phenomenon in the multilingual Wikipedia project.

  • Researchers' sensitivity to privacy concerns and capacity in modern cartography.

Both researchers realise the sensitivity of IP data and in no way want to violate users' expectations of privacy (Han-Teng is especially sensitive to this issue, having been involved with human rights groups for the Internet industry in DC). This is why we don't want to see any IPs, but would very much like to work with data aggregated to the level of country codes at the very least, or to the level of city or even longitude and latitude points (Dr Mark Graham is a trained geographer with expertise in mapping both offline and online data, as shown at http://www.floatingsheep.org/ ).
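
To make the requested level of aggregation concrete, here is a minimal sketch of a per-language, per-country count in which no IP address survives into the output. It is an illustration only: the record layout (language, IP), the function name `aggregate_by_country`, and the toy lookup table are all assumptions, not the actual Wikimedia log format or pipeline, and a real run would use a geoIP database in place of the stand-in lookup.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def aggregate_by_country(
    requests: Iterable[Tuple[str, str]],
    ip_to_country: Callable[[str], str],
) -> Counter:
    """Count page views per (language, country code).

    Each IP is used only for the lookup and is never stored in the result.
    """
    counts: Counter = Counter()
    for language, ip in requests:
        counts[(language, ip_to_country(ip))] += 1
    return counts


# Hypothetical stand-in for a geoIP database, using documentation-range IPs.
_TOY_GEO = {"192.0.2.1": "GB", "198.51.100.7": "TW", "203.0.113.9": "GB"}


def toy_lookup(ip: str) -> str:
    # "--" marks an address the lookup could not resolve.
    return _TOY_GEO.get(ip, "--")


# Hypothetical (language, IP) request records.
sample = [
    ("zh", "198.51.100.7"),
    ("en", "192.0.2.1"),
    ("en", "203.0.113.9"),
    ("zh", "192.0.2.1"),
]
result = aggregate_by_country(sample, toy_lookup)
```

Only the aggregated `result` table would need to be shared. The same pattern would extend to city or rounded longitude/latitude keys by swapping the lookup function.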


Version: unspecified
Severity: normal

Details

Reference
bz30848

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:52 PM
bzimport set Reference to bz30848.
bzimport added a subscriber: Unknown Object (MLST).

Hi Hanteng,
Do you need help getting some more traction on this?

[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]