Over the past few years, we've had multiple conversations with researchers and chapters interested in using Wikimedia traffic data to conduct epidemiological and surveillance research for the public good. We recognize the importance of this body of research and the role that the data WMF collects could play in advancing it, compared to data from other platforms. However, we haven't been able to identify a sustainable model for sharing this data. The current approach (granting server access under NDAs to a small group of researchers, under our formal collaboration policy) doesn't scale, either technically or organizationally.
We have decided to convene a panel of experts and ask them to produce recommendations for the organization and the movement. These recommendations should cover possible models, including their costs and risks, for publicly sharing data that Wikimedia collects, in aggregate form, that may help make significant progress on global public health issues. Any such model must protect the privacy of our editors and readers.
Invitees
Confirmed invitees who expressed an interest in participating:
- Daniela Paolotti and Ciro Cattuto (ISI Foundation)
- Thomas Mollet (ECDC)
Other potential invitees
Other collaborators who worked on previous proposals include:
- Shilad Sen
- Reid Priedhorsky and Geoffrey Fairchild
Tentative timeline
Q1-FY19 (Jul - Sep 2018)
References
Related projects:
- https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
- https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
- https://meta.wikimedia.org/wiki/Research:Quantifying_the_global_attention_to_public_health_threats_through_Wikipedia_pageview_data
- https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly/Identity_reconstruction_analysis
- https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly/Sanitization