It would be great to have a list of page views for the articles in WP:MED (~35K articles). I've put up this query, which is a modification of the Special:WhatLinksHere query, but it's fast and efficient.
This is a great start. Here's the short term plan and the longer term plan.
Short
Join the results of Amir's query to pageview_hourly for a month, and write the results to another table. Publish top(1000), and total views for the whole WikiProject medicine on datasets.wikimedia.org.
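A minimal sketch of the short-term plan, using an in-memory SQLite database in place of Hive so it can be run anywhere. The table names project_pages and project_page_counts are hypothetical, and the pageview_hourly schema here is an assumed simplification (the real Hive table has more dimensions); only the join, aggregate, top-N, and total steps are meant to carry over.

```python
import sqlite3


def build_demo_db():
    """Create toy tables; names and columns are assumptions for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project_pages (page_title TEXT PRIMARY KEY);
        CREATE TABLE pageview_hourly (
            page_title TEXT, year INT, month INT, day INT, hour INT, view_count INT
        );
        CREATE TABLE project_page_counts (
            page_title TEXT, year INT, month INT, view_count INT
        );
        """
    )
    conn.executemany(
        "INSERT INTO project_pages VALUES (?)", [("Malaria",), ("Meningitis",)]
    )
    conn.executemany(
        "INSERT INTO pageview_hourly VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("Malaria", 2016, 8, 1, 0, 10),
            ("Malaria", 2016, 8, 1, 1, 15),
            ("Meningitis", 2016, 8, 1, 0, 7),
            ("Main_Page", 2016, 8, 1, 0, 1000),  # not in the project page list
            ("Malaria", 2016, 7, 31, 23, 99),    # outside the requested month
        ],
    )
    return conn


def write_monthly_counts(conn, year, month):
    """Join the page list to hourly views and store one total per page."""
    conn.execute(
        """
        INSERT INTO project_page_counts
        SELECT p.page_title, v.year, v.month, SUM(v.view_count)
        FROM project_pages AS p
        JOIN pageview_hourly AS v ON v.page_title = p.page_title
        WHERE v.year = ? AND v.month = ?
        GROUP BY p.page_title, v.year, v.month
        """,
        (year, month),
    )


def top_pages(conn, year, month, limit=1000):
    """Return the most viewed project pages for the month."""
    return conn.execute(
        "SELECT page_title, view_count FROM project_page_counts "
        "WHERE year = ? AND month = ? ORDER BY view_count DESC LIMIT ?",
        (year, month, limit),
    ).fetchall()


def total_views(conn, year, month):
    """Return total views across all project pages for the month."""
    row = conn.execute(
        "SELECT COALESCE(SUM(view_count), 0) FROM project_page_counts "
        "WHERE year = ? AND month = ?",
        (year, month),
    ).fetchone()
    return row[0]


conn = build_demo_db()
write_monthly_counts(conn, 2016, 8)
print(top_pages(conn, 2016, 8))
print(total_views(conn, 2016, 8))
```

Writing the per-page totals to their own table first means the top(1000) list and the project total both come from the same small table rather than from two scans of the hourly data.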
Long
Add WikiProject:Medicine as a project modifier to en.wikipedia.org queries on the pageview API. Need to think this through and how it would work on projects like commons and wikidata, but basically queries could ask for /en.wikipedia.org|WikiProject:Medicine/ and get top(1000) and per-project totals. We can get data for other languages by using Wikidata inter-language links. We can assume enwiki is authoritative for this purpose, even though it's not strictly true and we'll miss some articles with this assumption. It's better than nothing and we can improve the assumption when we have better data.
This task will be an ad hoc query to get these numbers.
There is a longer-term task, "adding top counts for wiki projects to pageview API": https://phabricator.wikimedia.org/T141010
Looking forward to the outcome. Andrew West does the top 5000, but we do not have a total for the entire project. We would also like similar totals for other languages.
The articles that pertain to medicine in other languages can be found through Wikidata language links.
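One way this lookup could work, as a rough sketch: ask the Wikidata wbgetentities API for the sitelinks of a batch of enwiki titles, then read off the title on the target wiki. The helper names and the sample response below are hand-written for illustration (the item IDs are placeholders, not real ones), and no request is actually sent here.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def sitelinks_url(titles, source_site="enwiki"):
    """Build a wbgetentities request URL for a batch of titles on one wiki."""
    params = {
        "action": "wbgetentities",
        "sites": source_site,
        "titles": "|".join(titles),
        "props": "sitelinks",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def titles_for_site(response, target_site):
    """Extract titles on target_site from a parsed wbgetentities response.

    Items with no sitelink for target_site are skipped, which is where
    articles get missed when enwiki is treated as authoritative.
    """
    titles = []
    for entity in response.get("entities", {}).values():
        link = entity.get("sitelinks", {}).get(target_site)
        if link:
            titles.append(link["title"])
    return titles


# Hand-written sample in the shape of a wbgetentities response.
# The keys "Q-example-1" and "Q-example-2" are placeholders.
sample_response = {
    "entities": {
        "Q-example-1": {
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Malaria"},
                "frwiki": {"site": "frwiki", "title": "Paludisme"},
            }
        },
        "Q-example-2": {
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Trypophobia"},
            }
        },
    }
}

print(sitelinks_url(["Malaria", "Trypophobia"]))
print(titles_for_site(sample_response, "frwiki"))
```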
Thanks Doc, yeah I'm looking to get some preliminary numbers by the end of the quarter and then to think about productionizing access to this kind of data in Q3 (starting January next year). We just have a long backlog of work. If others are interested in this in a volunteer capacity, I'm always happy to guide them.
Queries done. The data's in the milimetric.wikiproject_medicine_page_counts Hive table. I collected the steps I took in this gist: https://gist.github.com/milimetric/e77e22a736cef4c973a26667a3e94d8c
@Ladsgroup let's chat about how this works if you want. As I say at the bottom of that gist, here's a way to get top 100 pages and their views for August:
select page_title, view_count
  from milimetric.wikiproject_medicine_page_counts
 where year = 2016 and month = 8
 order by view_count desc
 limit 100;
Just a teaser:
page_title view_count
Zika_virus 771956
Leonardo_da_Vinci 740377
MDMA 493059
Tuberculosis 427626
Sexual_intercourse 402888
Project_MKUltra 374635
Trypophobia 348085
Diazepam 329726
Diabetes_mellitus 327654
Narcissistic_personality_disorder 321081
Asperger_syndrome 316896
Malaria 315221
Meningitis 301670
Lyme_disease 280941
Looking good. Does it generate pageviews for entire projects in a given month yet? About 33K pages for WPMED.
James
Well, so this is a one-off query. With it, I ran numbers for July 2016 and August 2016, for all 33K pages in WPMED. That's what the milimetric.wikiproject_medicine_page_counts table contains right now. If more months are required, I have the steps to get that data in my scripts. For now, we won't automate this and expose it through the Pageview API, but that's on our backlog. We're thinking we can get to it early next year (January) or sooner if someone wants to volunteer to help.
Re: "That's what the milimetric.wikiproject_medicine_page_counts table contains right now": where do I find this table? Or can you provide the totals here?
The totals for all articles by month:
July 2016: 179253171
August 2016: 190445556
The table is not accessible without an NDA, so I think only Amir has access to it. It's a Hive table, accessible from stat1002.
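For a quick comparison of the two monthly totals above, the month-over-month change can be computed directly. This is plain arithmetic on the figures quoted in this thread; the function name is just illustrative.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


july_2016 = 179253171
august_2016 = 190445556

change = percent_change(july_2016, august_2016)
print(f"{change:.2f}%")  # prints "6.24%"
```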
Perfect, thanks. This confirms that the overall decrease in readership is smallish. https://en.wikipedia.org/wiki/Template:WikiProject_Medicine/Popular_pages/Total
Hey Milimetric, can we get data for Sept for WPMED? Not sure if it is available yet. Also, by NDA do you mean "non-disclosure agreement"?
I update this table as I get the data: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/Popular_pages
@Doc_James: I think @Milimetric was going to do this as a one-off query, and the intent was not to repeat it every month until we have a process for running these queries automatically. I might be wrong on this; @Milimetric can correct me if so.
Hi @Doc_James, yes, I only intended to run this manually once. We have plans to prioritize the long-term version of this in Q3 (starting January 2017). Until then, we're busy with editing data infrastructure.
People who have signed the NDA (which, as you guessed, stands for non-disclosure agreement) can run the same query using the details I provide here. So basically any WMF analyst or researcher who has that access. I'm happy to help with any such effort if someone gets stuck using my code.