It would be great to have a list of page views for the articles that belong to WP:MED (~35K articles). I've put up this query, which is a modification of the Special:WhatLinksHere query, but it's fast and efficient.
This is a great start. Here's the short-term plan and the longer-term plan.
Short term: join the results of Amir's query to pageview_hourly for a month, and write the results to another table. Publish top(1000) and total views for the whole of WikiProject Medicine on datasets.wikimedia.org.
Long term: add WikiProject:Medicine as a project modifier to en.wikipedia.org queries on the pageview API. We need to think this through, including how it would work on projects like commons and wikidata, but basically queries could ask for /en.wikipedia.org|WikiProject:Medicine/ and get top(1000) and per-project totals. We can get data for other languages by using Wikidata inter-language links. We can assume enwiki is authoritative for this purpose, even though that's not strictly true and we'll miss some articles with this assumption. It's better than nothing, and we can improve the assumption when we have better data.
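For anyone who wants to follow the short-term step, here is a rough sketch of the join-and-aggregate logic in plain Python. It is not the Hive job itself: the row fields (page_title, view_count, year, month) are assumptions borrowed from the query columns used elsewhere in this thread, the function names are made up for illustration, and the data is a toy sample.

```python
from collections import Counter

def monthly_views(project_titles, hourly_rows, year, month):
    """Sum hourly view counts per title for one month, keeping only project articles."""
    titles = set(project_titles)  # the "join": membership test against the article list
    counts = Counter()
    for row in hourly_rows:
        if (row["year"] == year and row["month"] == month
                and row["page_title"] in titles):
            counts[row["page_title"]] += row["view_count"]
    return counts

def summarize(counts, n=1000):
    """Return the top-n (title, views) pairs and the total across all project articles."""
    return counts.most_common(n), sum(counts.values())

# Toy stand-ins for the article list and the hourly pageview rows (hypothetical data).
project_titles = ["Aspirin", "Influenza", "Malaria"]
hourly_rows = [
    {"page_title": "Aspirin", "year": 2016, "month": 8, "view_count": 120},
    {"page_title": "Aspirin", "year": 2016, "month": 8, "view_count": 80},
    {"page_title": "Influenza", "year": 2016, "month": 8, "view_count": 50},
    {"page_title": "Malaria", "year": 2016, "month": 7, "view_count": 999},  # other month
    {"page_title": "Cat", "year": 2016, "month": 8, "view_count": 500},      # not in project
]

counts = monthly_views(project_titles, hourly_rows, 2016, 8)
top, total = summarize(counts, n=1)
print(top, total)
```

At the real scale this would stay in Hive; the sketch only shows which rows are kept and how the top list and the project total come out of the same per-title sums.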
Thanks Doc, yeah I'm looking to get some preliminary numbers by the end of the quarter and then to think about productionizing access to this kind of data in Q3 (starting January next year). We just have a long backlog of work. If others are interested in this in a volunteer capacity, I'm always happy to guide them.
Queries done. The data's in the milimetric.wikiproject_medicine_page_counts Hive table. I collected the steps I took in this gist: https://gist.github.com/milimetric/e77e22a736cef4c973a26667a3e94d8c
@Ladsgroup let's chat about how this works if you want. As I say at the bottom of that gist, here's a way to get top 100 pages and their views for August:
select page_title, view_count from milimetric.wikiproject_medicine_page_counts where year=2016 and month=8 order by view_count desc limit 100;
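If you want to try the shape of that query without Hive access, a minimal sketch using Python's built-in sqlite3 module as a stand-in might look like this. The table here is a toy in-memory copy with the same four columns as the query above; the rows are invented, and the schema prefix is dropped because SQLite would read it as a database name.

```python
import sqlite3

# In-memory SQLite table standing in for the Hive table (toy data, not real counts).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table wikiproject_medicine_page_counts "
    "(page_title text, view_count integer, year integer, month integer)"
)
conn.executemany(
    "insert into wikiproject_medicine_page_counts values (?, ?, ?, ?)",
    [
        ("Aspirin", 200, 2016, 8),
        ("Influenza", 50, 2016, 8),
        ("Malaria", 300, 2016, 8),
        ("Malaria", 999, 2016, 7),
    ],
)

# Same shape as the top-100 query, with a smaller limit for the toy data.
top_pages = conn.execute(
    "select page_title, view_count from wikiproject_medicine_page_counts "
    "where year = ? and month = ? order by view_count desc limit ?",
    (2016, 8, 2),
).fetchall()

# Per-month totals across all pages in the table.
monthly_totals = conn.execute(
    "select year, month, sum(view_count) from wikiproject_medicine_page_counts "
    "group by year, month order by year, month"
).fetchall()

print(top_pages)
print(monthly_totals)
```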
Just a teaser:
Well, so this is a one-off query. With it, I ran numbers for July 2016 and August 2016, for all 33K pages in WPMED. That's what the milimetric.wikiproject_medicine_page_counts table contains right now. If more months are required, I have the steps to get that data in my scripts. For now, we won't automate this and expose it through the Pageview API, but that's on our backlog. We're thinking we can get to it early next year (January) or sooner if someone wants to volunteer to help.
The totals for all articles by month:
July 2016: 179253171
August 2016: 190445556
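As a quick check on how the two months compare, the month-over-month change can be worked out from the totals above (only those two numbers are used; nothing else is assumed):

```python
# Monthly totals for all WPMED articles, copied from the figures above.
july = 179_253_171
august = 190_445_556

change = august - july          # absolute difference in views
pct = 100 * change / july       # relative change versus July

print(f"{change:,} more views in August (+{pct:.2f}%)")
# prints: 11,192,385 more views in August (+6.24%)
```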
The table is not accessible without an NDA, so I think only Amir has access to it. It's a Hive table, accessible from stat1002.
Hi @Doc_James, yes, I only intended to run this manually once. We have plans to prioritize the long-term version of this in Q3 (starting January 2017). Until then, we're busy with editing data infrastructure.
People who have signed the NDA (which, you're right, stands for non-disclosure agreement) can run the same query using the details I provide here. So basically, that means any analyst at WMF or any researcher who has that access. I'm happy to help with any such efforts if people get stuck using my code.