Create a way to fetch the number of views of articles that embed files uploaded during the event, and display the result as "average daily views".
The query should:
- Fetch all files that were uploaded during the event to Commons and to the local wikis specified in the event settings. (The system currently considers only uploads to Commons; this new method must also be able to fetch the list of files uploaded to the local wikis.)
- Tracking uploads to individual wikis (expanding collection beyond Commons only) is covered by a separate task, T206819: Create a method to fetch information about uploaded files in local wikis.
- That task is slightly broader (it enables all upload metrics to include local wikis), but depending on the implementation approach, it may need to be completed before this one.
- For each of those uploaded files, get the articles in which it is used (embedded). The article list should be unique: if two files are embedded in the same article, that article is counted once.
- For each of those articles, fetch the monthly page views.
- For articles that are older than 30 days, divide the monthly page views by 30.
- For articles that are "younger" than 30 days (created less than 30 days ago), divide the monthly total by the number of days the article has existed.
- Sum the individual daily averages into a single total and return it.
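The aggregation steps above can be sketched in Python as follows. This is a minimal illustration, not the Event Metrics implementation: the `Article` type, the function names, and the input shapes (a mapping from each uploaded file to the articles that embed it, plus per-article 30-day view totals and creation dates) are hypothetical, and counting the creation day as a full day is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Mapping

WINDOW_DAYS = 30  # length of the pageview window, per the task


@dataclass(frozen=True)
class Article:
    """Hypothetical article key; frozen so it is hashable and can live in a set."""
    wiki: str   # e.g. "en.wikipedia"
    title: str


def article_daily_average(views_in_window: int, created: date, last_day: date) -> float:
    # Days the article has existed up to and including last_day (assumption:
    # the creation day counts as a full day), capped at the 30-day window.
    days_existing = max((last_day - created).days + 1, 1)
    return views_in_window / min(days_existing, WINDOW_DAYS)


def total_average_daily_views(
    usages: Mapping[str, Iterable[Article]],
    views: Mapping[Article, int],
    created: Mapping[Article, date],
    last_day: date,
) -> float:
    # Build a unique set of articles: an article embedding several event
    # files is counted only once.
    unique_articles = set()
    for articles in usages.values():
        unique_articles.update(articles)
    # Sum the per-article daily averages into one total.
    return sum(
        article_daily_average(views[a], created[a], last_day)
        for a in unique_articles
    )
```

Because `Article` is a frozen dataclass, the set collapses repeated embeddings of the same (wiki, title) pair, which is how this sketch meets the uniqueness requirement.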
Organizers and their sponsors and partners want to understand the impact of their work. One main way to measure this for uploaded files is to count the pageviews those files receive on the various article pages to which they are added. This figure will be reported in the Event Summary reports (T205561 and T206692); it will also be used in the yet-to-be-defined Files Uploaded report.
In our discussions, it has become clear that we can't get an accurate cumulative pageviews figure (see Problems and Alternative Approaches, below). So instead, we will be providing a figure for "average daily pageviews".
Definition and parameters
- All filetypes: The figure will track images, video files, audio files and other upload types.
- Uploads to Commons and local Wikipedias: Previous stats have tracked uploads to Commons only, but it is not unusual for users to upload directly to a Wikipedia. So we will track uploads to all wikis specified for the event and include those in the metric.
- Pageviews on all wikis (not just those specified): The hypothesis here is that over time, images uploaded will spread to more and more articles and, as articles are translated, more and more wikis. We want to gauge the full impact of the upload, so it would be antithetical to our purpose to count pageviews only on the wikis specified for the event.
- Method: To make this metric as valid as possible by smoothing out daily or weekly fluctuations, I propose the following:
  - Looking at the most recent day available, find the articles, on all wikis, in which the images from the event have been placed.
  - Get the pageview count for all those articles over the past 30 days (it's OK that not all the images will have been on all those pages during that entire period).
  - Divide that 30-day total by 30 to express it as a daily average.
  - If the page was created fewer than 30 days ago, divide by the number of days it has existed instead.
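One way to obtain each article's 30-day window is the public Wikimedia REST pageviews endpoint (`/metrics/pageviews/per-article/...`). The sketch below is hedged: the helper names are hypothetical, and counting `all-access` with only the `user` agent type (excluding known spiders) is an assumption, not a decision recorded in this task. The article's creation date is passed in separately, matching the rule that younger articles are averaged over the days they have existed.

```python
import json
import urllib.parse
import urllib.request
from datetime import date, timedelta

# Wikimedia REST API per-article pageviews endpoint.
API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
WINDOW_DAYS = 30


def pageviews_url(project: str, title: str, last_day: date) -> str:
    """Build a daily per-article URL covering the 30 days ending on last_day."""
    first_day = last_day - timedelta(days=WINDOW_DAYS - 1)
    # Titles use underscores for spaces and are fully percent-encoded
    # (including "/"), hence safe="".
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    # Assumption: all access methods, "user" agent type only.
    return (
        f"{API}/{project}/all-access/user/{article}/daily/"
        f"{first_day:%Y%m%d}/{last_day:%Y%m%d}"
    )


def fetch_json(url: str) -> dict:
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "event-metrics-sketch/0.1 (contact: example@example.org)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def daily_average(response: dict, created: date, last_day: date) -> float:
    """Average the window's views over 30 days, or over the article's age if younger."""
    total = sum(item["views"] for item in response.get("items", []))
    days_existing = max((last_day - created).days + 1, 1)
    return total / min(days_existing, WINDOW_DAYS)
```

For example, `daily_average(fetch_json(pageviews_url("en.wikipedia", "Albert Einstein", last_day)), created, last_day)` would give one article's contribution to the event total.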
Problems and Alternative Approaches
- What date was the image added to a page? We are able to provide a figure for the number of "Pages with uploaded files." So far so good. To have an accurate picture of how many pageviews the image has received, however, we need to know what date the image was added to each page it is on. Apparently, this date is not recorded.
- The data is in the Data Lake, but... The problem above would be irrelevant if we could simply get a count of how many times the image was requested. This number is apparently available in a stream called "mediacounts" in the Data Lake. But there is no easy way for us to get that information out of the Data Lake and into Toolforge at scale. An API for this is planned, possibly some time in the next year (T88775).