On the Moderator Tools team, we want to analyse how many readers might have seen vandalism before it was reverted. To do this, we want to count the pageviews a page receives while particular revisions are the visible version of that page.
This will help us understand the impact that Automoderator will have: our hypothesis is that if a community uses Automoderator, fewer readers will see bad content on their project, because it will be reverted more quickly. Although we are already planning to measure the time between content being added and being reverted, being able to track pageviews would give us a much clearer sense of the impact this has on readers.
@mpopov suggested that we could record the revision ID in the X-Analytics header, and then use a list of reverted revisions to find how many pageviews loaded that now-reverted content.
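The counting step in this suggestion could look roughly like the sketch below. It is only an illustration under assumptions: the key name `rev_id` is hypothetical (no such key is defined in the header yet), the function name is made up, and each pageview is represented as an already-parsed X-Analytics map rather than a row from a real table.

```python
from collections import Counter


def count_views_of_reverted(pageviews, reverted_rev_ids):
    """Count pageviews per reverted revision.

    pageviews: iterable of dicts, one parsed X-Analytics map per pageview.
    reverted_rev_ids: iterable of revision IDs (as strings) that were reverted.
    The "rev_id" key is an assumed, hypothetical header key.
    """
    reverted = set(reverted_rev_ids)
    counts = Counter()
    for x_analytics in pageviews:
        rev_id = x_analytics.get("rev_id")
        # Requests without a revision ID, or for revisions that were
        # never reverted, are skipped.
        if rev_id in reverted:
            counts[rev_id] += 1
    return counts


# Made-up sample data, for illustration only.
sample_pageviews = [
    {"ns": "0", "rev_id": "1001"},
    {"ns": "0", "rev_id": "1002"},
    {"ns": "0", "rev_id": "1001"},
    {"ns": "0"},  # request with no revision ID recorded
]
sample_counts = count_views_of_reverted(sample_pageviews, ["1001", "1003"])
```

Summing the resulting counts gives the total number of pageviews that loaded now-reverted content; in practice this would more likely be a join between the webrequest data and a list of reverted revisions.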
Notes
- Only the app servers know the revision ID of the page that's being requested, so the app servers have to propagate that information in the header.
- The code that sets the X-Analytics header on the appservers lives in the WikimediaEvents extension: https://gerrit.wikimedia.org/g/mediawiki/extensions/WikimediaEvents/+/05d1de36160d047e4a19a14cd250f1082d3f3c02/includes/WikimediaEventsHooks.php#74
- The raw header is converted to a map by the refine_webrequest_hourly job with no further processing, so any key-value pair added to the header anywhere in the pipeline will appear in wmf.webrequest automatically
- The WikimediaEvents code linked above has access to a Title object, which represents the title of the current page
- Title::getLatestRevID() returns the ID of the latest revision associated with the current page
- The keys and possible values of the X-Analytics header are documented here: https://wikitech.wikimedia.org/wiki/X-Analytics
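To illustrate the note about the raw header being converted to a map, here is a minimal sketch. It assumes the header is a semicolon-separated list of key=value pairs, as described on the wikitech page, and uses a hypothetical `rev_id` key for the revision ID. It is not the actual code of the refine_webrequest_hourly job, and the function name is made up.

```python
def parse_x_analytics(header):
    """Split a raw X-Analytics header string into a key-value map."""
    result = {}
    for pair in header.split(";"):
        pair = pair.strip()
        # Ignore empty segments and segments without a key=value shape.
        if not pair or "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        result[key] = value
    return result


# "rev_id" is an assumed key; the other values are made up for illustration.
example = parse_x_analytics("ns=0;page_id=12345;rev_id=987654321")
```

Because no per-key processing is involved, a new key such as the assumed `rev_id` would show up in the map without any change to the parsing step.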