Background
Administrators can use Special:Nuke (added by Extension:Nuke) to mass-delete pages created on their Wikimedia project. The typical use case is vandalism: when a user has mass-created pages that need to be deleted, this tool lets administrators clean them up in a few clicks rather than deleting each page individually.
When listing pages for deletion, administrators currently have three primary filtering options:
- Leave both the username and SQL LIKE fields blank: All recently created pages are returned.
- Add a username: All pages recently created by this specific user are returned.
- Add a pattern (using the % and _ wildcards) to the SQL LIKE filter: All recently created pages with titles matching this pattern are returned.
They can also combine both the username and SQL LIKE filters to retrieve pages matching both.
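The three options and their combination can be sketched as a single query with optional conditions. This is a minimal illustration, not Nuke's actual query: the `pages` table, its columns, and the `list_pages` helper are hypothetical, and SQLite stands in for the production database (its LIKE is case-insensitive for ASCII by default, which may differ from production behaviour).

```python
import sqlite3

# Hypothetical stand-in for recently created pages; not Nuke's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT, creator TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("Spam_page_1", "Vandal"),
        ("Spam_page_2", "Vandal"),
        ("Useful_article", "Editor"),
        ("Spam_report", "Editor"),
    ],
)


def list_pages(username=None, like_pattern=None):
    """Return titles matching whichever filters were supplied."""
    conditions, params = [], []
    if username:
        conditions.append("creator = ?")
        params.append(username)
    if like_pattern:
        # LIKE wildcards: % matches any run of characters, _ matches exactly one.
        conditions.append("title LIKE ?")
        params.append(like_pattern)
    where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    rows = conn.execute(f"SELECT title FROM pages{where} ORDER BY title", params)
    return [title for (title,) in rows]


print(list_pages())                                         # no filters: all pages
print(list_pages(username="Vandal"))                        # username only
print(list_pages(like_pattern="Spam%"))                     # LIKE pattern only
print(list_pages(username="Editor", like_pattern="Spam%"))  # both combined
```

Each filter only adds a condition when it is filled in, which is why leaving both fields blank returns every recently created page.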
It is currently almost impossible to know how administrators filtered page creations before deleting them: did they use the username filter, the SQL LIKE filter, the namespace filter, some combination of these, or just target all recently created pages?
We want to know this information for two reasons. The first is performance. In response to user requests, we want both to increase the length of time over which Nuke can retrieve pages for deletion (currently just 30 days; T380846) and to add further filtering options (e.g. T95797, T378488). We have found (T380846#10379277) that we are already running into performance problems when using the SQL LIKE filter, and are concerned that adding more filters will make the situation worse. This has already caused complications when attempting to increase the maximum age of pages that can be deleted. We want to know whether anyone uses the SQL LIKE filter and, if so, for what reason, to see whether we might be able to solve those use cases in other tools and potentially remove this filter from Nuke.
The second reason is to evaluate new features - when we add these new filtering options we would like to know how many administrators are using them so that we can evaluate their impact.
On top of measuring which filters are used, we could also use this instrumentation to directly measure Nuke's performance. By emitting another event, linked to the first by an ID, when pages are returned, we can measure how long it took to present the user with the list of pages.
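The pairing of the two events can be sketched as follows. This is an illustration only: the event names (`list_requested`, `list_returned`), the `listing_id` field, and the in-memory `emitted` list are hypothetical placeholders, not the Metrics Platform API.

```python
import time
import uuid

emitted = []  # hypothetical sink standing in for the real event stream


def emit(name, listing_id, timestamp):
    emitted.append({"name": name, "listing_id": listing_id, "ts": timestamp})


def list_pages_instrumented(run_query, clock=time.monotonic):
    """Emit one event before the query and one after, sharing an ID."""
    listing_id = uuid.uuid4().hex
    emit("list_requested", listing_id, clock())
    pages = run_query()
    emit("list_returned", listing_id, clock())
    return pages


def durations(events):
    """Join the two events on listing_id; return elapsed seconds per listing."""
    started = {
        e["listing_id"]: e["ts"] for e in events if e["name"] == "list_requested"
    }
    return {
        e["listing_id"]: e["ts"] - started[e["listing_id"]]
        for e in events
        if e["name"] == "list_returned" and e["listing_id"] in started
    }
```

A listing that emits `list_requested` but never `list_returned` simply drops out of `durations`, so requests that time out or fail could be counted separately from slow but successful ones.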
Proposed implementation
Each time a user clicks 'List pages', we want to collect the value of each field in the form. This tells us both whether a filter was used and what it was set to, which will help us understand how each field is used in practice.
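One possible shape for the collected data is sketched below. The field names (`username`, `like_pattern`, `namespace`) and the `build_listing_event` helper are hypothetical, not an agreed schema; the point is only that recording a boolean and the raw value for each field answers both questions.

```python
def build_listing_event(form):
    """Record, for every form field, whether it was filled in and with what."""
    event = {}
    for field, raw in form.items():
        value = raw.strip() if isinstance(raw, str) else raw
        # Empty strings and None count as unused; 0 (e.g. the main
        # namespace) is a real value and counts as used.
        used = value not in ("", None)
        event[f"{field}_used"] = used
        event[f"{field}_value"] = value if used else None
    return event


# Hypothetical field names for the 'List pages' form.
event = build_listing_event(
    {"username": "", "like_pattern": "Spam%", "namespace": None}
)
print(event)
```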
Contextual attributes
We would like to collect the following optional attributes for each event:
- mediawiki_database - so that we can evaluate differences by Wikimedia project type.
- performer_session_id - so that we can identify individual users performing many listings, should this turn out to be happening.
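As an illustration of how these attributes might be used in analysis, the sketch below counts listings per wiki and per session. The sample events and the threshold of 3 are made up; only the attribute names `mediawiki_database` and `performer_session_id` come from the list above.

```python
from collections import Counter

# Made-up sample events for illustration only.
events = [
    {"mediawiki_database": "enwiki", "performer_session_id": "s1"},
    {"mediawiki_database": "enwiki", "performer_session_id": "s1"},
    {"mediawiki_database": "enwiki", "performer_session_id": "s1"},
    {"mediawiki_database": "dewiktionary", "performer_session_id": "s2"},
]

listings_per_wiki = Counter(e["mediawiki_database"] for e in events)
listings_per_session = Counter(e["performer_session_id"] for e in events)

# Arbitrary threshold: sessions with many listings can be inspected
# separately so that one heavy user does not skew the overall picture.
heavy_sessions = [s for s, n in listings_per_session.items() if n >= 3]
```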
Data collection risk tier
Low Risk (data collection activity log form submitted).
Data retention plan
Standard Metrics Platform retention.