As a product manager, I want to know whether image suggestions notifications (T292142) lead to the addition of media on the suggested articles, so that I can track whether the feature is a success.
Acceptance Criteria:
Create a dashboard that shows the following, per wiki, updated monthly:
- Number of notifications sent
- Revert rate for image additions
  - For all image additions
  - Filtered by experienced users with over 500 edits
  - Filtered by additions made after opening one of our notifications
- Percentage of users who make image edits
  - For all image additions
  - Filtered by experienced users with over 500 edits
  - Filtered by additions made after opening one of our notifications
- Total users who make image edits
  - For all image additions
  - Filtered by experienced users with over 500 edits
  - Filtered by additions made after opening one of our notifications
- Percentage of notifications read
  - For all notifications
  - Filtered by image suggestions notifications only (so we can compare the overall engagement rate)
- Images added
  - Total images added
  - Filtered by experienced editors with over 500 edits only
  - Filtered by users who opened an image suggestions notification
- Notification opt out rate
- Media added to infoboxes are included in the total
- All available media types are included in the total (video, audio, images, PDFs, etc.)
- Icons and other unwanted media types are filtered out of the total
- Additions that have been reverted within 48 hours are filtered out of the total
- Data is kept permanently so the process can be revised
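As a rough illustration of how the revert-rate and images-added slices above could be computed, here is a minimal Python sketch. The record fields (`added_at`, `reverted_at`, `user_edit_count`, `opened_notification`) are hypothetical names, not an existing schema. It also assumes the revert rate uses the same 48-hour window as the total, which the criteria do not state.

```python
from datetime import datetime, timedelta

REVERT_WINDOW = timedelta(hours=48)   # additions reverted within this window are excluded
EXPERIENCED_MIN_EDITS = 500           # "experienced" means strictly more than 500 edits

def reverted_within_window(addition):
    """True if the addition was reverted within 48 hours of being made."""
    reverted_at = addition.get("reverted_at")
    return reverted_at is not None and reverted_at - addition["added_at"] <= REVERT_WINDOW

def summarize(additions):
    """Count surviving additions and compute the revert rate for one slice."""
    total = len(additions)
    reverted = sum(1 for a in additions if reverted_within_window(a))
    return {
        "images_added": total - reverted,
        "revert_rate": reverted / total if total else None,
    }

def dashboard_slices(additions):
    """Return the three slices listed in the acceptance criteria."""
    return {
        "all": summarize(additions),
        "experienced": summarize(
            [a for a in additions if a["user_edit_count"] > EXPERIENCED_MIN_EDITS]
        ),
        "after_notification": summarize(
            [a for a in additions if a["opened_notification"]]
        ),
    }
```

In practice the same grouping would be applied per wiki and per month before loading into the dashboard.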
Technical Approach Notes:
- Using the list of sent notifications, match all revisions made by the notified users on the notified pages.
- Parse the diffs offline on the cluster, using code that extracts whether media was added in a given diff.
- First gather all edits to pages for which a notification was sent, made by the user who received it, then check the diffs for evidence of image-related activity.
- Media in infoboxes will not be caught by default unless there is a link. To include it, write something more specific that looks for .jpeg, .png, and other media extensions being added anywhere in the wikitext, which could be a lower-cost approach.
- Use this tool to test and have example revisions: https://wiki-topic.toolforge.org/diff-tagging
- Extract the image-detection logic from that tool. Run one regex over the wikitext for links that start with file, media, image, or any other alias, and a second that looks for .jpg, .png, or other known media extensions; run both on other diffs and compare the results. In short: a simpler regex, plus a comparison.
- This would be an ad hoc run: fetch all revisions in the latest x timespan, then parse each revision and the one before it. The history dumps are on the cluster, so you would identify potential edits by user + page and then parse them to discover what was done.
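The notes above could be sketched roughly as follows in Python. This is an illustration under assumptions, not the diff-tagging tool's code: the alias list, the extension list, and the record fields (`user_id`, `page_id`) are hypothetical, and real wikis have localized namespace aliases that would need to be added. The extension regex is deliberately over-inclusive and may pick up surrounding prose.

```python
import re

# Assumed namespace aliases and media extensions; extend per wiki as needed.
FILE_PREFIXES = ("file", "image", "media")
MEDIA_EXTENSIONS = ("jpg", "jpeg", "png", "gif", "svg", "tif", "tiff",
                    "webm", "ogv", "ogg", "oga", "mp3", "wav", "pdf")

# Matches link syntax such as [[File:Name.ext|thumb]] and captures the title.
LINK_RE = re.compile(
    r"\[\[\s*(?:%s)\s*:\s*([^|\]\n]+)" % "|".join(FILE_PREFIXES),
    re.IGNORECASE,
)
# Matches bare file names anywhere, e.g. the infobox parameter "| image = Name.jpg".
BARE_RE = re.compile(
    r"([^\[\]|={}<>\n:]+\.(?:%s))\b" % "|".join(MEDIA_EXTENSIONS),
    re.IGNORECASE,
)

def normalize(name):
    """Trim, treat underscores as spaces, and upper-case the first letter."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]

def media_from_links(wikitext):
    """Media found through File:/Image:/Media: link syntax."""
    return {normalize(m.group(1)) for m in LINK_RE.finditer(wikitext)}

def media_from_extensions(wikitext):
    """Media found through known file extensions anywhere in the wikitext."""
    return {normalize(m.group(1)) for m in BARE_RE.finditer(wikitext)}

def added_media(parent_wikitext, current_wikitext):
    """Media present in the current revision but not in its parent."""
    before = media_from_links(parent_wikitext) | media_from_extensions(parent_wikitext)
    after = media_from_links(current_wikitext) | media_from_extensions(current_wikitext)
    return after - before

def candidate_revisions(notifications, revisions):
    """Keep revisions whose (user, page) pair received a notification."""
    notified = {(n["user_id"], n["page_id"]) for n in notifications}
    return [r for r in revisions if (r["user_id"], r["page_id"]) in notified]
```

Comparing `media_from_links` against `media_from_extensions` on the same set of diffs shows how many infobox additions the link-only regex misses, which is the comparison the notes suggest.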
Dashboard in Superset: Image Suggestion Notification Dashboard