What we need as a metric is:
- proportion of uploads via UW for which a deletion request containing the file was created within 30 days of upload, by month of upload
- proportion of uploads via UW for which a deletion request that contains the file and mentions copyright was created within 30 days of upload, by month of upload
- MVP would just be a percentage calculated at the end of the following month (so we'd have March's data at the end of April) that's easily available to the team
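A rough sketch of how the two proportions above could be computed once the upload and DR data are in hand. The input shapes (tuples of file title and timestamps, plus a `mentions_copyright` flag) and the function name are assumptions for illustration, not what extract_deletion_requests.py actually produces. It also assumes "within 30 days" is inclusive and that a file title identifies a single upload.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # assumed inclusive: DR created <= 30 days after upload


def monthly_dr_proportions(uploads, deletion_requests, copyright_only=False):
    """Return {"YYYY-MM": proportion of that month's uploads with a DR in the window}.

    uploads: iterable of (file_title, upload_time)            -- hypothetical shape
    deletion_requests: iterable of
        (file_title, dr_created_time, mentions_copyright)     -- hypothetical shape
    """
    # Index DR creation times by file so each upload is a single lookup.
    drs_by_file = defaultdict(list)
    for title, created, mentions_copyright in deletion_requests:
        if copyright_only and not mentions_copyright:
            continue
        drs_by_file[title].append(created)

    totals = defaultdict(int)
    flagged = defaultdict(int)
    for title, uploaded in uploads:
        month = uploaded.strftime("%Y-%m")  # bucket by month of upload
        totals[month] += 1
        if any(uploaded <= created <= uploaded + WINDOW
               for created in drs_by_file.get(title, ())):
            flagged[month] += 1

    return {month: flagged[month] / totals[month] for month in sorted(totals)}


if __name__ == "__main__":
    uploads = [("A.jpg", datetime(2024, 3, 1)), ("B.jpg", datetime(2024, 3, 10))]
    drs = [("A.jpg", datetime(2024, 3, 5), True)]
    print(monthly_dr_proportions(uploads, drs))
    print(monthly_dr_proportions(uploads, drs, copyright_only=True))
```

Calling it once with `copyright_only=False` and once with `copyright_only=True` gives the two metrics from the same extracted data.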
I'd suggest we do this with a cronjob that runs the extract_deletion_requests.py script, then another script that processes the extracted DR data together with extra data from the data lake. The scripts should send an alert to sd-alerts@lists.wikimedia.org with the numbers we want.
We'll need to do an initial generation of data over at least the last year, so we can compare year-on-year variation by month as well as month-to-month variation
Once that's up and running we can then (in another ticket) build on it to gather other data (like post-30-days DRs, comparisons with other upload methods, etc)
Relevant tickets
Note that DR ratio baselines without the copyright reason were calculated in https://phabricator.wikimedia.org/T337466
Reasons for deletion requests were calculated in https://phabricator.wikimedia.org/T340546
Baseline data gathering T349380