We need an up-to-date view of Commons deletion requests (DRs), so we can keep an eye on how DRs are affected by changes we make to UploadWizard.
Proposed solution
- dedicated Hive db
- use (daily) data in wmf_dumps to update a deletion request table containing:
  - timestamp for when the deletion request was created
  - timestamp for when the deletion request was resolved
  - whether the file was (or files were) kept or deleted
  - the reason the request was opened
  - the reason the request was closed
  - page_titles for files that are covered by the DR
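A rough sketch of what one row of that deletion request table could look like, written as a Python dataclass. The class name, field names, and the "kept"/"deleted" outcome values are all placeholders made up for illustration; the real Hive column names and types are still to be decided.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DeletionRequest:
    """One deletion request (DR). All field names here are hypothetical."""

    created: datetime                 # when the DR was created
    resolved: Optional[datetime]      # when the DR was resolved; None while open
    outcome: Optional[str]            # "kept" or "deleted"; None while open
    open_reason: str                  # the reason the request was opened
    close_reason: Optional[str]       # the reason the request was closed
    page_titles: List[str] = field(default_factory=list)  # files covered by the DR

    def is_open(self) -> bool:
        # A DR with no resolution timestamp is still open.
        return self.resolved is None

    def days_to_resolve(self) -> Optional[float]:
        # Elapsed time between creation and resolution, in days.
        if self.resolved is None:
            return None
        return (self.resolved - self.created).total_seconds() / 86400
```

Since a single DR can cover several files, page_titles is modelled as a list here; in Hive that could be an array column or a separate one-row-per-file table, whichever turns out easier to join against.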
Also an up-to-date upload table, built as a view over page (where page_namespace=6) plus image plus filearchive, that can be linked to the deletion requests, with fields:
- page_id
- page_title
- upload time
- upload source (UW, cross-wiki, etc)
- deleted time
- deletion comment
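A minimal sketch of how such an upload view could be put together, using an in-memory SQLite database as a stand-in for Hive. The tables are cut down to a handful of columns, fa_deleted_reason is a simplified text column, upload_source is left NULL because it would have to come from somewhere other than these three tables, and the view name uploads and the link table deletion_request_files are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins for the MediaWiki tables (only the columns used here).
CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
CREATE TABLE image (img_name TEXT, img_timestamp TEXT);
CREATE TABLE filearchive (
    fa_name TEXT, fa_timestamp TEXT,
    fa_deleted_timestamp TEXT, fa_deleted_reason TEXT
);

-- Hypothetical link table: one row per (deletion request, file).
CREATE TABLE deletion_request_files (dr_id INTEGER, page_title TEXT);

-- Live files come from page + image; deleted files come from filearchive.
CREATE VIEW uploads AS
SELECT p.page_id        AS page_id,
       p.page_title     AS page_title,
       i.img_timestamp  AS upload_time,
       NULL             AS upload_source,   -- assumption: filled in from elsewhere
       NULL             AS deleted_time,
       NULL             AS deletion_comment
FROM page p
JOIN image i ON i.img_name = p.page_title
WHERE p.page_namespace = 6
UNION ALL
SELECT NULL, fa_name, fa_timestamp, NULL, fa_deleted_timestamp, fa_deleted_reason
FROM filearchive;
""")

# Toy data: A.jpg is still live, B.jpg was deleted after a DR.
conn.execute("INSERT INTO page VALUES (1, 6, 'A.jpg')")
conn.execute("INSERT INTO image VALUES ('A.jpg', '20240101000000')")
conn.execute(
    "INSERT INTO filearchive VALUES "
    "('B.jpg', '20240102000000', '20240201000000', 'per DR')"
)
conn.execute("INSERT INTO deletion_request_files VALUES (10, 'B.jpg')")

rows = conn.execute(
    "SELECT page_title, upload_time, deleted_time FROM uploads ORDER BY page_title"
).fetchall()

# Link uploads to deletion requests via page_title.
linked = conn.execute(
    "SELECT d.dr_id, u.page_title, u.deletion_comment "
    "FROM deletion_request_files d JOIN uploads u ON u.page_title = d.page_title"
).fetchall()
```

Deleted files no longer have a page row, so in this sketch their page_id is NULL and page_title is the only join key available for both live and deleted uploads.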
Everything below is out of scope, but I just want to write it all down here so we'll have it for later ... ultimately it'd be great to also have
- uploader edit count at time of upload?
- uploader groups at time of upload?
- uploader account age at time of upload?
... and then maybe a file_tags table ("logo", "ownwork/notown", "has external source", etc) joined to the upload table
... and also maybe a deletion_tags table ("logo", "copyvio", "FoP", etc) joined to the upload table
... probably also a table containing pHashes
... and maybe a table containing pHashes of files on Wikipedias, as they're likely to be copyrighted
If we had all the above it'd really help us begin to figure out likelihood-of-deletion scores for uploads