
pywikibot: delinker.py: Use "Category:Pages with missing files" and save time
Closed, Resolved | Public | Feature

Description

I have been using Pywikibot's delinker.py script. It works really well; I like how it functions, especially its speed and the way it generates the edit summary. However, I encounter errors from time to time, and it takes a long time when run on both Commons and a local wiki.

Why don't we save time and memory? I would like to request a feature that uses Category:Pages with missing files. This is a tracking category built into MediaWiki; it is available on all wikis and automatically lists pages with broken file links.

Steps to implement:

  1. Add an option to find the category on the local wiki using the Wikidata item Q4989282.
  2. Only check pages in the main namespace.
  3. Use the API to get a list of broken/missing files on the article page.
  4. Check the deletion log of each broken file on both Commons and the local wiki:
    • If the file was deleted and doesn’t exist, delink it.
    • If the file doesn’t have a deletion log and doesn’t exist, ignore it because it may be uploaded on another wiki (not Wikimedia Commons) but not available on the local wiki. For now, ignoring these files is the better option.
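
The decision in step 4 can be sketched roughly as follows. This is only an illustration of the proposed logic, not how delinker.py implements it: `FileStatus`, `should_delink` and `files_to_delink` are hypothetical names, and the sketch assumes the existence and deletion-log checks have already been made. `deletion_log_params` shows the kind of MediaWiki `list=logevents` query that could answer the deletion-log question for one title.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    """Hypothetical summary of what was found for one file link on a page."""
    title: str
    exists: bool              # file is available locally or from Commons
    deleted_locally: bool     # local deletion log has an entry for the title
    deleted_on_commons: bool  # Commons deletion log has an entry for the title


def deletion_log_params(file_title: str) -> dict:
    """Build parameters for a MediaWiki deletion-log query on one title."""
    return {
        "action": "query",
        "list": "logevents",
        "letype": "delete",
        "letitle": file_title,
        "lelimit": 1,
        "format": "json",
    }


def should_delink(status: FileStatus) -> bool:
    """Delink only files that are missing and have a deletion log entry."""
    if status.exists:
        return False
    # No deletion log on either wiki: the file may live on another wiki,
    # so it is ignored for now, as the request suggests.
    return status.deleted_locally or status.deleted_on_commons


def files_to_delink(statuses: list) -> list:
    """Return the titles that qualify for delinking, in input order."""
    return [s.title for s in statuses if should_delink(s)]
```

The same query parameters would be sent once to the local wiki and once to Commons; a file is delinked if either log reports a deletion.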

Well, that is it! This way we really save time, memory and all. Thanks!

Event Timeline

Xqt triaged this task as Medium priority.

Change #1061456 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [IMPR] add -category option to delinker.py

https://gerrit.wikimedia.org/r/1061456

@Xqt, WMF, and communities must be proud of you. I'm speechless—you made it! Just for the record, here are the statistics:
Command: python pwb.py delinker.py -family:wikipedia -lang:ckb -localonly -category
Results:

200 read operations
5284 skip operations
83 write operations
Execution time: 50 minutes, 21 seconds
Read operation time: 15.1 seconds
Skip operation time: 0.6 seconds
Write operation time: 36.4 seconds
Script terminated successfully.

This is a sample edit, and these are all the edits. See? It is now really, really good! No errors! Right now, I don't have more words to say, but I may reopen this ticket if I encounter any issues related to this in the future. I really appreciate it!

Change #1061456 merged by jenkins-bot:

[pywikibot/core@master] [IMPR] add -category option to delinker.py

https://gerrit.wikimedia.org/r/1061456

Change #1061952 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [IMRP] Look for the lastest file deletion first.

https://gerrit.wikimedia.org/r/1061952

Change #1061952 merged by jenkins-bot:

[pywikibot/core@master] [IMRP] Look for the lastest file deletion first.

https://gerrit.wikimedia.org/r/1061952