
pywikibot delinker script: support removal of files not enclosed in double brackets
Open, Needs Triage | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

The delinker script currently only removes files wrapped in double brackets, like [[file:title.extension]]. However, many files are not in this format and appear as values in template parameters, gallery tags, and elsewhere. These files should also be removed. Since we already have the full title of the deleted files from the deletion log, implementing this process would be relatively simple.
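To illustrate the limitation described above, here is a minimal sketch using only Python's `re` module. It is not the delinker's actual code: `name_regex`, `bracketed_link_pattern`, and the sample wikitext are hypothetical, and the pattern is a simplified stand-in for "match only `[[File:...]]` links". It shows that such a pattern leaves template and gallery uses untouched.

```python
import re


def name_regex(filename: str) -> str:
    """Escape a file title, letting spaces and underscores match each other."""
    parts = re.split(r'[ _]+', filename.strip())
    return '[ _]+'.join(re.escape(part) for part in parts)


def bracketed_link_pattern(filename: str) -> re.Pattern:
    """Match only [[File:<filename>|...]] links (hypothetical helper).

    Captions containing nested [[links]] are deliberately not handled.
    """
    return re.compile(
        r'\[\[\s*(?:File|Image)\s*:\s*' + name_regex(filename)
        + r'\s*(?:\|[^\[\]]*)?\]\]',
        re.IGNORECASE,
    )


# Hypothetical page text with three uses of the same file.
wikitext = (
    '[[File:Almaty 13.jpg|thumb|View]]\n'
    '{{Infobox settlement|image = Almaty 13.jpg}}\n'
    '<gallery>\n'
    'Almaty 13.jpg|View\n'
    '</gallery>\n'
)

remaining = bracketed_link_pattern('Almaty 13.jpg').sub('', wikitext)
# Only the bracketed link is removed; the other uses are still present.
print(remaining.count('Almaty 13.jpg'))
```

The template parameter and the gallery entry both survive, which matches the "No changes were needed" behaviour reported below.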

While testing, I noticed a few more issues besides the one in the title. To reproduce:

  • Run this on ckbwiki: python pwb.py delinker -family:wikipedia -lang:ckb -category

What happens?:
The script prints the lines below. The files on these two pages are not wrapped in double brackets, so they are not removed.
First page:

Reading settings from scripts.ini file.
WARNING: "since" is not a valid option. It was ignored.
.
>>> Delinking پەڕگە:2001 A Space Odyssey (1968) theatrical poster variant.jpg <<<


>>> ٢٠٠١: ئۆدێسەی بۆشایی ئاسمان <<<
No changes were needed on [[ckb:٢٠٠١: ئۆدێسەی بۆشایی ئاسمان]]

Next page:

>>> Delinking پەڕگە:Almaty 13.jpg <<<


>>> ئالماتی <<<
No changes were needed on [[ckb:ئالماتی]]

What should have happened instead?:

  • The message WARNING: "since" is not a valid option. It was ignored. should not appear since we did not provide the -since option.
  • The script initially shows >>> Delinking پەڕگە:2001 A Space Odyssey (1968) theatrical poster variant.jpg <<< and then displays the page title >>> ٢٠٠١: ئۆدێسەی بۆشایی ئاسمان <<<. This causes confusion as it's unclear which file corresponds to which page. The page title should appear first, followed by the delinking messages. Explained here: T388851#11399212
  • The script shows No changes were needed on [[ckb:٢٠٠١: ئۆدێسەی بۆشایی ئاسمان]] without removing the file link even though the file was deleted locally. Since we have got the full file name, the script should remove it unconditionally.
  • The same issue occurs with the next page, where Almaty 13.jpg was deleted on Wikimedia Commons and is called in <gallery> tags. It was expected to be removed, but the script did not remove it.

Software version
Release version: 9.6.2

Event Timeline

Change #1207187 had a related patch set uploaded (by Dumbledore; author: Dumbledore):

[pywikibot/core@master] Fix delinker: Support file removal beyond [[File:]] syntax

https://gerrit.wikimedia.org/r/1207187

Hi @Aram,

I have submitted a patch in Gerrit that addresses this task.
Feedback and review would be appreciated. Thank you!

Hi @Dumbledore,

Thank you for working on this. I updated delinker.py with your patch and tested it locally. The results are mixed:

Fixed:

  • The message WARNING: "since" is not a valid option. It was ignored. no longer appears when the -since argument is omitted.
  • The page title is now correctly displayed before the filenames.

Remaining Issues:

  • The bot still fails to remove the deleted files, just as it did before.

Here are some sample outputs from the test:

The file called in an infobox image parameter:

>>> ٢٠٠١: ئۆدێسەی بۆشایی ئاسمان <<<

>>> ٢٠٠١: ئۆدێسەی بۆشایی ئاسمان <<<
    Delinking پەڕگە:2001 A Space Odyssey (1968) theatrical poster variant.jpg
    No changes were needed
No changes were needed on [[ckb:٢٠٠١: ئۆدێسەی بۆشایی ئاسمان]]

The file called in a <gallery> tag:

>>> ئالماتی <<<

>>> ئالماتی <<<
    Delinking پەڕگە:Almaty 13.jpg
    No changes were needed
No changes were needed on [[ckb:ئالماتی]]

Regarding the output logs:
By default, Pywikibot already displays the page title, the No changes were needed on [[langcode:Page title]] message, and success messages. The custom print lines introduced in the patch create redundancy, as shown in the logs above. We should probably remove them.

Apologies for the delay in getting back to you.

Hi @Aram,

I have updated the patch. Kindly check and let me know.

@Dumbledore, unfortunately, this update introduced regressions.

Current output:

>>> ٢٠٠١: ئۆدێسەی بۆشایی ئاسمان <<<
    No changes were needed
No changes were needed on [[ckb:٢٠٠١: ئۆدێسەی بۆشایی ئاسمان]]

>>> ئالماتی <<<
    No changes were needed
No changes were needed on [[ckb:ئالماتی]]

Issues:

  • Missing Log: The Delinking [filename]... line was removed but is required.
  • Redundant Code: Duplicate titles are fixed, but this check is unnecessary and should be removed:
if text != original_text:
    pywikibot.info('    Changes made')
else:
    pywikibot.info('    No changes were needed')
  • Core Failure: The bot still fails to remove the files.

Hi @Aram,

I have updated the patch. Kindly check and let me know.

Hi @Xqt and @Dumbledore, I have fixed the remaining issues and verified the fixes locally. I believe this task is now resolved and ready for final review.

Please check the attached file to see if there is anything wrong or unclear.

The updated script now correctly handles:

  1. Whole-line removals: Safely removes entries in <gallery> tags.
  2. Bare filenames: Detects and removes filenames used as plain text in inline template parameters or other text.
  3. The since option warning when running in category mode (fixed by @Dumbledore).
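The attached file is not reproduced in this thread, so the following is only an illustrative sketch of item 2 (bare filenames), not the actual implementation. It assumes plain `re` matching, covers only the case where the filename is the entire value of a template parameter, and uses hypothetical helper names (`name_regex`, `blank_parameter_values`).

```python
import re


def name_regex(filename: str) -> str:
    """Escape a file title, letting spaces and underscores match each other."""
    parts = re.split(r'[ _]+', filename.strip())
    return '[ _]+'.join(re.escape(part) for part in parts)


def blank_parameter_values(text: str, filename: str) -> str:
    """Empty template parameter values that consist solely of the filename.

    An optional namespace prefix such as "File:" is removed along with it.
    """
    pattern = re.compile(
        r'(=[ \t]*)(?:[^\s:|=]+:)?' + name_regex(filename)
        + r'(?=[ \t]*(?:\||\}\}|$))',
        re.MULTILINE,
    )
    # Keep the "=" and its spacing, drop the filename itself.
    return pattern.sub(r'\1', text)


# Hypothetical infobox text modelled on the first page in this report.
infobox = (
    '{{Infobox film\n'
    '| name = 2001\n'
    '| image = 2001 A Space Odyssey (1968) theatrical poster variant.jpg\n'
    '| caption = Poster\n'
    '}}'
)
cleaned = blank_parameter_values(
    infobox, '2001 A Space Odyssey (1968) theatrical poster variant.jpg')
print(cleaned)
```

The lookahead requires the value to end at a `|`, `}}`, or end of line, so a parameter whose value merely begins with the same words is left alone.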

Regarding my previous comment about the log output order: I realized I was wrong. The current output format is actually correct because the script works by grabbing a deleted file first, and then iterating through the pages that use it. So displaying the "Delinking File..." message before the page titles is the logical behavior.
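A minimal sketch of that iteration order, assuming a plain mapping from deleted file to the pages that use it (the `usage` data and `log_lines` helper are hypothetical, and no Pywikibot API is involved):

```python
from typing import Dict, Iterator, List


def log_lines(usage: Dict[str, List[str]]) -> Iterator[str]:
    """Yield log lines file-first: one file header, then each page using it."""
    for filename, pages in usage.items():
        yield f'>>> Delinking {filename} <<<'
        for page in pages:
            yield f'>>> {page} <<<'


# Hypothetical data: one deleted file used on two pages.
usage = {'Almaty 13.jpg': ['Almaty', 'Kazakhstan']}
lines = list(log_lines(usage))
for line in lines:
    print(line)
```

Because the outer loop is over files, the file header naturally precedes the titles of the pages processed for it.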

Thanks!

Hi @Aram

Do I need to change anything in the delinker.py file? Kindly let me know if any changes are required.

@Dumbledore Based on my tests, the attached file is the correct final version.
Just please change # This safely removes entries in <gallery> tags, lists, and templates. to # This safely removes entries in <gallery> tags.

Note that we do not parse <gallery> tags explicitly; instead, we target the entry syntax (lines starting with the filename), which effectively handles gallery cleanup.
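As a rough illustration of that entry-syntax approach (a sketch under assumptions, not the code from the attached file), the function below removes any whole line that starts with the filename, optionally preceded by a namespace prefix and followed by a `|caption`. The helper names and sample text are hypothetical.

```python
import re


def name_regex(filename: str) -> str:
    """Escape a file title, letting spaces and underscores match each other."""
    parts = re.split(r'[ _]+', filename.strip())
    return '[ _]+'.join(re.escape(part) for part in parts)


def remove_entry_lines(text: str, filename: str) -> str:
    """Drop whole lines of the form '[Prefix:]<filename>[|caption]'."""
    pattern = re.compile(
        r'^[ \t]*(?:[^\s:|=]+:)?[ \t]*' + name_regex(filename)
        + r'[ \t]*(?:\|.*)?$\n?',
        re.MULTILINE,
    )
    return pattern.sub('', text)


# Hypothetical gallery modelled on the second page in this report.
gallery = (
    '<gallery>\n'
    'File:Almaty 13.jpg|City view\n'
    'Almaty 14.jpg|Kept entry\n'
    '</gallery>\n'
)
cleaned = remove_entry_lines(gallery, 'Almaty 13.jpg')
print(cleaned)
```

Since the match is anchored to the start of a line, a filename mentioned in the middle of a sentence is not affected, and the `<gallery>` tags themselves never need to be parsed.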

Thanks!

@Aram So shall I replace the current file with this file and push it to Gerrit? Is that fine?

@Dumbledore, yes, just replace the current master file with the attached file here, and change that comment line. If I notice anything wrong, I will let you know. Thanks!

Change #1207187 abandoned by Dumbledore:

[pywikibot/core@master] Fix delinker: Support file removal beyond [[File:]] syntax

https://gerrit.wikimedia.org/r/1207187