reflinks.py skips some identical references for unknown reason
Closed, Resolved | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • There is an article in Russian, Экономика_Эстонии (I have also forked a copy here for testing)
  • It contains several identical references named <ref name="Stat.ee"> with exactly the same content

What happens?:

  • The bot doesn't remove the duplicates, although it fixes other issues in the article: diff1, diff2

What should have happened instead?:
The bot should keep the content of only one <ref name="Stat.ee"> reference and remove the duplicated content from all the others.

Event Timeline

Xqt triaged this task as Low priority. Jul 9 2021, 11:27 AM
Xqt subscribed.

The problem is the line feed inside the reference. We could use the dotall flag, but I am unsure about possible side effects.
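
To illustrate the line-feed issue, here is a minimal Python sketch. The pattern and the sample text are made up for illustration and are not the actual expression used in reflinks.py; they only show how "." behaves with and without re.DOTALL.

```python
import re

# Hypothetical pattern for illustration; NOT the regex used in reflinks.py.
REF_PATTERN = r'<ref name="(?P<name>[^"]+)">(?P<content>.*?)</ref>'

# Made-up sample: the first reference contains a line feed, the second does not.
text = (
    '<ref name="Stat.ee">[http://stat.ee\nStatistics Estonia]</ref> '
    '<ref name="Other">single line</ref>'
)

# Without DOTALL, "." does not match "\n", so the multi-line ref is skipped.
default_names = [m.group('name') for m in re.finditer(REF_PATTERN, text)]

# With DOTALL, "." also matches "\n", so both refs are found.
dotall_names = [
    m.group('name') for m in re.finditer(REF_PATTERN, text, re.DOTALL)
]

print(default_names)  # ['Other']
print(dotall_names)   # ['Stat.ee', 'Other']
```

The reference with a line feed is only found in the second case, which matches the behaviour reported in this task.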

I agree that dotall isn't a good fit, because the bot currently fixes cases where two different references share the same name. But are the references in the article I provided actually different? They look identical to me.

They are indeed equal. Using dotall is not trivial. Working on it.
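
As a rough sketch of the behaviour discussed in this thread, and not the merged patch, the following Python snippet collapses identical named references even when their content spans several lines. It leaves same-named references with different content untouched. The pattern, the function name dedupe_refs and the sample text are all hypothetical.

```python
import re

# Hypothetical pattern; re.DOTALL lets the lazy ".*?" span line feeds.
NAMED_REF = re.compile(
    r'<ref name="(?P<name>[^"]+)">(?P<content>.*?)</ref>', re.DOTALL)


def dedupe_refs(text: str) -> str:
    """Collapse repeated named refs whose content is identical."""
    seen = {}  # ref name -> content of its first occurrence

    def replace(match):
        name, content = match.group('name'), match.group('content')
        if name not in seen:
            seen[name] = content
            return match.group(0)  # keep the first full definition
        if seen[name] == content:
            return f'<ref name="{name}" />'  # identical duplicate
        return match.group(0)  # same name, different content: leave as is

    return NAMED_REF.sub(replace, text)


# Made-up sample: two identical multi-line refs and one that differs.
sample = (
    'A<ref name="Stat.ee">line one\nline two</ref> '
    'B<ref name="Stat.ee">line one\nline two</ref> '
    'C<ref name="Stat.ee">something else</ref>'
)
result = dedupe_refs(sample)
print(result)
```

One side effect to watch for in a sketch like this: with DOTALL, a reference that is missing its closing </ref> lets the lazy match run on into the next reference, which is part of why the flag is not a drop-in change.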

Change 703859 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] Don't ignore identical references with newline in ref content

https://gerrit.wikimedia.org/r/703859

Change 703859 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] Don't ignore identical references with newline in ref content

https://gerrit.wikimedia.org/r/703859

Rubin16 assigned this task to Xqt.