reflinks.py work with quotes
Closed, Resolved | Public | Bug Report

Description

The script treats the two quote styles as different reference names. For example, "autogenerated1" and 'autogenerated1' are considered two different references, although the MediaWiki engine treats them as the same.
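To illustrate the point, here is a minimal sketch of quote-insensitive name matching. The pattern `REF_NAME` and the helper `ref_names` are hypothetical names made up for this example, not taken from reflinks.py, and the pattern only covers simple `<ref name=...>` tags without other attributes.

```python
import re

# Hypothetical pattern: accepts name="x", name='x' or unquoted name=x.
# The backreference (?P=quote) requires the closing quote to match the opening one.
REF_NAME = re.compile(
    r'<ref\s+name\s*=\s*(?P<quote>["\']?)(?P<name>[^"\'>/]+?)(?P=quote)\s*/?>',
    re.IGNORECASE,
)


def ref_names(wikitext):
    """Return the set of reference names, ignoring the quote style."""
    return {m.group('name').strip() for m in REF_NAME.finditer(wikitext)}


text = (
    '<ref name="autogenerated1">[https://example.org/a]</ref> '
    "<ref name='autogenerated1' />"
)
print(ref_names(text))  # {'autogenerated1'}
```

Comparing the captured name rather than the raw attribute text means both spellings count as one reference, which matches how MediaWiki reads them.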

As a result, it first inserts a "refname" (for example, the first edit), then sometimes makes an edit that only replaces one quote style with the other (for example, this one or this one).

I wouldn't consider this an important issue on its own, but it also breaks the article when the script places "autogenerated1" with one link and 'autogenerated1' with a different one. It took me some time to track this bug down.

For example, please have a look at this diff, especially at "автоссылка7" (the Russian name for "autogenerated7"). The article contained:

<ref name="автоссылка7">[https://seekingalpha.com/article/4160521-china-demographic-crisis-economic-outlook]</ref>

During the edit, the bot creates a new refname:

<ref name="автоссылка7">[https://www.sciencedirect.com/science/article/pii/S0304393219300261 Aging and deflation from a fiscal perspective - ScienceDirect<!-- Заголовок добавлен ботом -->]</ref>

At the same time, it changes the quotes in the existing "автоссылка7" to 'автоссылка7', so two different references end up sharing one name and the article breaks.

I searched through the code for where the quote handling is defined, but haven't found it yet.

Event Timeline

Rubin16 triaged this task as Medium priority. Mar 21 2021, 11:50 AM

Change 674613 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Always use double quotes with references

https://gerrit.wikimedia.org/r/674613
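The idea named in the patch title, always using double quotes, could look roughly like the sketch below. This is not the code from the Gerrit change: `SINGLE_QUOTED` and `use_double_quotes` are hypothetical names, and the sketch assumes the name itself contains no quote characters.

```python
import re

# Hypothetical pattern: a <ref> tag whose name is wrapped in single quotes.
SINGLE_QUOTED = re.compile(r"(<ref\s+name\s*=\s*)'([^'>]+)'", re.IGNORECASE)


def use_double_quotes(wikitext):
    """Rewrite name='x' to name="x" so every reference uses one quote style."""
    return SINGLE_QUOTED.sub(r'\1"\2"', wikitext)


before = "<ref name='автоссылка7'>[https://example.org/a]</ref>"
print(use_double_quotes(before))
```

With a single quote style in the page text, a plain string comparison of names is enough to detect that a name is already taken.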

@Xqt thank you, it seems to be editing correctly now.

But it still makes some mistakes, creating "autogenerated7" while we had 'autogenerated7'.

So it works better now, but the problem isn't fully solved, and I have no further ideas about it yet.

But this patch should be merged anyway, thanks again.

Could you also check the new patch?

Found the bug, which has been there since 2008! Patch coming soon.

Seems to have the same result

Found the bug and solved it with the last commit. Would you please review it?

@Xqt
seems to be working, thanks a lot. Let's merge and I will do a more continuous run to test it more.

PS: Just for my self-education (you can ignore it, if you want):

  • I see that you have changed the logic from adding +1 to the existing ref names to picking a free one in the range 1-999, right? Why do you believe it is better?
  • What was the bug? Just curious.

The old implementation assumed that auto-generated identifiers are numbered consecutively starting with 1, e.g. autogen1, autogen2, autogen3; reflinks would then start counting at 4. In your example this failed because autogen7 was already there, but it was created again anyway. The new implementation looks for every auto-generated number already in use and excludes it from reuse.
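A minimal sketch of the approach described above, under stated assumptions: the helper names `used_numbers` and `next_free_name` are invented for this example, the prefix "autogenerated" and the 1-999 range are taken from the discussion in this task, and the actual patch may be structured differently.

```python
import re


def used_numbers(wikitext, prefix='autogenerated'):
    """Collect every number already used with the prefix, whatever the quote style."""
    pattern = re.compile(
        r'name\s*=\s*["\']?' + re.escape(prefix) + r'(\d+)', re.IGNORECASE)
    return {int(m.group(1)) for m in pattern.finditer(wikitext)}


def next_free_name(wikitext, prefix='autogenerated', limit=999):
    """Return the first prefix+number in 1..limit that the page does not use yet."""
    used = used_numbers(wikitext, prefix)
    for number in range(1, limit + 1):
        if number not in used:
            return f'{prefix}{number}'
    raise ValueError(f'no free {prefix} name in the range 1-{limit}')


# The page already holds 1-3 and, out of sequence, 7 (in single quotes).
page = ''.join(f'<ref name="autogenerated{n}">x</ref>' for n in (1, 2, 3))
page += "<ref name='autogenerated7'>y</ref>"
print(next_free_name(page))  # autogenerated4

# Once 4-6 are taken as well, 7 is skipped instead of being created a second time.
page += ''.join(f'<ref name="autogenerated{n}">z</ref>' for n in (4, 5, 6))
print(next_free_name(page))  # autogenerated8
```

A counter that only increments from the number of existing names would reach 7 here and collide with the reference that is already on the page.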

Xqt claimed this task.

Change 674613 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Avoid duplicate reference names

https://gerrit.wikimedia.org/r/674613