
Bot errors out if $1 placeholder is in the middle of a page title
Open, Needs Triage, Public, 3 Estimated Story Points

Description

What happened?

Commons Deletion Notification bot attempted to process a page with $1 in the title, and Pywikibot errored out. This continued relentlessly for over 18 hours until I intervened.

Backtrace:

  File "/data/project/commtech-commons/bot/virtualenv/lib/python3.7/site-packages/pywikibot/site/_extensions.py", line 221, in globalusage
    gu_site = pywikibot.Site(url=entry['url'])
  File "/data/project/commtech-commons/bot/virtualenv/lib/python3.7/site-packages/pywikibot/__init__.py", line 1184, in Site
    code, fam = _code_fam_from_url(url, fam)
  File "/data/project/commtech-commons/bot/virtualenv/lib/python3.7/site-packages/pywikibot/__init__.py", line 1097, in _code_fam_from_url
    code = family.from_url(url)
  File "/data/project/commtech-commons/bot/virtualenv/lib/python3.7/site-packages/pywikibot/family.py", line 856, in from_url
    'not supported (T111513).'.format(url, suffix))
ValueError: Url: https://rw.wikipedia.org/wiki/Perezida_Kagame_yatangije_ikigega_cya_miliyoni_$100_kizatera_inkunga_imishinga_irengera_ibidukikije
Text 00_kizatera_inkunga_imishinga_irengera_ibidukikije after the $1 placeholder is not supported (T111513).

Solution

This was fixed by simply restarting the bot, but we should put in some safeguards to prevent this from happening again.
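One possible safeguard, sketched under assumptions: catch the ValueError per page, log it, and remember the title so the bot does not retry the same page on every run. The names process_files, notify and already_failed are hypothetical and are not taken from the bot's actual code; notify stands in for whatever step ends up calling Pywikibot's globalusage.

```python
import logging

logger = logging.getLogger("deletion_notification_bot")  # hypothetical logger name


def process_files(file_titles, notify, already_failed=None):
    """Call notify(title) for each title, skipping titles that raise ValueError.

    Returns the set of titles that failed, so the caller can persist it and
    avoid retrying the same broken title on every run.
    """
    failed = set() if already_failed is None else already_failed
    for title in file_titles:
        if title in failed:
            # Already known to fail; do not retry it again and again.
            continue
        try:
            notify(title)
        except ValueError as error:
            # The from_url() failure in the backtrace surfaces as ValueError.
            logger.warning("Skipping %s: %s", title, error)
            failed.add(title)
    return failed
```

This only keeps the bot running; the affected pages still get no notification until the underlying URL parsing problem is resolved.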

Some semi-related tickets: T111513, T298078

The issue apparently needs fixing in Pywikibot itself, and it may already have been fixed. In particular, https://gerrit.wikimedia.org/r/c/pywikibot/core/+/749160/ implies Pywikibot added support for $1 in the middle of a page title. In our case, however, $1 is not a placeholder but a literal part of the page title. So if we update Pywikibot, we should verify that page titles like $1,000,000 Worth of Twang are handled correctly.
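To illustrate the distinction between a placeholder and a literal $1, here is a minimal sketch that looks for $1 only in the article path pattern and never in the URL itself. This is not Pywikibot's implementation; title_from_url and the default pattern "/wiki/$1" are assumptions made for the example.

```python
from urllib.parse import unquote, urlsplit


def title_from_url(url, article_path="/wiki/$1"):
    """Extract a page title from url, given a pattern with a $1 placeholder.

    Only the pattern is searched for $1, so a literal "$1" inside the
    title is returned unchanged.
    """
    prefix, placeholder, suffix = article_path.partition("$1")
    if not placeholder:
        raise ValueError("article_path must contain the $1 placeholder")
    path = unquote(urlsplit(url).path)
    if (len(path) <= len(prefix) + len(suffix)
            or not path.startswith(prefix)
            or not path.endswith(suffix)):
        raise ValueError(f"URL does not match {article_path!r}: {url}")
    # Everything between the fixed prefix and suffix is the title.
    return path[len(prefix):len(path) - len(suffix)]


# The URL from the backtrace, plus the example title mentioned above.
print(title_from_url(
    "https://rw.wikipedia.org/wiki/Perezida_Kagame_yatangije_ikigega_cya_"
    "miliyoni_$100_kizatera_inkunga_imishinga_irengera_ibidukikije"))
print(title_from_url("https://en.wikipedia.org/wiki/$1,000,000_Worth_of_Twang"))
```

A check along these lines could serve as a regression test after a Pywikibot upgrade, using the two URLs above as test cases.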

Acceptance criteria

  • If the bot needs to process a page with $1 in its title, it doesn't error out or incorrectly treat the $1 as a placeholder.