scripts/interwikidata.py raises an InconsistentTitleError on a Malayalam title
Open, Needs Triage, Public

Description

I am running this:

python3 pwb.py scripts/interwikidata.py -lang:shi -clean -page:'Aliksandṛ Puckin' -always

On this version of the article: https://shi.wikipedia.org/w/index.php?title=Aliksand%E1%B9%9B_Puckin&oldid=27245

I am receiving this error:

>>> Aliksandṛ Puckin <<<
WARNING: API warning (query): The value passed for "titles" contains invalid or non-normalized data. Textual data should be valid, NFC-normalized Unicode without C0 control characters other than HT (\t), LF (\n), and CR (\r).

0 pages read
0 pages written
0 pages skipped
Execution time: 7 seconds
Script terminated by exception:

ERROR: InconsistentTitleError: Query on [[ml:അലക്സാണ്ടര്‍ പുഷ്കിന്‍]] returned data on 'അലക്സാണ്ടർ പുഷ്കിൻ'
Traceback (most recent call last):
  File "/Users/aaharoni/dev/pywikibot-he/pwb.py", line 399, in <module>
    if not main():
  File "/Users/aaharoni/dev/pywikibot-he/pwb.py", line 391, in main
    run_python_file(filename,
  File "/Users/aaharoni/dev/pywikibot-he/pwb.py", line 106, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File "./scripts/interwikidata.py", line 249, in <module>
    main()
  File "./scripts/interwikidata.py", line 243, in main
    bot.run()
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/bot.py", line 1558, in run
    self.treat(page)
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/bot.py", line 1810, in treat
    self.treat_page()
  File "./scripts/interwikidata.py", line 95, in treat_page
    item = self.try_to_add()
  File "./scripts/interwikidata.py", line 177, in try_to_add
    wd_data = self.get_items()
  File "./scripts/interwikidata.py", line 164, in get_items
    if not iw_page.exists():
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/page/__init__.py", line 716, in exists
    return self.pageid > 0
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/page/__init__.py", line 261, in pageid
    self.site.loadpageinfo(self)
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/site/_apisite.py", line 1107, in loadpageinfo
    self._update_page(page, query)
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/site/_apisite.py", line 1084, in _update_page
    raise InconsistentTitleError(page, pageitem['title'])
pywikibot.exceptions.InconsistentTitleError: Query on [[ml:അലക്സാണ്ടര്‍ പുഷ്കിന്‍]] returned data on 'അലക്സാണ്ടർ പുഷ്കിൻ'
CRITICAL: Exiting due to uncaught exception <class 'pywikibot.exceptions.InconsistentTitleError'>

I guess that it has something to do with the value of the Malayalam string:

[[ml:അലക്സാണ്ടര്‍ പുഷ്കിന്‍]]

The string appears to be valid; for example, it works if I paste it to the Malayalam Wikipedia. However, Malayalam is a script that is known to have some encoding issues occasionally, and it's possible that I'm missing something.

pywikibot probably shouldn't crash because of it.

Tagging Wikidata because it may be related; please remove the tag if it's not.

Event Timeline

The warning is a MediaWiki API issue.

The page title on the left side uses 3 characters for 'ന്‍', namely 'ന' + '്' + '\u200d', whereas on the right side 'ൻ' is the single character chr(3451). A wrong title must have been given somewhere.
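To make the difference concrete, here is a minimal sketch that compares the two spellings of the final letter using only the Python standard library. The variable names and the describe() helper are made up for illustration. It also checks that NFC normalization alone does not turn one form into the other.

```python
import unicodedata

# Final letter as written in the interwiki link (left side of the error).
legacy = "\u0d28\u0d4d\u200d"  # 'ന' + '്' + zero-width joiner
# Final letter as returned by the API (right side of the error).
atomic = "\u0d7b"              # chr(3451)


def describe(text):
    """Return a list of 'U+XXXX NAME' strings, one per code point."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


legacy_nfc = unicodedata.normalize("NFC", legacy)
atomic_nfc = unicodedata.normalize("NFC", atomic)

print(describe(legacy))
print(describe(atomic))
# NFC leaves both spellings as they are, so they still compare unequal.
print("equal after NFC:", legacy_nfc == atomic_nfc)
```

Both strings are already NFC-normalized, so a plain string comparison of the titles fails even though they render the same.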

Aha, so as I suspected, it probably is a Malayalam encoding issue. The way Malayalam final letters are encoded changed about eleven years ago. I'd expect MediaWiki's Unicode handling to treat the old and new forms as compatible automatically rather than this ending in an error, but I'm not a true expert in the details of how that works.
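As a rough illustration of what "automatically compatible" could mean on the client side, the sketch below rewrites the older three-character sequences (consonant + '്' + '\u200d') into the single-character final letters before titles are compared. The table and the to_atomic_chillu() function are hypothetical helpers, not part of pywikibot or MediaWiki. The API response in the error above suggests the server applies a similar conversion, but that is an assumption here.

```python
# Hypothetical helper, not a pywikibot API: maps the older
# consonant + virama + ZWJ sequences to the single-code-point letters.
LEGACY_CHILLU = {
    "\u0d23\u0d4d\u200d": "\u0d7a",  # NNA -> chillu NN
    "\u0d28\u0d4d\u200d": "\u0d7b",  # NA  -> chillu N
    "\u0d30\u0d4d\u200d": "\u0d7c",  # RA  -> chillu RR
    "\u0d32\u0d4d\u200d": "\u0d7d",  # LA  -> chillu L
    "\u0d33\u0d4d\u200d": "\u0d7e",  # LLA -> chillu LL
    "\u0d15\u0d4d\u200d": "\u0d7f",  # KA  -> chillu K
}


def to_atomic_chillu(title):
    """Replace each legacy sequence in title with its single-character form."""
    for sequence, letter in LEGACY_CHILLU.items():
        title = title.replace(sequence, letter)
    return title


# The two final letters that differ in the reported titles.
old_endings = "\u0d30\u0d4d\u200d \u0d28\u0d4d\u200d"
new_endings = "\u0d7c \u0d7b"
print(to_atomic_chillu(old_endings) == new_endings)
```

Comparing titles only after such a conversion would let the two spellings in the error message match. Whether pywikibot should do this itself or rely on the title the API returns is a separate design question.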