interwikidata fails with pywikibot.exceptions.InconsistentTitleReceived
Open, Needs Triage, Public

Description

urbanecm@LAPTOP-A3BHKQ07 (master u=) ~/unsynced/gerrit/pywikibot/core
$ python3 scripts/interwikidata.py -lang:lld -clean -start:'Bielorussia'
/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/config2.py:1064: _ConfigurationDeprecationWarning:
"interwiki_contents_on_disk" present in our user-config.py is no
longer a supported configuration variable and should be removed.
Please inform the maintainers if you depend on it.
  warn('\n' + fill(DEPRECATED_VARIABLE.format(name)),
/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/config2.py:1064: _ConfigurationDeprecationWarning:
"use_mwparserfromhell" present in our user-config.py is no longer a
supported configuration variable and should be removed. Please inform
the maintainers if you depend on it.
  warn('\n' + fill(DEPRECATED_VARIABLE.format(name)),
Retrieving 50 pages from wikipedia:lld.

[...]

>>> Finlandia <<<
Interwiki [[bat-smg:Soumėjė]] does not have an item
Interwiki [[ceb:Finlandia]] does not have an item
Interwiki [[chr:ᏫᏂᎳᏂ]] does not have an item
Interwiki [[frp:Finlande]] does not have an item
WARNING: Interwiki [[gl:Finlandia - Suomi]] does not exist, skipping...
Interwiki [[gn:Hĩlandia]] does not have an item
Interwiki [[got:𐍆𐌹𐌽𐌽𐌰𐌻𐌰𐌽𐌳]] does not have an item
Interwiki [[gu:ફીનલેંડ]] does not have an item
Interwiki [[hi:फ़िनलैण्ड]] does not have an item
Interwiki [[jv:Finlandia]] does not have an item
Interwiki [[ki:Finland]] does not have an item
Interwiki [[ku:Fînland]] does not have an item
Interwiki [[mhr:Суоми]] does not have an item
Interwiki [[mi:Hinerangi]] does not have an item
WARNING: API warning (query): The value passed for "titles" contains invalid or non-normalized data. Textual data should be valid, NFC-normalized Unicode without C0 control characters other than HT (\t), LF (\n), and CR (\r).

230 pages read
11 pages written
0 pages skipped
Execution time: 2286 seconds
Read operation time: 9.9 seconds
Write operation time: 207.8 seconds
Script terminated by exception:

ERROR: InconsistentTitleReceived: Query on [[ml:ഫിന്‍ലാന്റ്]] returned data on 'ഫിൻലാന്റ്'
Traceback (most recent call last):
  File "scripts/interwikidata.py", line 245, in <module>
    main()
  File "scripts/interwikidata.py", line 239, in main
    bot.run()
  File "/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/bot.py", line 1397, in run
    self.treat(page)
  File "/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/bot.py", line 1676, in treat
    self.treat_page()
  File "scripts/interwikidata.py", line 91, in treat_page
    item = self.try_to_add()
  File "scripts/interwikidata.py", line 173, in try_to_add
    wd_data = self.get_items()
  File "scripts/interwikidata.py", line 160, in get_items
    if not iw_page.exists():
  File "/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/page/__init__.py", line 791, in exists
    return self.pageid > 0
  File "/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/page/__init__.py", line 278, in pageid
    self.site.loadpageinfo(self)
  File "/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/site/__init__.py", line 2840, in loadpageinfo
    self._update_page(page, query)
  File "/home/urbanecm/unsynced/gerrit/pywikibot/core/pywikibot/site/__init__.py", line 2823, in _update_page
    raise InconsistentTitleReceived(page, pageitem['title'])
pywikibot.exceptions.InconsistentTitleReceived: Query on [[ml:ഫിന്‍ലാന്റ്]] returned data on 'ഫിൻലാന്റ്'
CRITICAL: Exiting due to uncaught exception <class 'pywikibot.exceptions.InconsistentTitleReceived'>
urbanecm@LAPTOP-A3BHKQ07 (master u=) ~/unsynced/gerrit/pywikibot/core
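
The two titles in the exception look almost identical, so it may help to compare them code point by code point. The following is a minimal sketch using only the standard library. It assumes the queried title encodes the chillu as NA + VIRAMA + ZERO WIDTH JOINER while the API returns the atomic CHILLU N; the fragments are written as escapes because the exact bytes of the original link were not verified.

```python
import unicodedata

# Assumed encodings of the differing fragment (not verified against the wiki text).
QUERIED = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER
RETURNED = "\u0d7b"             # atomic CHILLU N


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def nfc_equal(a, b):
    """Return True if both strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for label, value in (("queried", QUERIED), ("returned", RETURNED)):
        print(label, describe(value))
    print("equal after NFC:", nfc_equal(QUERIED, RETURNED))
```

If that assumption holds, NFC normalization on the client side alone would not make the two titles match, because neither sequence is an NFC form of the other.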

Event Timeline

WARNING: API warning (query): The value passed for "titles" contains invalid or non-normalized data. Textual data should be valid, NFC-normalized Unicode without C0 control characters other than HT (\t), LF (\n), and CR (\r).

So when interwiki links are being collected, some characters are not stripped or normalized (and they should be). This seems to have nothing to do with Wikidata, hence I am not adding the Pywikibot-Wikidata tag.
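
One possible direction is to rewrite the joiner-based chillu sequences to their atomic code points before the title is sent to the API. This is only a sketch: `to_atomic_chillu` and `ZWJ_CHILLU_TO_ATOMIC` are hypothetical names, not part of Pywikibot, and the mapping is an assumption about which sequences matter that has not been checked against what the server does on ml.wikipedia.

```python
# Hypothetical mapping: consonant + VIRAMA + ZERO WIDTH JOINER -> atomic chillu.
ZWJ_CHILLU_TO_ATOMIC = {
    "\u0d23\u0d4d\u200d": "\u0d7a",  # NNA -> CHILLU NN
    "\u0d28\u0d4d\u200d": "\u0d7b",  # NA  -> CHILLU N
    "\u0d30\u0d4d\u200d": "\u0d7c",  # RA  -> CHILLU RR
    "\u0d32\u0d4d\u200d": "\u0d7d",  # LA  -> CHILLU L
    "\u0d33\u0d4d\u200d": "\u0d7e",  # LLA -> CHILLU LL
}


def to_atomic_chillu(title):
    """Replace each joiner-based chillu sequence with its atomic code point."""
    for sequence, atomic in ZWJ_CHILLU_TO_ATOMIC.items():
        title = title.replace(sequence, atomic)
    return title
```

Titles without any of these sequences pass through unchanged, so a helper like this could in principle be applied to every interwiki title, although it probably only makes sense for Malayalam links.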