fixing_redirects.py conflict with wikidata usage
Closed, Duplicate (Public)

Description

>>> Germania <<<
Retrieving 50 pages from wikipedia:it.
Retrieving 50 pages from wikipedia:it.
Traceback (most recent call last):
  File "pwb.py", line 239, in <module>
    if not main():
  File "pwb.py", line 233, in main
    run_python_file(filename, argv, argvu, file_package)
  File "pwb.py", line 88, in run_python_file
    main_mod.__dict__)
  File "./scripts/fixing_redirects.py", line 230, in <module>
    main()
  File "./scripts/fixing_redirects.py", line 225, in main
    workon(page)
  File "./scripts/fixing_redirects.py", line 179, in workon
    text = treat(text, page2, target)
  File "./scripts/fixing_redirects.py", line 85, in treat
    if actualLinkPage != linkedPage:
  File "/home/.../core/pywikibot/tools/__init__.py", line 139, in __ne__
    return other != self._cmpkey()
  File "/home/.../core/pywikibot/page.py", line 288, in _cmpkey
    return (self.site, self.namespace(), self.title())
  File "/home/.../core/pywikibot/page.py", line 141, in site
    return self._link.site
  File "/home/.../core/pywikibot/page.py", line 4687, in site
    self.parse()
  File "/home/.../core/pywikibot/page.py", line 4641, in parse
    u"%s contains illegal char(s) %s" % (repr(t), repr(m.group(0))))
pywikibot.exceptions.InvalidTitle: u'{{' contains illegal char(s) u'{'
<class 'pywikibot.exceptions.InvalidTitle'>

The problem is the {{ }} nested within [[ ]], as in [[{{#property:p30}}]]. The same failure occurs with other unusual combinations such as [[{{CURRENTYEAR}}]], but Wikidata usage is by far the most important case. The best solution would be to simply ignore links that contain illegal characters (perhaps reporting them somewhere, or in the summary).
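A minimal sketch of the "ignore links with illegal characters" idea, using only the standard library. It is not the actual fixing_redirects.py code: the names `LINK_RE`, `ILLEGAL_RE` and `partition_links` are hypothetical, and the set of illegal characters is an assumed subset chosen to cover the cases above.

```python
import re

# Matches [[target]] and [[target|label]]; a simplification of real wikilink syntax.
LINK_RE = re.compile(r'\[\[(?P<target>[^\]|]*)(?:\|(?P<label>[^\]]*))?\]\]')

# Assumed subset of characters that cannot appear in a page title.
ILLEGAL_RE = re.compile(r'[{}<>\[\]]')


def partition_links(text):
    """Split link targets into (usable, skipped) lists.

    Targets containing an illegal character are skipped instead of being
    turned into page objects, so they can be reported rather than crash.
    """
    usable, skipped = [], []
    for match in LINK_RE.finditer(text):
        target = match.group('target').strip()
        if ILLEGAL_RE.search(target):
            skipped.append(target)
        else:
            usable.append(target)
    return usable, skipped


sample = ('See [[Europa]], [[{{#property:p30}}]], '
          '[[{{CURRENTYEAR}}]] and [[Roma|the capital]].')
usable, skipped = partition_links(sample)
print(usable)   # link targets that are safe to process
print(skipped)  # link targets to report instead of processing
```

In the script itself, a comparable effect could presumably be had by catching `pywikibot.exceptions.InvalidTitle` around the comparison that fails in the traceback and continuing with the next link.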

Event Timeline

Vituzzu set the priority of this task to Needs Triage.
Vituzzu updated the task description.
Vituzzu added a project: Pywikibot.
Vituzzu added a subscriber: Vituzzu.
Vituzzu set Security to None.

The general idea seems similar to T103080: Link should recognise {{ns:Project}} in text, although this case is far more complex. To implement it here, we would apparently need to handle some default templates like {{CURRENTYEAR}} by parsing them manually and substituting the result. {{#property:p30}} additionally requires some Wikibase magic and can't be implemented statically.
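To illustrate the distinction, here is a rough sketch of static expansion, assuming a small hand-maintained table of magic words. The names `static_magic_words` and `expand_static` are hypothetical and not part of Pywikibot. Anything outside the table, including parser functions such as {{#property:p30}}, is left untouched because it cannot be resolved without querying the wiki.

```python
import re
from datetime import datetime, timezone

# Only bare upper-case magic words such as {{CURRENTYEAR}} are candidates;
# parser functions like {{#property:p30}} do not match this pattern.
MAGIC_RE = re.compile(r'\{\{([A-Z0-9]+)\}\}')


def static_magic_words(now):
    """Return the (assumed, incomplete) table of statically known values."""
    return {
        'CURRENTYEAR': str(now.year),
        'CURRENTMONTH': '%02d' % now.month,
    }


def expand_static(text, now=None):
    """Replace known magic words in text; leave everything else as is."""
    if now is None:
        now = datetime.now(timezone.utc)
    table = static_magic_words(now)
    return MAGIC_RE.sub(lambda m: table.get(m.group(1), m.group(0)), text)


fixed_date = datetime(2015, 7, 1, tzinfo=timezone.utc)
print(expand_static('[[{{CURRENTYEAR}}]]', fixed_date))
print(expand_static('[[{{#property:p30}}]]', fixed_date))
```

The date is passed in explicitly so the result is reproducible. A link that still contains braces after this step would have to be skipped or resolved through the wiki's API.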

Dvorapa added a subscriber: Dvorapa.

Let's merge them as the general issue is basically the same.

Dvorapa triaged this task as Low priority.
Dvorapa added a project: good first task.
Dvorapa moved this task from Backlog to Doing on the good first task board.

Change 395154 had a related patch set uploaded (by Dvorapa; owner: Dvorapa):
[pywikibot/core@master] [bugfix] Don't handle category prefixes as iw shortcuts

https://gerrit.wikimedia.org/r/395154