missing _page_id attribute when loading a page
Closed, Resolved | Public | Bug Report

Description

Steps to reproduce

$ python pwb.py interwiki -lang:cs -family:wikipedia -simulate
Which page to check: Wikipedie:Pískoviště
Retrieving 1 pages from wikipedia:cs.
WARNING: /home/pavel/pywikibot/pywikibot/page.py:6011: UserWarning: Site wikipedia:be-tarask instantiated using different code "be-x-old"
  link._site = pywikibot.Site(lang, source.family.name)

[[cs:Wikipedie:Pískoviště]]: [[cs:Wikipedie:Pískoviště]] gives new interwiki [[ab:Авикипедиа:Sandbox]]
[[cs:Wikipedie:Pískoviště]]: [[cs:Wikipedie:Pískoviště]] gives new interwiki [[ace:Wikipedia:Sandbox]]

...

(answered y/n a few times for different namespaces)

...

NOTE: [[uk:Вікіпедія:Грамайданчик]] is redirect to [[uk:Вікіпедія:Пісочниця]]
Retrieving 2 pages from wikipedia:el.
NOTE: [[el:Βικιπαίδεια:Αμμοδοχείο]] is redirect to [[el:Βοήθεια:Πρόχειρο]]
Dump cs (wikipedia) appended.
Traceback (most recent call last):
  File "pwb.py", line 250, in <module>
    if not main():
  File "pwb.py", line 243, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 95, in run_python_file
    main_mod.__dict__)
  File "./scripts/interwiki.py", line 2577, in <module>
    main()
  File "./scripts/interwiki.py", line 2553, in main
    bot.run()
  File "./scripts/interwiki.py", line 2265, in run
    self.queryStep()
  File "./scripts/interwiki.py", line 2239, in queryStep
    self.oneQuery()
  File "./scripts/interwiki.py", line 2234, in oneQuery
    subject.batchLoaded(self)
  File "./scripts/interwiki.py", line 1248, in batchLoaded
    if not page.exists():
  File "/home/pavel/pywikibot/pywikibot/page.py", line 804, in exists
    return self.pageid > 0
  File "/home/pavel/pywikibot/pywikibot/page.py", line 286, in pageid
    return self._pageid
AttributeError: 'Page' object has no attribute '_pageid'
CRITICAL: Exiting due to uncaught exception <class 'AttributeError'>

The same happens without the -simulate parameter.

Event Timeline

Xqt triaged this task as High priority. May 14 2019, 10:05 AM
Xqt changed the subtype of this task from "Task" to "Bug Report".
Xqt added a subscriber: Xqt.

I don't think this is related to interwiki.py. The _pageid attribute is simply missing for an unknown reason.

Xqt renamed this task from "interwiki.py throws an error after few seconds" to "missing _page_id attribute when loading a page". Nov 4 2020, 8:09 AM

Just a note that this bug is still current. I encountered the same error today.

python pwb.py commonscat -site:wikibooks:ar -page:"الصفحة الرئيسية"

Output:

Script terminated by exception:

ERROR: AttributeError: 'Page' object has no attribute '_pageid'
Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pwb.py", line 365, in <module>
    if not main():
  File "C:\Users\Mohammed\Downloads\core\pwb.py", line 357, in main
    run_python_file(filename,
  File "C:\Users\Mohammed\Downloads\core\pwb.py", line 73, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File ".\scripts\commonscat.py", line 557, in <module>
    main()
  File ".\scripts\commonscat.py", line 551, in main
    bot.run()
  File "C:\Users\Mohammed\Downloads\core\pywikibot\bot.py", line 1533, in run
    self.treat(page)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\bot.py", line 1814, in treat
    self.treat_page()
  File ".\scripts\commonscat.py", line 318, in treat_page
    commonscatLink = self.find_commons_category(page)
  File ".\scripts\commonscat.py", line 412, in find_commons_category
    return self.findCommonscatLink(page)
  File ".\scripts\commonscat.py", line 377, in findCommonscatLink
    if (not ipage.exists() or ipage.isRedirectPage()
  File "C:\Users\Mohammed\Downloads\core\pywikibot\page\__init__.py", line 710, in exists
    return self.pageid > 0
  File "C:\Users\Mohammed\Downloads\core\pywikibot\page\__init__.py", line 259, in pageid
    return self._pageid
AttributeError: 'Page' object has no attribute '_pageid'
CRITICAL: Exiting due to uncaught exception <class 'AttributeError'>

Output of version.py:

Pywikibot: [https] r-pywikibot-core (cb47340, g14832, 2021/05/13, 15:56:16, OUTDATED)
Release version: 6.2.0.dev0
requests version: 2.25.1
Python: 3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]

Notes:

The _pageid is normally present in the Page object's __dict__, but in the failing case the attribute is evidently not set:

C:\pwb\GIT\core>pwb commonscat -site:wikibooks:ar -page:"الصفحة الرئيسية"
Retrieving 1 pages from wikibooks:ar.
dict_keys(['_link', '_revisions', '_pageid', '_contentmodel', '_isredir', '_timestamp', '_revid', '_pageprops'])
_pageid 4


>>> الصفحة الرئيسية <<<
dict_keys(['_link', '_revisions', '_pageid', '_contentmodel', '_isredir', '_timestamp', '_revid', '_pageprops', '_templates', '_raw_extracted_templates'])
_pageid 4
dict_keys(['_link', '_revisions'])

0 pages read
0 pages written
0 pages skipped
Execution time: 2 seconds
Script terminated by exception:

ERROR: KeyError: '_pageid'

The exception is raised by an interwiki link without a page title:

>>> pageid [[ar:الصفحة الرئيسية]]
dict_keys(['_link', '_revisions', '_pageid', '_contentmodel', '_isredir', '_timestamp', '_revid', '_pageprops', '_templates', '_raw_extracted_templates'])
_pageid 4
<<< pageid [[ar:الصفحة الرئيسية]]
>>> pageid [[af:]]
dict_keys(['_link', '_revisions'])

0 pages read
0
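The failure mode can be reproduced outside Pywikibot with a minimal sketch. SketchPage and update_from_api below are hypothetical stand-ins, not Pywikibot classes or methods; the only assumption is that _pageid gets stored when the API data contains a page id, and that the pageid property reads the attribute directly:

```python
class SketchPage:
    """Hypothetical stand-in for a page object (not pywikibot.Page)."""

    def __init__(self, title):
        self.title = title

    def update_from_api(self, pagedata):
        # Assumed behaviour: _pageid is only stored when the data has one.
        if 'pageid' in pagedata:
            self._pageid = pagedata['pageid']

    @property
    def pageid(self):
        # Plain attribute access: raises AttributeError if never set.
        return self._pageid

    def exists(self):
        return self.pageid > 0


loaded = SketchPage('ar:الصفحة الرئيسية')
loaded.update_from_api({'pageid': 4})
print(loaded.exists())

empty = SketchPage('af:')
empty.update_from_api({})  # nothing usable came back for the empty title
try:
    empty.exists()
except AttributeError as exc:
    print('AttributeError:', exc)
```

The second object ends up with no _pageid at all, so exists() fails in the same way as in the tracebacks above.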

Thank you, @Xqt, for your comment.
Could you please make the bot skip the page (instead of crashing) when it encounters such exceptions?
This would save me a lot of time when running the bot on a whole namespace, since I would not have to restart it manually every time it crashes.
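Something along these lines is what I have in mind. It is only a rough sketch: treat_all and the treat callback are hypothetical names, not part of Pywikibot, and catching AttributeError and KeyError is just my guess based on the two errors shown above:

```python
import logging


def treat_all(pages, treat):
    """Call treat(page) for every page; log and skip pages that fail.

    Returns the list of skipped pages so they can be reviewed later.
    """
    skipped = []
    for page in pages:
        try:
            treat(page)
        except (AttributeError, KeyError) as exc:
            # Do not abort the whole run because of one broken page.
            logging.warning('Skipping %r: %s', page, exc)
            skipped.append(page)
    return skipped
```

The returned list would make it possible to look at the problematic pages after the run instead of restarting the bot each time.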

I am not an expert Python developer but I can see that the Main Page on afwikibooks has the text:

[[ar:]]
[[cs:]]
[[da:]]
[[de:]]

So, the afwikibooks Main Page does not load the link to arwikibooks from Wikidata.
Instead, it uses the above empty interwiki link. (Maybe this is why the interwiki link is without a Page title?)
Having an empty interwiki link like the above is still customary on some wikis, so I would like to ask what you recommend here.
Should we ask afwikibooks admins to remove the empty interwiki links from the afwikibooks Main Page?
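To check which links on a page are affected, a rough test on the wikitext might look like the sketch below. The regular expression is my own simplification: it treats any short lowercase prefix as a language code and does not verify it against the wiki's real interwiki table.

```python
import re

# Matches [[xx:]] where xx looks like a language code and nothing
# follows the colon. Simplified: real interwiki prefixes are not checked.
EMPTY_LANGLINK = re.compile(r'\[\[\s*([a-z]{2,3}(?:-[a-z0-9-]+)?)\s*:\s*\]\]')


def empty_langlink_codes(wikitext):
    """Return the language codes of interlanguage links with no title."""
    return EMPTY_LANGLINK.findall(wikitext)


sample = "[[ar:]]\n[[cs:]]\n[[da:]]\n[[de:]]\n[[en:Main Page]]"
print(empty_langlink_codes(sample))
```

On the sample text this reports ar, cs, da and de, and leaves the link with a title alone.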

Change 691149 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [IMPR] pagelanglinks() may skip links with empty titles.

https://gerrit.wikimedia.org/r/691149

The patch ignores langlinks from Site.pagelanglinks() if their titles are empty, unless they are explicitly wanted.
Also, an exception is raised if the pageid is missing or a Page is created with an empty Link title.
I am not sure whether this also solves the initial task description, but it gives a better exception that includes the page title and link information.
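Roughly, the behaviour is as in the following sketch. This is an illustration only and not the code from the patch: filter_langlinks, require_pageid, the (lang, title) tuples and the flag name used here are assumptions made for the example.

```python
def filter_langlinks(links, include_empty_titles=False):
    """Yield (lang, title) pairs, dropping empty titles unless requested."""
    for lang, title in links:
        if title or include_empty_titles:
            yield lang, title


def require_pageid(attrs, link_text):
    """Return the stored page id or fail with a descriptive message."""
    try:
        return attrs['_pageid']
    except KeyError:
        # Name the offending link instead of a bare KeyError/AttributeError.
        raise ValueError(
            f'no pageid available for {link_text}; '
            'the link may have an empty title') from None


links = [('ar', 'الصفحة الرئيسية'), ('af', '')]
print(list(filter_langlinks(links)))
print(list(filter_langlinks(links, include_empty_titles=True)))
```

With the default setting the empty af link is dropped before any Page is built from it, and a missing pageid is reported together with the link that caused it.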

Change 691149 merged by jenkins-bot:

[pywikibot/core@master] [IMPR] pagelanglinks() may skip links with empty titles.

https://gerrit.wikimedia.org/r/691149

Xqt claimed this task.

$ python pwb.py interwiki -lang:cs -family:wikipedia -simulate works for me. Therefore I am closing this task.