
ItemPage.fromPage raises an exception with a very unuseful message
Open, Needs Triage, Public

Description

import pywikibot as wp
site = wp.Site("de", "wikipedia")
page = wp.Page(site, "Kitzelbach")
# page.get()
wppage = wp.ItemPage.fromPage(page)

Executing this code raises a pywikibot.exceptions.NoPage exception with an error message of "Page [[wikidata:-1]] doesn't exist." This does not make it clear at all what the problem is (the article "Kitzelbach" on de.wp is a redirect).

Uncommenting the call to page.get raises an IsRedirectPage exception with a much more useful error message ("Page [[wikipedia:de:Kitzelbach]] is a redirect page."). It would be great if this exception (or at least its error message) weren't hidden, or if fromPage accepted a get_redirect argument like BasePage.get does.
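Until the message improves, a caller can work out the likely cause itself. The following is only a minimal sketch of such a workaround, not part of pywikibot: ItemLookupError, explain_missing_item and item_from_page are hypothetical names, and the page object is assumed to offer exists(), isRedirectPage() and title() as pywikibot.Page does. The lookup function and the exception class are passed in so that the sketch does not depend on a particular pywikibot version.

```python
class ItemLookupError(Exception):
    """Hypothetical error that carries a more descriptive message."""


def explain_missing_item(page):
    """Return a readable reason why no item could be found for `page`.

    `page` only needs exists(), isRedirectPage() and title(),
    which pywikibot.Page provides.
    """
    title = page.title()
    if not page.exists():
        return "Page %s doesn't exist." % title
    if page.isRedirectPage():
        return "Page %s is a redirect page." % title
    return "Page %s has no linked item." % title


def item_from_page(page, from_page, no_page_exc):
    """Call from_page(page) and re-raise no_page_exc with a clearer message.

    from_page and no_page_exc are supplied by the caller (assumption:
    something like ItemPage.fromPage and the NoPage exception class).
    """
    try:
        return from_page(page)
    except no_page_exc as error:
        raise ItemLookupError(explain_missing_item(page)) from error
```

With the names used in this task, a call might look like item_from_page(page, wp.ItemPage.fromPage, wp.exceptions.NoPage). Because the check runs only after the lookup has failed, pages that do have an item cost no extra request.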

Event Timeline

Mineo raised the priority of this task to Needs Triage.
Mineo updated the task description.
Mineo added a project: Pywikibot.
Mineo subscribed.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). Jan 21 2015, 7:27 PM
gerritbot subscribed.

Change 186240 had a related patch set uploaded (by Mpaa):
Raise IsRedirectPage in ItemPage.fromPage()

https://gerrit.wikimedia.org/r/186240

Patch-For-Review

Using lazy_load also fails.

>>> import pywikibot as wp
>>> site = wp.Site("de", "wikipedia")
>>> page = wp.Page(site, "Kitzelbach")
>>> wppage = wp.ItemPage.fromPage(page, lazy_load=True)
>>> wppage.get()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "pywikibot/page.py", line 3272, in get
    super(ItemPage, self).get(force=force, *args, **kwargs)
  File "pywikibot/page.py", line 2920, in get
    raise pywikibot.NoPage(self)
NoPage: Page [[wikidata:-1]] doesn't exist.

The problem is broader than just redirects.

If the page exists, but no item exists, the NoPage exception is not helpful - it just says NoPage('-1')

A large part of the problem is the 'lazy_load' mechanism, which was previously the default.

When we are lazy loading, the raw result only says 'missing'. I was hoping for a resolution to T70251 before building better error handling. But, we may be waiting a while.

I suspect that it is possible to have a client wiki redirect page that is linked to a wikibase item. I've seen lots of mailing list discussions about wikidata linking to redirects.

If it is possible to resolve a redirect page to a wikibase item, then fromPage should raise NoPage, instead of IsRedirectPage, but the error message should be informative.

If we raise IsRedirectPage, existing bots which catch only NoPage will break.
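To illustrate the compatibility concern, here is a small sketch. The exception classes are stand-ins defined locally for the example, not imported from pywikibot, and bot_step and the two lookup functions are hypothetical. It shows that a handler written for NoPage keeps working when NoPage is raised with a better message, while a different exception class escapes it.

```python
class PageRelatedError(Exception):
    """Stand-in base class for this example only."""


class NoPage(PageRelatedError):
    """Stand-in for the NoPage exception discussed in this task."""


class IsRedirectPage(PageRelatedError):
    """Stand-in for the IsRedirectPage exception discussed in this task."""


def bot_step(lookup):
    """Mimic an existing bot that handles only NoPage."""
    try:
        return lookup()
    except NoPage as error:
        return "skipped: %s" % error


def lookup_raising_informative_no_page():
    # Same exception class as before, but with a message naming the cause.
    raise NoPage("Page [[wikipedia:de:Kitzelbach]] is a redirect page "
                 "and has no item.")


def lookup_raising_is_redirect():
    # A different exception class; the handler above does not catch it.
    raise IsRedirectPage("Page [[wikipedia:de:Kitzelbach]] is a redirect page.")
```

Calling bot_step(lookup_raising_informative_no_page) returns the "skipped" string, whereas bot_step(lookup_raising_is_redirect) lets IsRedirectPage propagate to the caller.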

And if/when T54564 (sitelinks to redirects) is fixed, we won't be able to raise IsRedirectPage any longer.

And T68067 might mean that a client page could be sitelinked to a wikibase item which is itself a redirect?

Isn't T68067 just about redirects between Wikidata items (like Qabc redirects to Qxyz) which wouldn't matter for this bug?

And about T54564: is there really any difference between not raising NoPage anymore and not raising IsRedirectPage anymore?

No, but the difference is in how this bug is solved. The patch at https://gerrit.wikimedia.org/r/#/c/186240/ explicitly checks whether the page is a redirect and throws an exception if it is. If T54564 is solved, this no longer works. It also changes how the library reacts, as it currently doesn't matter whether the page is a redirect. With that patch it would matter, but only as long as redirects can't be linked to items, so a script can't rely on that exception being thrown.

I suppose that patch can be abandoned then.
If you think the same, let me know and I'll drop it.

Change 186240 abandoned by Mpaa:
Handle IsRedirectPage in ItemPage

Reason:
Too many things happen during the life of a patch ...

https://gerrit.wikimedia.org/r/186240

Can this be closed as resolved? I know that pywikibot core now properly raises a more accurate NoPage on this.

Probably, but I don't see a button to close it.

Legoktm claimed this task.
Legoktm subscribed.

Action -> change selector to "Change Status" -> Resolved!

Legoktm set Security to None.

Ah, no, the error message is still "Page [[wikidata:-1]] doesn't exist.", which is not useful.