filter namespace generator should not crash on unknown namespace numbers
Open, Low, Public

Description

Filing upstream the root cause of T110386:

We call pagegenerators.NamespaceFilterPageGenerator with a namespace that does not exist on that site, and pywikibot crashes with a KeyError:

  File "/data/project/heritage/erfgoedbot/update_database.py", line 503, in processCountry
    for page in generator:
  File "/data/project/heritage/pywikibot/pywikibot/pagegenerators.py", line 1579, in PreloadingGenerator
    for page in generator:
  File "/data/project/heritage/pywikibot/pywikibot/pagegenerators.py", line 1196, in NamespaceFilterPageGenerator
    pywikibot.Site().namespaces)
  File "/data/project/heritage/pywikibot/pywikibot/site.py", line 489, in resolve
    if ns is None]))
KeyError: u'Namespace identifier(s) not recognised: 104'

Event Timeline

JeanFred raised the priority of this task to Needs Triage.
JeanFred updated the task description.
JeanFred added a project: Pywikibot.
JeanFred added subscribers: JeanFred, Multichill.

Which core version are you using?

HEAD of branch 2.0

What would be the preferred response?

jayvdb claimed this task.
jayvdb subscribed.

NamespaceFilterPageGenerator docstring says KeyError will be raised.
https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#pywikibot.pagegenerators.NamespaceFilterPageGenerator

Reopen if you have a better solution.

Namespaces get added and deleted all the time. This is a filter; it shouldn't crash when asked to filter on a namespace that isn't being supplied in the first place. A solution would be to never remove namespaces, but to mark them as deprecated and only emit a deprecation warning when they are encountered.

The namespace isn't added in core but read from the API.

Worth noting that this KeyError is a breaking change from compat, so this isn't an unreasonable request.

compat raised ValueError when one of the namespaces was a string that was not a valid namespace. However, no exception was raised when an integer was provided that wasn't a valid namespace.

The new core interface (21a6732e) always raises KeyError for an invalid namespace, irrespective of whether it was a string or integer.

This interface is now implemented very consistently across the codebase, and it isn't simple to provide a different interface for NamespaceFilterPageGenerator.

Irrespective of whether NamespaceFilterPageGenerator is modified, a permissive namespace lookup method in NamespaceDict seems like it would be useful. It would accept a list of namespace identifiers and return a filtered list of the namespaces that are valid on the site.
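A rough sketch of what such a permissive lookup could look like. This is not pywikibot code: `resolve_permissive` is a hypothetical name, and a plain dict of namespace id to canonical name stands in for NamespaceDict.

```python
def resolve_permissive(known, identifiers):
    """Return the ids of the identifiers that exist in `known`.

    Hypothetical helper. `known` is a plain dict (id -> canonical name)
    used here as a stand-in for NamespaceDict; unknown identifiers are
    silently dropped instead of raising KeyError.
    """
    # Allow lookup by (case-insensitive) name as well as by number.
    by_name = {name.lower(): ns_id for ns_id, name in known.items()}
    resolved = []
    for ident in identifiers:
        if isinstance(ident, int):
            ns_id = ident if ident in known else None
        else:
            ns_id = by_name.get(str(ident).lower())
        # Keep each valid namespace once, in the order it was requested.
        if ns_id is not None and ns_id not in resolved:
            resolved.append(ns_id)
    return resolved


# Made-up namespace table for illustration; 104 is not defined in it.
known = {0: "", 1: "Talk", 2: "User", 6: "File"}
print(resolve_permissive(known, [0, "File", 104, "Nonexistent"]))  # [0, 6]
```

Silently dropping identifiers hides typos, so a real implementation would probably want to log or warn about what it discarded.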

Also possible: the KeyError raised by NamespaceDict.resolve could be a custom KeyError subclass with two attributes holding the valid and invalid namespaces, so the caller can decide what needs to be done. NamespaceFilterPageGenerator could then proceed with only the valid namespaces, after issuing warnings for the invalid ones.
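To make the idea concrete, here is a minimal sketch under stated assumptions. `UnknownNamespaceError`, `resolve_strict` and `filter_by_namespace` are hypothetical names, not existing pywikibot APIs; pages are modelled as `(namespace_id, title)` tuples and the site's namespaces as a plain set of ids.

```python
import warnings


class UnknownNamespaceError(KeyError):
    """Hypothetical KeyError subclass carrying valid and invalid namespaces."""

    def __init__(self, valid, invalid):
        super().__init__('Namespace identifier(s) not recognised: '
                         + ','.join(str(i) for i in invalid))
        self.valid = list(valid)
        self.invalid = list(invalid)


def resolve_strict(known, identifiers):
    """Stand-in for NamespaceDict.resolve: raise if any identifier is unknown."""
    valid = [i for i in identifiers if i in known]
    invalid = [i for i in identifiers if i not in known]
    if invalid:
        raise UnknownNamespaceError(valid, invalid)
    return valid


def filter_by_namespace(pages, namespaces, known):
    """Yield pages in the requested namespaces, warning about unknown ones."""
    try:
        wanted = resolve_strict(known, namespaces)
    except UnknownNamespaceError as err:
        warnings.warn('Ignoring unknown namespace(s): %s' % err.invalid)
        wanted = err.valid  # carry on with the namespaces that do exist
    wanted = set(wanted)
    for ns, title in pages:
        if ns in wanted:
            yield (ns, title)


# Made-up data for illustration; 104 is not a known namespace here.
known_ids = {0, 1, 6}
pages = [(0, 'A'), (104, 'B'), (6, 'C')]
print(list(filter_by_namespace(pages, [0, 104], known_ids)))  # [(0, 'A')]
```

Because the subclass is still a KeyError, existing callers that catch KeyError would keep working, while callers that want the lenient behaviour can inspect `valid` and `invalid`.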

Thanks for the comment Jay, that seems a good direction.

Xqt triaged this task as Low priority. Nov 13 2018, 8:24 PM
Xqt removed jayvdb as the assignee of this task. Feb 19 2020, 6:59 AM