
newitem.py ignores -namespace: parameter
Closed, Resolved (Public)

Description

After updating my bot for the first time in a long while, I see unexpected behavior:

pwb.py newitem -namespace:0 -unconnectedpages:5000
creates Wikidata items for pages in all namespaces.

How do I write this command correctly?

pagegenerators.py says that for some scripts the -namespace parameter must come before the page generator. I already put it first, and newitem.py is not among those scripts anyway.

Event Timeline

I can reproduce this (output: WARNING: querypage module does not support a namespace parameter) and suspect rPWBCbdeea71e129f0bc9b1a36e75388dfc576c7275ce is behind this.

Change 442910 had a related patch set uploaded (by Multichill; owner: Multichill):
[pywikibot/core@master] T196619 Revert "[cleanup] Deprecate pagegenerators.UnconnectedPageGenerator"

https://gerrit.wikimedia.org/r/442910

@Xqt : Why are you deprecating perfectly valid generators?

Buh, simple revert and rebase doesn't work.

Looks like a no-op to me.

Scratch that. It is relevant, yes, but the original implementation of using a layer of 'indirectness' is not good.

The bug is caused by the combination of pagegenerators.py#L496 and api.py#L2710:

# pagegenerators.py#L496 (the caller)
if isinstance(self.gens[i], pywikibot.data.api.QueryGenerator):
    if self.namespaces:
        self.gens[i].set_namespace(self.namespaces)
    if self.limit:
        self.gens[i].set_maximum_items(self.limit)
else:
    if self.namespaces:
        self.gens[i] = NamespaceFilterPageGenerator(self.gens[i],
                                                    self.namespaces,
                                                    self.site)
    if self.limit:
        self.gens[i] = itertools.islice(self.gens[i], self.limit)

# api.py#L2710 (the callee)
param = self.site._paraminfo.parameter('query+' + self.limited_module,
                                       'namespace')
if not param:
    pywikibot.warning(u'{0} module does not support a namespace '
                      'parameter'.format(self.limited_module))
    return

The caller, pagegenerators.py, recognizes that it is an API Query generator and therefore asks the generator itself to do the filtering instead of using our own filtering system. This is completely logical. However, the querypage MediaWiki API module can't filter by namespace, and the callee, api.py, just ignores the request and only displays a warning, leaving the caller with no way of knowing that the filter was not applied. I suggest changing this warning to an error/exception: what is asked for should either be done, or the caller should be told it wasn't, and an exception seems the most logical way to do that.
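
A minimal sketch of that suggestion, not the actual Pywikibot code. FakeQueryGenerator, NamespaceNotSupportedError and apply_namespace_filter are hypothetical stand-ins invented for illustration; the point is only that once the callee raises instead of warning, the caller can fall back to its own filtering.

```python
class NamespaceNotSupportedError(Exception):
    """Hypothetical exception: the API module has no namespace parameter."""


class FakeQueryGenerator:
    """Stand-in for an API query generator (not the real pywikibot class)."""

    def __init__(self, pages, supports_namespace):
        self.pages = pages  # list of (namespace, title) tuples
        self.supports_namespace = supports_namespace
        self.namespaces = None

    def set_namespace(self, namespaces):
        if not self.supports_namespace:
            # Raise instead of warning, so the caller learns the request
            # could not be honoured.
            raise NamespaceNotSupportedError(
                'module does not support a namespace parameter')
        self.namespaces = set(namespaces)

    def __iter__(self):
        for ns, title in self.pages:
            if self.namespaces is None or ns in self.namespaces:
                yield ns, title


def apply_namespace_filter(gen, namespaces):
    """Ask the generator to filter; fall back to filtering on our side."""
    try:
        gen.set_namespace(namespaces)
        return gen
    except NamespaceNotSupportedError:
        wanted = set(namespaces)
        return (page for page in gen if page[0] in wanted)


pages = [(0, 'Foo'), (2, 'User:Bar'), (0, 'Baz'), (14, 'Category:Qux')]
querypage_like = FakeQueryGenerator(pages, supports_namespace=False)
filtered = list(apply_namespace_filter(querypage_like, [0]))
print(filtered)  # [(0, 'Foo'), (0, 'Baz')]
```

With the exception in place, a generator that cannot filter natively still ends up filtered, only on the client side.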

Yes, having UnconnectedPageGenerator would make pagegenerators.py think it's not an API Query generator, but in reality it only contributes to code smell.

Change 443027 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Enable namespace filtering for unconnected_pages GeneratorFactory

https://gerrit.wikimedia.org/r/443027

As zhuyifei1999 stated, the problem is the pagegenerators filtering. It assumes that QueryGenerators always have a namespace parameter. Either the API should do the filtering, or pagegenerators should know whether the QueryGenerator is able to do it.
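
The second option could be sketched roughly as below. This is an illustration only, not what change 443027 does: the supports_namespace flag, ListGenerator and filter_by_namespace are assumed names that do not exist in Pywikibot.

```python
from collections import namedtuple

Page = namedtuple('Page', ['namespace', 'title'])


class ListGenerator:
    """Hypothetical generator; `supports_namespace` is an assumed flag."""

    def __init__(self, pages, supports_namespace):
        self._pages = list(pages)
        self.supports_namespace = supports_namespace
        self._namespaces = None

    def set_namespace(self, namespaces):
        self._namespaces = set(namespaces)

    def __iter__(self):
        for page in self._pages:
            if self._namespaces is None or page.namespace in self._namespaces:
                yield page


def filter_by_namespace(gen, namespaces):
    """Delegate only when the generator declares that it can filter."""
    if getattr(gen, 'supports_namespace', False):
        gen.set_namespace(namespaces)
        return gen
    # Otherwise filter on the caller's side.
    wanted = set(namespaces)
    return (page for page in gen if page.namespace in wanted)


unconnected = ListGenerator(
    [Page(0, 'Foo'), Page(10, 'Template:Bar'), Page(0, 'Baz')],
    supports_namespace=False,
)
titles = [page.title for page in filter_by_namespace(unconnected, [0])]
print(titles)  # ['Foo', 'Baz']
```

Here the caller checks the capability up front rather than reacting to a failure, so no warning or exception is needed on the unsupported path.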

Change 443027 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Enable namespace filtering for unconnected_pages GeneratorFactory

https://gerrit.wikimedia.org/r/443027

Change 442910 abandoned by Xqt:
T196619 Revert "[cleanup] Deprecate pagegenerators.UnconnectedPageGenerator"

Reason:
due to https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/443027/

https://gerrit.wikimedia.org/r/442910

Thanks for fixing this, guys. As an afterthought, I found in T173293 that the API does filter by namespace, but isn't really supposed to do that. See example ...