newitem.py ignores -namespace: parameter
Closed, Resolved · Public

Description

When I updated the bot after a long time, I ran into unexpected behavior:

pwb.py newitem -namespace:0 -unconnectedpages:5000
creates Wikidata items for pages in all namespaces.

How do I write this command correctly?

pagegenerators.py says that for some scripts the -namespace parameter must come before the page generator option, but I already do that, and newitem.py is not among those scripts.

Event Timeline

JAnD created this task. Jun 7 2018, 9:18 AM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jun 7 2018, 9:18 AM
JAnD updated the task description. Jun 7 2018, 9:19 AM

I can reproduce this (output: WARNING: querypage module does not support a namespace parameter) and suspect rPWBCbdeea71e129f0bc9b1a36e75388dfc576c7275ce is behind this.

matej_suchanek triaged this task as High priority. Jun 21 2018, 4:04 PM

Change 442910 had a related patch set uploaded (by Multichill; owner: Multichill):
[pywikibot/core@master] T196619 Revert "[cleanup] Deprecate pagegenerators.UnconnectedPageGenerator"

https://gerrit.wikimedia.org/r/442910

@Xqt : Why are you deprecating perfectly valid generators?

Buh, simple revert and rebase doesn't work.

Looks like a no-op to me.

Scratch that. It is relevant, yes, but the original implementation of using a layer of 'indirectness' is not good.

The bug is caused by the combination of pagegenerators.py#L496 and api.py#L2710:

# pagegenerators.py (caller)
if isinstance(self.gens[i], pywikibot.data.api.QueryGenerator):
    if self.namespaces:
        self.gens[i].set_namespace(self.namespaces)
    if self.limit:
        self.gens[i].set_maximum_items(self.limit)
else:
    if self.namespaces:
        self.gens[i] = NamespaceFilterPageGenerator(self.gens[i],
                                                    self.namespaces,
                                                    self.site)
    if self.limit:
        self.gens[i] = itertools.islice(self.gens[i], self.limit)

# api.py (callee, inside set_namespace)
param = self.site._paraminfo.parameter('query+' + self.limited_module,
                                       'namespace')
if not param:
    pywikibot.warning(u'{0} module does not support a namespace '
                      'parameter'.format(self.limited_module))
    return

The caller, pagegenerators.py, recognizes that it is an API Query generator and therefore asks the generator itself to do the filtering instead of using our own filtering system. This is completely logical. However, the querypage MediaWiki API module can't filter by namespace, and the callee, api.py, just ignores the request and only displays a warning, leaving the caller no way of knowing about the failure. I suggest changing this warning to an error/exception: what is asked should either be done, or the caller should be told that it was not done, and an exception seems most logical to me.
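To illustrate the suggestion, here is a minimal, self-contained sketch. It does not use pywikibot: `QueryGeneratorStub`, `NamespaceNotSupported` and `filter_by_namespace` are hypothetical names made up for this example, and pages are modelled as plain `(namespace, title)` tuples. It only shows how an exception would let the caller fall back to its own filtering.

```python
class NamespaceNotSupported(Exception):
    """Hypothetical error: the API module cannot filter by namespace."""


class QueryGeneratorStub:
    """Stand-in for an API query generator; NOT the pywikibot class."""

    def __init__(self, pages, supports_namespace):
        self.pages = pages  # list of (namespace, title) tuples
        self.supports_namespace = supports_namespace
        self.namespaces = None

    def set_namespace(self, namespaces):
        if not self.supports_namespace:
            # Raise instead of only warning, so the caller learns about it.
            raise NamespaceNotSupported('module has no namespace parameter')
        self.namespaces = set(namespaces)

    def __iter__(self):
        for ns, title in self.pages:
            if self.namespaces is None or ns in self.namespaces:
                yield ns, title


def filter_by_namespace(gen, namespaces):
    """Ask the generator to filter; fall back to filtering in the caller."""
    try:
        gen.set_namespace(namespaces)
        return iter(gen)
    except NamespaceNotSupported:
        wanted = set(namespaces)
        return (page for page in gen if page[0] in wanted)


pages = [(0, 'Article'), (10, 'Template:Foo'),
         (0, 'Other'), (14, 'Category:Bar')]
# Behaves like querypage: no namespace parameter available.
querypage_like = QueryGeneratorStub(pages, supports_namespace=False)
result = list(filter_by_namespace(querypage_like, [0]))
print(result)
```

With the silent-warning behaviour described above, the caller would have received all four pages; with the exception, it can detect the failure and filter them itself.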

Yes, having UnconnectedPageGenerator would make api.py think it's not an API Query generator, but in reality it only contributes to code smell.

Masti added a subscriber: Masti. Jun 28 2018, 8:28 PM
Xqt claimed this task. Jun 29 2018, 6:06 AM
Xqt added a project: good first task.

Change 443027 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Enable namespace filtering for unconnected_pages GeneratorFactory

https://gerrit.wikimedia.org/r/443027

Xqt added a comment. Jun 29 2018, 7:09 AM

As zhuyifei1999 stated, the problem is the pagegenerators filtering. It assumes that QueryGenerators always have a namespace parameter. Either the API should do the filtering, or pagegenerators should know whether the QueryGenerator is able to do it.
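The second option (the caller knows whether the generator can filter) could look roughly like the sketch below. It is an illustration only and is not the code of the merged change: `QueryGeneratorStub`, `can_filter_namespace` and `apply_namespaces` are hypothetical names, and pages are again modelled as `(namespace, title)` tuples.

```python
class QueryGeneratorStub:
    """Stand-in for an API query generator; NOT the pywikibot class."""

    def __init__(self, pages, module_params):
        self.pages = pages  # list of (namespace, title) tuples
        self.module_params = set(module_params)
        self.namespaces = None

    def can_filter_namespace(self):
        """Hypothetical capability check the caller can query first."""
        return 'namespace' in self.module_params

    def set_namespace(self, namespaces):
        self.namespaces = set(namespaces)

    def __iter__(self):
        for ns, title in self.pages:
            if self.namespaces is None or ns in self.namespaces:
                yield ns, title


def apply_namespaces(gen, namespaces):
    """Delegate filtering only if the generator says it can do it."""
    if isinstance(gen, QueryGeneratorStub) and gen.can_filter_namespace():
        gen.set_namespace(namespaces)
        return iter(gen)
    # Otherwise filter in the caller, as for any non-API generator.
    wanted = set(namespaces)
    return (page for page in gen if page[0] in wanted)


pages = [(0, 'Article'), (10, 'Template:Foo'), (0, 'Other')]
with_param = QueryGeneratorStub(pages, module_params=['namespace', 'limit'])
without_param = QueryGeneratorStub(pages, module_params=['limit'])

filtered_by_api = list(apply_namespaces(with_param, [0]))
filtered_by_caller = list(apply_namespaces(without_param, [0]))
print(filtered_by_api == filtered_by_caller)
```

Either way the user gets the same pages; the difference is only where the filtering happens.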

Change 443027 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Enable namespace filtering for unconnected_pages GeneratorFactory

https://gerrit.wikimedia.org/r/443027

Change 442910 abandoned by Xqt:
T196619 Revert "[cleanup] Deprecate pagegenerators.UnconnectedPageGenerator"

Reason:
due to https://gerrit.wikimedia.org/r/#/c/pywikibot/core/ /443027/

https://gerrit.wikimedia.org/r/442910

Xqt closed this task as Resolved. Jun 29 2018, 9:04 AM

Thanks for fixing this, guys. As a follow-up: I found in T173293 that the API does filter by namespace, but isn't really supposed to do that. See example ...........

Vvjjkkii renamed this task from "newitem.py ignores -namespace: parameter" to "ohbaaaaaaa". Jul 1 2018, 1:05 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Xqt as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from "ohbaaaaaaa" to "newitem.py ignores -namespace: parameter". Jul 2 2018, 3:03 PM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to Xqt.
CommunityTechBot updated the task description.
CommunityTechBot added subscribers: gerritbot, Aklapper.