
Allow any number of filters in addition to page generators
Open, Low | Public | Feature

Description

The current generator system is very limited when several conditions are needed. You can get pages by category, pages linked from a given page, pages that link to a given page, transclusions, etc., but it is difficult or impossible to use one generator to filter another. Where generators can be combined, it is done inefficiently: pywikibot usually runs both generators and, once it has the list of pages from each, processes only the pages that appear in both results.

Example use case:

  • Get a list of images from a given category that are not in use (for example, to delete them)

The current approach would be to use the category generator together with the unusedfiles generator, but the latter can return a very long list, which may even be incomplete on large wikis (because it has a limit), so this is not efficient.
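To illustrate the problem, here is a minimal sketch in plain Python. It does not use pywikibot's actual implementation; `intersect_generators`, the page titles, and the limit of 500 are all made up for the example. It shows how intersecting with a capped listing silently drops a valid result.

```python
from itertools import islice


def intersect_generators(gen_a, gen_b):
    """Materialize gen_b fully, then keep the items of gen_a found in it."""
    seen = set(gen_b)
    return [item for item in gen_a if item in seen]


# Hypothetical stand-in data: 1000 unused files on the wiki,
# two of which belong to the category we care about.
all_unused = ['File:Unused %d.png' % i for i in range(1000)]
category_members = ['File:Unused 3.png', 'File:Unused 700.png', 'File:Used.png']

# The unused-files listing is capped (500 is an arbitrary, assumed limit).
capped_unused = islice(all_unused, 500)

result = intersect_generators(category_members, capped_unused)
print(result)  # 'File:Unused 700.png' is missing because of the cap
```

The whole capped listing is fetched even though only three pages are of interest, and one of the two unused files in the category is never reported.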

There should be a way to specify the generator, and also a way to specify a filtering generator.

For example, in the use case above, I should be able to specify the categorymember generator (optionally with a namespace, to get only pages in the File namespace) to get the list of images in that category, and then, when processing each page, check the usage of that particular image to see whether it is in use (optionally restricting the namespaces in which usage is checked; that would be a plus!).
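The idea can be sketched in plain Python as a main generator wrapped by a per-page check. This is only an illustration under assumptions: `filter_pages`, `is_unused`, and the `usage` table are hypothetical names, and the dictionary lookup stands in for a real per-file usage query.

```python
def filter_pages(generator, predicate):
    """Lazily yield only the pages for which predicate(page) is true."""
    for page in generator:
        if predicate(page):
            yield page


# Hypothetical usage table standing in for a per-file usage query.
usage = {
    'File:A.png': ['Main Page'],
    'File:B.png': [],
    'File:C.png': [],
}
calls = []  # records which pages were actually checked


def is_unused(title):
    """Return True if the (stand-in) usage list for this title is empty."""
    calls.append(title)
    return not usage[title]


# The main generator: here simply the titles in the category.
category_members = iter(usage)
unused_in_category = list(filter_pages(category_members, is_unused))
print(unused_in_category)
```

With this shape, the number of usage checks equals the number of pages in the category, regardless of how many unused files exist on the wiki as a whole.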

Most of the current generators could be allowed to work as either generators or filters. In an ideal world, pywikibot would be smart enough to select the best combination itself: in the previous example, use the category as the generator and then filter by usage. But that's probably very complicated to achieve, so I suggest using a prefix on the current generators to tell pywikibot to use them as filters rather than generators. That also gives users control over which generator is the main one.

For example: -cat:"Some category" -ns:File -filter-unusedfiles
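As a rough sketch of how such a prefix could be recognized, the following plain-Python snippet splits a command line into generator arguments and filter arguments. The `-filter-` prefix is the one proposed in this task; `split_generator_args` is a hypothetical helper, not part of pywikibot, and it treats every argument without the prefix (including `-ns`) as belonging to the main generator.

```python
FILTER_PREFIX = '-filter-'


def split_generator_args(args):
    """Separate plain generator options from '-filter-' prefixed ones.

    Filter options are returned with the prefix replaced by a single '-',
    so '-filter-unusedfiles' becomes '-unusedfiles'.
    """
    generators, filters = [], []
    for arg in args:
        if arg.startswith(FILTER_PREFIX):
            filters.append('-' + arg[len(FILTER_PREFIX):])
        else:
            generators.append(arg)
    return generators, filters


generators, filters = split_generator_args(
    ['-cat:Some category', '-ns:File', '-filter-unusedfiles'])
print(generators)
print(filters)
```

Keeping the original option name after stripping the prefix would let the same option definitions serve both roles, which matches the suggestion that most generators could double as filters.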

See also:

Event Timeline

Just for reference, if anyone needs this: I've solved this particular case (in a hacky way) for delete.py by adding this to treat_page:

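"""Process one page from the generator."""
# Skip images that are in use. total=1 stops after the first usage found,
# so the full usage list is not fetched just to test for emptiness.
if list(self.site.imageusage(self.current_page, total=1)):
    pywikibot.output('Skipping: {0} is in use.'.format(
        self.current_page))
    return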

Note that -catfilter uses a different naming convention ...

Xqt triaged this task as Low priority. (Aug 26 2016, 7:40 AM)
Xqt changed the subtype of this task from "Task" to "Feature Request". (Jun 28 2023, 2:15 PM)