The current generator system is very limited when several conditions have to be combined. You can get pages by category, pages linked from a given page, pages that link to a given page, transclusions, etc., but it is difficult or impossible to use one generator to filter another. Where combining is possible, it is inefficient: both generators are usually run in full, and only the pages that appear in both result lists are processed.
Example use case:
- Get a list of images from a given category that are not in use (for example, to delete them)
The current approach would be to combine the category generator with the unusedfiles generator, but the latter can return a very long list on large wikis, and the list may even be incomplete because the query has a limit. That is not efficient.
There should be a way to specify the generator, and also a way to specify a filtering generator.
For the use case above, I should be able to specify the categorymember generator (optionally with a namespace, to get only pages in the File namespace) to list the images in that category. Then, while processing each page, the bot would check the usage of that particular image to see whether it is in use. Optionally restricting the namespaces in which usage counts would be a plus!
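To make the idea concrete, here is a minimal sketch of a main generator combined with a per-page filter. It does not use the real pywikibot API: `category_members`, `is_unused`, `filtered` and the two dictionaries are hypothetical stand-ins for what would be API queries in a real bot.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical stand-in data; a real bot would query the wiki instead.
CATEGORY_MEMBERS = {
    "Some category": ["File:A.png", "File:B.png", "Template:C", "File:D.png"],
}
FILE_USAGE = {
    "File:A.png": ["Main page"],
    "File:B.png": [],
    "File:D.png": ["User:Example/sandbox"],
}


def category_members(category: str, namespace: Optional[str] = None) -> Iterator[str]:
    """Main generator: lazily yield titles in a category, optionally from one namespace."""
    for title in CATEGORY_MEMBERS.get(category, []):
        if namespace is None or title.startswith(namespace + ":"):
            yield title


def is_unused(title: str) -> bool:
    """Filter predicate: one usage lookup for one candidate page."""
    return not FILE_USAGE.get(title, [])


def filtered(pages: Iterable[str], predicate: Callable[[str], bool]) -> Iterator[str]:
    """Apply a filter to each page as the main generator yields it."""
    for page in pages:
        if predicate(page):
            yield page


unused_in_category = list(
    filtered(category_members("Some category", namespace="File"), is_unused)
)
print(unused_in_category)
```

The point of this shape is that the usage check runs only for pages the main generator actually yields, so the full unused-files list is never fetched.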
Most of the current generators could be allowed to work as either generators or filters. In an ideal world, pywikibot would be smart enough to pick the best combination itself: in the previous example, use the category as the generator and then filter by usage. But that is probably very complicated to achieve, so I suggest a prefix on the existing generator options that tells pywikibot to use them as filters rather than generators. That also gives users control over which generator is the main one.
For example: -cat:"Some category" -ns:File -filter-unusedfiles
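As a rough illustration of how such a prefix might be handled, the sketch below splits a command line into main-generator options and filter options. The `-filter-` prefix is the one proposed here and does not exist in pywikibot today; `split_args` and `FILTER_PREFIX` are hypothetical names.

```python
from typing import List, Tuple

# Proposed prefix from this task; an assumption, not an existing pywikibot option.
FILTER_PREFIX = "-filter-"


def split_args(args: List[str]) -> Tuple[List[str], List[str]]:
    """Separate ordinary generator options from those marked as filters.

    A filter option is returned with the prefix stripped, so
    '-filter-unusedfiles' becomes '-unusedfiles'.
    """
    generators: List[str] = []
    filters: List[str] = []
    for arg in args:
        if arg.startswith(FILTER_PREFIX):
            filters.append("-" + arg[len(FILTER_PREFIX):])
        else:
            generators.append(arg)
    return generators, filters


generators, filters = split_args(
    ["-cat:Some category", "-ns:File", "-filter-unusedfiles"]
)
print(generators)
print(filters)
```

Stripping the prefix means the existing option names could be reused unchanged for the filter role.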
See also: