
Pywikibot: Listpages.py should have options to limit results to a specified range of pages, titles (or title prefixes)
Open, Low, Public, Feature

Description

Feature summary (what you would like to be able to do and where):

Limit the titles returned by listpages.py to a specific range of pages, ideally in the generation section of the script rather than in the filtering options.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Using PAWS, I ran the following command:

pwb.py listpages -usercontribs:"ShakespeareFan00"  -grep:"\<\!\-\-" -lang:en -family:wikisource   -format:" * [[{page.loc_title}]]" > comments.txt

This query runs but takes a long time to complete (a few hours), given the large number of contributions I have on the wiki concerned. Performing a -grep on at least 25,000 pages (most of which will not form part of the desired result) is also time-consuming and inefficient. Browsing a large result set is also an inefficient workflow.

It would be desirable to make a query like the one above more granular, so that it can be batched into groups based on a specific title prefix or character sequence (such as all contributions starting with a given letter).

There is also no option (a missing generator, maybe) in listpages to generate a range of pages.
The -start option only indicates that generation of titles should start at a given title; there is no corresponding -end or -stop option to indicate where generation of titles should cease.

Benefits (why should this be implemented?):
Increased granularity of queries tends to reduce their size and thus makes them more effective.

It would allow queries to be written where a user knows that they want to find a page within a given range, for example a series of titles that follow a distinct, known pattern, such as a set of subpages or an Index'ed work on Wikisource.

Suggested approaches

  • Extend the syntax of the -start option so that the relevant generator stops generating titles when a specified additional title or prefix is encountered.
  • Add an explicit -end or -stop option, so that generated results cease after the specified title has been seen (taking into account any -intersect from other generators). This may be tricky to implement, given that the results returned from some generators are not necessarily sorted or generated in title order.
  • The -grep option currently applies to the contents of the page; a title/prefix grep may be desirable to filter results to a specific set of pages, such as the pages of an Index'ed work on Wikisource.
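The range and prefix ideas above can be sketched in plain Python. This is only an illustration, not pywikibot code: the function name `titles_in_range` and its parameters are hypothetical. It filters each title individually instead of stopping at the end title, so it does not depend on the generator yielding titles in sorted order. It also compares strings by code point, which may differ from the wiki's own sort order.

```python
from typing import Iterable, Iterator, Optional


def titles_in_range(
    titles: Iterable[str],
    start: Optional[str] = None,
    end: Optional[str] = None,
    prefix: Optional[str] = None,
) -> Iterator[str]:
    """Yield titles with start <= title <= end that also begin with prefix.

    Hypothetical helper: any bound left as None is not applied.
    Titles are tested one by one, so unsorted input is handled.
    """
    for title in titles:
        if start is not None and title < start:
            continue  # sorts before the requested range
        if end is not None and title > end:
            continue  # sorts after the requested range
        if prefix is not None and not title.startswith(prefix):
            continue  # does not carry the requested prefix
        yield title


if __name__ == "__main__":
    # Made-up titles in no particular order.
    sample = ["Zebra", "Page:Foo.djvu/1", "Apple", "Page:Foo.djvu/2"]
    for title in titles_in_range(sample, prefix="Page:Foo.djvu/"):
        print(title)
```

For a generator that is known to yield titles in sorted order, the loop could stop at the first title past `end` instead of skipping it, which is what a -stop option would presumably do.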

Event Timeline

ShakespeareFan00 renamed this task from "Pywikibot; Listpages.py should have a -range generator option to limit results to a specified range of pages, titles (or title prefixes)" to "Pywikibot; Listpages.py should have options to limit results to a specified range of pages, titles (or title prefixes)". May 11 2022, 9:26 AM
ShakespeareFan00 added a project: Pywikibot.

This query runs but takes a long time to complete (a few hours), given the large number of contributions I have on the wiki concerned. Performing a -grep on at least 25,000 pages (most of which will not form part of the desired result) is also time-consuming and inefficient. Browsing a large result set is also an inefficient workflow.

Please note: generators can only be sped up if the MediaWiki API provides its own filters. This is the case for API:usercontribs; it also provides start/end timestamps and a tag filter that lists only revisions carrying a given tag, but currently these cannot be used through command-line options. Other pagegenerators filters such as -grep work on the preloaded pages and need a lot of time to filter pages out. However, API:usercontribs has no title filter. Conversely, API:Allpages does support title filtering but has no user-contribution filter (and, by the way, we have no such pagegenerators filter).
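Since the timestamp limits are not exposed on the command line, one way to batch a large contribution list is to call the site method from a script. The sketch below is hedged: the helper `contributed_titles` is hypothetical, and it assumes that the site object has a `usercontribs()` method accepting `user`, `start`, `end` and `total` keywords and yielding dicts with a `"title"` key, as pywikibot's `APISite.usercontribs` is understood to do. Check the pywikibot API reference before relying on it.

```python
from typing import Any, Iterator, Optional, Set


def contributed_titles(
    site: Any,
    user: str,
    start: Optional[Any] = None,
    end: Optional[Any] = None,
    total: Optional[int] = None,
) -> Iterator[str]:
    """Yield each distinct page title the user edited in a time window.

    Assumption: site.usercontribs(user=..., start=..., end=..., total=...)
    exists and yields dicts containing a "title" key.
    """
    seen: Set[str] = set()
    for contrib in site.usercontribs(user=user, start=start, end=end, total=total):
        title = contrib["title"]
        if title not in seen:  # a user may have edited a page many times
            seen.add(title)
            yield title


# Possible use with pywikibot (not executed here; needs network access
# and a configured pywikibot installation):
#
#   import pywikibot
#   site = pywikibot.Site("en", "wikisource")
#   newest = pywikibot.Timestamp(2022, 5, 1)
#   oldest = pywikibot.Timestamp(2022, 1, 1)
#   for title in contributed_titles(site, "ShakespeareFan00", newest, oldest):
#       print(title)
```

In the commented usage, the later timestamp is passed as `start` on the assumption that contributions are listed newest first by default. Running one window at a time keeps each batch small, which addresses the batching need even without a title filter on the API side.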

By the way, there is a -titleregex option that can be used together with -grep:
https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.html?highlight=filter%20options#filter-options
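To illustrate why combining a title filter with -grep can help, here is a minimal sketch in plain Python. It is not pywikibot's implementation: the function `grep_with_title_filter` and the lazy `(title, load_text)` pairs are hypothetical stand-ins. It assumes the title check runs before any page text is fetched, and it anchors the title pattern at the start of the title with `re.match`, which is a choice made for this sketch.

```python
import re
from typing import Callable, Iterable, Iterator, Tuple

# A page is modelled as (title, loader); the loader stands in for the
# expensive step of fetching page text.
LazyPage = Tuple[str, Callable[[], str]]


def grep_with_title_filter(
    pages: Iterable[LazyPage], title_regex: str, text_regex: str
) -> Iterator[str]:
    """Yield titles matching title_regex whose text matches text_regex.

    The cheap title test runs first, so text is loaded only for pages
    that survive it.
    """
    title_re = re.compile(title_regex)
    text_re = re.compile(text_regex)
    for title, load_text in pages:
        if not title_re.match(title):
            continue  # skipped without loading any text
        if text_re.search(load_text()):
            yield title


if __name__ == "__main__":
    # Made-up pages; the lambdas stand in for network fetches.
    pages = [
        ("Page:Foo.djvu/1", lambda: "text <!-- note -->"),
        ("Page:Bar.djvu/1", lambda: "<!-- other -->"),
    ]
    for title in grep_with_title_filter(pages, r"Page:Foo\.djvu/", r"<!--"):
        print(title)
```

The saving depends on how selective the title pattern is: the stricter the title test, the fewer page texts need to be loaded and searched.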

Xqt triaged this task as Low priority. May 12 2022, 5:14 AM
Xqt updated the task description.