
Add a generator based on an online list of pages
Closed, Resolved, Public, Feature

Description

I often use a query (usually in Quarry) to identify a list of pages that Pywikibot needs to process. I then have to save that list to a wiki page and point the bot at that page using the -links: parameter. This extra step of saving the list to the wiki is wasteful. I suggest adding another generator, perhaps called -url:, that accepts a URL returning a CSV with a column of page titles. For instance, I could use -url:https://quarry.wmflabs.org/run/123456/output/0/csv to get the list directly from Quarry. Any web service could be used in the same way.
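
To make the proposal concrete, here is a rough sketch of what such a generator could do: download the CSV, skip the header row, and yield the first column as page titles. The function names (titles_from_csv, titles_from_url) and the assumption that titles sit in the first column under a header row are illustrative only; this is not existing Pywikibot code.

```python
import csv
import io
from urllib.request import urlopen


def titles_from_csv(text, column=0, has_header=True):
    """Yield non-empty page titles from one column of CSV text.

    Hypothetical helper: the column index and header handling are
    assumptions about the CSV layout, not a documented format.
    """
    rows = csv.reader(io.StringIO(text))
    if has_header:
        next(rows, None)  # discard the header row, if there is one
    for row in rows:
        # Skip blank rows and rows too short to contain the column.
        if len(row) > column and row[column].strip():
            yield row[column].strip()


def titles_from_url(url, **kwargs):
    """Fetch CSV text from a URL and return a generator of its titles."""
    with urlopen(url) as response:
        text = response.read().decode('utf-8')
    return titles_from_csv(text, **kwargs)
```

Each yielded title could then be turned into a page object by the bot, in the same way titles read from a wiki page or a file are handled today.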

@Xqt do you have thoughts on this or recommendations about what the command line argument should be called?

Details

Related Changes in Gerrit:

Event Timeline

Quarry results do not contain links as such, only text URLs, which could be hard for Pywikibot to crawl. For me, Quarry also works well with -file:.

Yes, essentially we want a counterpart to -file: which, instead of needing a file on the disk, can work with a file online given its URL.

I see. This could be easy, as most of the code is already written for -file: :D
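
A minimal sketch of that reuse, assuming the -file: format is either titles in [[double brackets]] or one title per line: the parsing step stays the same and only the way the text is obtained differs. The names read_source and titles_from_text are made up for illustration and are not the actual Pywikibot functions.

```python
import re
from urllib.request import urlopen

# Matches [[Title]] and [[Title|label]], capturing only the title part.
LINK_RE = re.compile(r'\[\[(.+?)(?:\]\]|\|)')


def titles_from_text(text):
    """Return page titles from text, whatever its origin.

    Assumed format: wikilinks if any are present, otherwise one
    title per non-empty line.
    """
    links = LINK_RE.findall(text)
    if links:
        return [title.strip() for title in links]
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_source(source):
    """Return the text behind a URL or a local file path (hypothetical)."""
    if source.startswith(('http://', 'https://')):
        with urlopen(source) as response:
            return response.read().decode('utf-8')
    with open(source, encoding='utf-8') as f:
        return f.read()
```

With this split, titles_from_text(read_source(x)) behaves the same whether x is a path on disk or a URL.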

(in Toolforge you can use MySQLPageGenerator to ask Quarry directly)

Change 698655 had a related patch set uploaded (by Chris Maynor; author: Chris Maynor):

[pywikibot/core@master] pagegenerators: Add -url option

https://gerrit.wikimedia.org/r/698655

Xqt triaged this task as Low priority. Jun 8 2021, 4:48 AM
Xqt changed the subtype of this task from "Task" to "Feature Request".

Change 698655 merged by jenkins-bot:

[pywikibot/core@master] pagegenerators: Add -url option

https://gerrit.wikimedia.org/r/698655