Add a generator based on an online list of pages
Closed, Resolved | Public | Feature

Description

Many a time, I use a query (often using Quarry) to identify a list of pages that need to be processed by Pywikibot. I then have to save that list to a page on wiki and point the bot to that page using the -links: parameter. This extra step of saving the list to the wiki is wasteful. I suggest adding another generator, perhaps called -url:, which allows passing a URL that returns a CSV with a column consisting of page titles. For instance, I could use -url:https://quarry.wmflabs.org/run/123456/output/0/csv to get the list directly from Quarry. Any web service could be used in the same way.

@Xqt do you have thoughts on this or recommendations about what the command line argument should be called?
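
To make the request concrete, here is a minimal sketch of what such a generator would have to do, using only the Python standard library. It is not Pywikibot code: the helper names titles_from_csv_text and titles_from_url, the column parameter, and the assumption that the CSV has a header row are all hypothetical and only illustrate the idea.

```python
import csv
import io
from urllib.request import urlopen


def titles_from_csv_text(text, column=0, skip_header=True):
    """Yield page titles from one column of CSV text (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text, newline=''))
    if skip_header:
        next(reader, None)  # drop the header row, assuming there is one
    for row in reader:
        # Skip blank lines and rows that are too short or have an empty cell.
        if len(row) > column and row[column].strip():
            yield row[column].strip()


def titles_from_url(url, column=0, skip_header=True):
    """Download a CSV from url and yield its titles (hypothetical helper)."""
    with urlopen(url) as response:
        # Assumption: the service returns UTF-8 encoded text.
        text = response.read().decode('utf-8')
    yield from titles_from_csv_text(text, column, skip_header)


# Offline demonstration with inline CSV text instead of a real URL.
sample = 'page_title\nMain_Page\n"Foo, bar"\n\nExample\n'
titles = list(titles_from_csv_text(sample))
print(titles)
```

Each yielded title could then be turned into a page object, for example with pywikibot.Page(site, title), which is roughly what the existing -file: handling does after reading titles from disk.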

Event Timeline

A Quarry result does not contain links as such, only text URLs, which could be hard for Pywikibot to crawl. For me, Quarry also works well with -file:.

Yes, essentially we want a counterpart to -file: which, instead of needing a file on the disk, can work with a file online given its URL.

I see. This could be easy, as most of the code is already written for -file: :D

(in Toolforge you can use MySQLPageGenerator to ask Quarry directly)

Change 698655 had a related patch set uploaded (by Chris Maynor; author: Chris Maynor):

[pywikibot/core@master] pagegenerators: Add -url option

https://gerrit.wikimedia.org/r/698655

Xqt triaged this task as Low priority. Jun 8 2021, 4:48 AM
Xqt changed the subtype of this task from "Task" to "Feature Request".

Change 698655 merged by jenkins-bot:

[pywikibot/core@master] pagegenerators: Add -url option

https://gerrit.wikimedia.org/r/698655