
Add a generator based on an online list of pages
Open, Needs Triage, Public

Description

I often use a query (typically on Quarry) to identify a list of pages that need to be processed by Pywikibot. I then have to save that list to a page on-wiki and point the bot to that page using the -links: parameter. This extra step of saving the list to the wiki is wasteful. I suggest adding another generator, perhaps called -url:, which accepts a URL returning a CSV with a column of page titles. For instance, I could use -url:https://quarry.wmflabs.org/run/123456/output/0/csv to get the list directly from Quarry. Any other web service could be used in the same way.
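
As a very rough sketch of what such a generator might look like (the name UrlCSVPageGenerator, the use of the requests library, and the single-column CSV layout are all assumptions here, not existing Pywikibot API):

```python
# Hypothetical sketch only: fetch a CSV of page titles from a URL and yield
# pywikibot.Page objects, analogous to what -file: does for local files.
import csv

import requests  # assumption: an HTTP client such as requests is acceptable
import pywikibot


def UrlCSVPageGenerator(url, site=None):
    """Yield Page objects for titles listed one per row in a CSV at ``url``."""
    site = site or pywikibot.Site()
    response = requests.get(url)
    response.raise_for_status()
    for row in csv.reader(response.text.splitlines()):
        if row and row[0].strip():
            yield pywikibot.Page(site, row[0].strip())
```

A real implementation would presumably also have to skip Quarry's header row and decide which column holds the titles.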

@Xqt do you have thoughts on this or recommendations about what the command line argument should be called?

Event Timeline

Huji created this task. Nov 28 2019, 6:34 PM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Nov 28 2019, 6:34 PM
Dvorapa added a subscriber: Dvorapa. Edited Nov 28 2019, 7:02 PM

Quarry results do not contain links as such, only plain-text URLs, which could be hard for Pywikibot to crawl. For me, Quarry also works well with -file:.
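
(As an illustration of that workaround; the file name, the header-row handling, and the use of requests are only examples, and it assumes pagegenerators.TextfilePageGenerator, the generator behind -file:)

```python
# Sketch of the current workaround: download the Quarry CSV, save the titles
# locally, then feed them to the existing -file: machinery.
import requests
import pywikibot
from pywikibot import pagegenerators

url = 'https://quarry.wmflabs.org/run/123456/output/0/csv'
lines = requests.get(url).text.splitlines()[1:]  # drop Quarry's header row
with open('titles.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(lines))

site = pywikibot.Site()
for page in pagegenerators.TextfilePageGenerator('titles.txt', site=site):
    print(page.title())
```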

Huji added a comment. Nov 28 2019, 7:45 PM

Yes, essentially we want a counterpart to -file: which, instead of requiring a file on disk, can work with an online file given its URL.

Dvorapa added a comment. Edited Nov 28 2019, 7:55 PM

I see. This could be easy, as most of the code is already written for -file: :D

(on Toolforge you can use MySQLPageGenerator to run the same query directly, without going through Quarry)
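
For reference, a rough sketch of that approach; the SQL here is a made-up example, and it assumes the database connection options for the Toolforge replicas are already configured in user-config.py:

```python
# Sketch: run the same SQL that a Quarry query would run, but directly against
# the database replicas via pagegenerators.MySQLPageGenerator.
import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('en', 'wikipedia')
query = """
SELECT page_namespace, page_title
FROM page
WHERE page_namespace = 0 AND page_is_redirect = 0
LIMIT 10
"""
for page in pagegenerators.MySQLPageGenerator(query, site=site):
    print(page.title())
```

The corresponding command-line option is -mysqlquery:, which passes the query to the same generator.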