
Restartable bot framework
Open, Low, Public

Description

The pywikibot codebase is almost ready for a bot pause/resume project.

Many of the scripts now have a Bot class which inherits from pywikibot.bot.Bot, with a run method that is either inherited or behaves in a consistent manner. Thus it is easy to place the pause/resume functionality in the Bot class, and to implement and test it for multiple scripts.
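A minimal sketch of putting page-based pause/resume into one shared base class. `ResumableBot`, `treat()` and `pause_requested` are hypothetical names for illustration, not pywikibot's actual `Bot` API. The pause is only honoured between pages, so every page is either fully processed or untouched.

```python
import threading
from typing import Iterable, List, Optional


class ResumableBot:
    """Hypothetical base class: subclasses only implement treat()."""

    def __init__(self, titles: Iterable[str]) -> None:
        self.titles = iter(titles)
        # Set from another thread (or a signal handler) to request a pause.
        self.pause_requested = threading.Event()
        self.processed: List[str] = []

    def treat(self, title: str) -> None:
        raise NotImplementedError

    def run(self) -> Optional[str]:
        """Process pages until the generator is exhausted or a pause is requested.

        Returns the last fully processed title; the unconsumed remainder
        of self.titles is where a later run would continue.
        """
        last = None
        for title in self.titles:
            self.treat(title)
            self.processed.append(title)
            last = title
            if self.pause_requested.is_set():
                break
        return last


class DemoBot(ResumableBot):
    def treat(self, title: str) -> None:
        # Simulate the user pressing pause while "B" is being processed.
        if title == "B":
            self.pause_requested.set()
```

With `DemoBot(["A", "B", "C"])`, `run()` returns `"B"` and the generator still holds `"C"`, so every script that inherits `run()` gets the same behaviour.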

Mentors: @DrTrigon

Event Timeline

As I see it, we need at least the throttle and http functions to be pausable; that should already cover most of what we need, right?
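One way to read "pausable throttle" is a single gate that every request waits on before the usual delay. The sketch below is a hypothetical `PausableThrottle`, not pywikibot's `Throttle` class; it assumes all http calls pass through `wait()`.

```python
import threading
import time
from typing import Optional


class PausableThrottle:
    """Hypothetical throttle that blocks callers while paused."""

    def __init__(self, delay: float = 0.0) -> None:
        self.delay = delay
        self._running = threading.Event()
        self._running.set()  # start in the "not paused" state

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block while paused, then apply the normal delay.

        Returns False if still paused when the timeout expires.
        """
        if not self._running.wait(timeout):
            return False
        time.sleep(self.delay)
        return True
```

Because the gate sits in front of the network layer, pausing it stops new requests without touching any script code; requests already in flight still complete.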

I think the simplest option is to stop after a given page has been processed, and continuing there the next time.

I agree with @valhallasw that this should be page based resumption.

The simplest solution (MVP) would, on pause, continue to run the existing generator, and write out a page title list to a file instead of processing the pages.
Then the resumption would use the page title list in the stored file using -file:xxx.
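A sketch of the MVP: on pause, keep draining the generator but only record titles. The `[[Title]]`-per-line format is an assumption about what `-file:` accepts; check the pagegenerators help for your version. The function names are hypothetical.

```python
from typing import Iterable


def remaining_as_file_text(titles: Iterable[str]) -> str:
    """Render the not-yet-processed titles as [[Title]] links, one per line."""
    return "".join(f"[[{title}]]\n" for title in titles)


def dump_remaining(titles: Iterable[str], path: str) -> int:
    """Drain the generator into *path* without processing any page.

    Returns the number of titles written.
    """
    text = remaining_as_file_text(titles)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text.count("\n")
```

The file is then passed back with `-file:` on the next run. Draining is cheap because no page content is fetched or edited, but it does require the process to stay alive until the generator is exhausted.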

A better implementation would, on pause, store the next page title to be processed, and all of the generator arguments.
Then the resumption would create a new generator which resumes where the last generator stopped.
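A sketch of the better approach: persist the next title together with the original generator arguments, then rebuild the generator with a "start from" title. `make_generator` is a hypothetical factory standing in for whatever turns command-line generator arguments into a page generator; `toy_listing` is a toy alphabetical listing, not a real API generator.

```python
import json
from typing import Callable, Iterable, Iterator, List

# Hypothetical factory type: (generator arguments, start title) -> titles.
GeneratorFactory = Callable[[List[str], str], Iterable[str]]


def dump_state(next_title: str, gen_args: List[str]) -> str:
    """Serialize the restart point and the original generator arguments."""
    return json.dumps({"next_title": next_title, "gen_args": gen_args})


def resume(state_json: str, make_generator: GeneratorFactory) -> Iterator[str]:
    """Rebuild the generator from the saved arguments, starting at next_title."""
    state = json.loads(state_json)
    return iter(make_generator(state["gen_args"], state["next_title"]))


def toy_listing(gen_args: List[str], start: str) -> Iterator[str]:
    """Toy alphabetical generator that supports a 'start from' title.

    It ignores gen_args; a real factory would use them to pick the generator.
    """
    for title in sorted(["Apple", "Banana", "Cherry", "Date"]):
        if title >= start:
            yield title
```

This only works for generators that accept a start title, which is why the title-file approach is still needed as a fallback.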

We need both approaches (e.g. the former is better if the input generator was -file:xxx), and the second approach will not be possible for some API generators that do not support a 'start from' title argument.

And there are some generators that are not pause/resume-able, like -random, in which case the user should simply re-run their original command to continue.
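The three cases above can be summarised as a small decision function. The flags are hypothetical properties a generator would have to advertise; nothing like them is assumed to exist in pywikibot today.

```python
from enum import Enum


class ResumeStrategy(Enum):
    TITLE_FILE = "write remaining titles, resume with -file"
    START_FROM = "store next title and generator arguments"
    RERUN = "not resumable; re-run the original command"


def choose_strategy(from_file: bool, supports_start: bool,
                    deterministic: bool) -> ResumeStrategy:
    """Pick a resume approach for one generator (hypothetical flags)."""
    if not deterministic:
        # e.g. -random: the order differs on every run, so nothing to resume.
        return ResumeStrategy.RERUN
    if from_file or not supports_start:
        # Input was already a file, or the API cannot start from a title.
        return ResumeStrategy.TITLE_FILE
    return ResumeStrategy.START_FROM
```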

Resumption at the http/network level depends on the MediaWiki API maintaining/respecting old continuation data. This is not unreasonable in many cases, as the API continuation data is often page titles, etc.
However, while we may be able to pause/resume the http API process, the user may have hit pause at the first of 5000 records in the last http API result set, so that implementation will still need to handle injecting cached data into the API layer before switching to resuming fetching from the http layer.
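A sketch of that two-phase resume: first replay the unprocessed tail of the cached result set, then continue over the network with the saved continuation data. `fetch` and `toy_fetch` are hypothetical stand-ins for an API request; real MediaWiki continuation parameters vary by module.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Continuation = Optional[Dict[str, Any]]
# Hypothetical fetch: continuation data -> (batch of results, next continuation).
Fetch = Callable[[Dict[str, Any]], Tuple[List[str], Continuation]]


def resume_query(cached: List[str], offset: int,
                 cont: Continuation, fetch: Fetch) -> Iterator[str]:
    """Resume a continued query after a pause.

    cached is the last result set, offset the index of its first
    unprocessed record, cont the continuation returned with that set.
    """
    # 1. Inject the cached records the user had not reached yet.
    yield from cached[offset:]
    # 2. Then fetch further batches; None means the query was complete.
    while cont is not None:
        batch, cont = fetch(cont)
        yield from batch


def toy_fetch(cont: Dict[str, Any]) -> Tuple[List[str], Continuation]:
    """Toy API: maps a continuation value to the next batch."""
    batches = {
        "C": (["C", "D"], {"continue": "E"}),
        "E": (["E"], None),
    }
    return batches[cont["continue"]]
```

This relies on the API still honouring the old continuation data when the run resumes, as noted above.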

> The simplest solution (MVP) would, on pause, continue to run the existing generator, and write out a page title list to a file instead of processing the pages.
> Then the resumption would use the page title list in the stored file using -file:xxx.
>
> A better implementation would, on pause, store the next page title to be processed, and all of the generator arguments.
> Then the resumption would create a new generator which resumes where the last generator stopped.

I agree, and have to note that something similar is in catimages too... ;) From there I know these simple approaches can go terribly wrong if the page to resume with has been deleted in the meantime... Other than that it's doable and will cover the majority of the cases.
Dr. Trigon
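A sketch of why a deleted resume page breaks a naive resume, and one possible mitigation. With an exact-match search the stored title is never found, so every page is skipped. For generators known to be sorted, comparing instead of matching resumes at the next existing page. This assumes the generator's order matches Python string comparison, which is not guaranteed for every wiki's collation; `skip_until` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def skip_until(titles: Iterable[str], resume_title: str,
               alphabetical: bool) -> Iterator[str]:
    """Skip already-handled titles, starting again at resume_title.

    If resume_title no longer exists and the listing is not sorted,
    nothing is yielded: the failure mode described above.
    """
    it = iter(titles)
    for title in it:
        if title == resume_title or (alphabetical and title > resume_title):
            yield title
            break
    yield from it
```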

> I agree with @valhallasw that this should be page based resumption.
>
> The simplest solution (MVP) would, on pause, continue to run the existing generator, and write out a page title list to a file instead of processing the pages.
> Then the resumption would use the page title list in the stored file using -file:xxx.

Both approaches are needed, as the one above does not apply when you want to pause because you need to log off.

Why does it not apply in that case?

If I need to switch off my computer, how can I let it "write out a page title list to a file instead of processing the pages"?

Maybe @Mpaa meant a forced shutdown, or signal 9, in which case letting the generator run in collect-only mode isn't possible.
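Signal 9 (SIGKILL) cannot be caught, so nothing can run at pause time. One option is to checkpoint after every processed page, so a kill loses at most the page in progress. The sketch writes to a temporary file and renames it with `os.replace`, which is atomic on POSIX when both paths are on the same file system; the function names and JSON layout are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def write_checkpoint(path: str, last_done: str) -> None:
    """Atomically record the last fully processed title.

    A kill at any point leaves either the old or the new checkpoint
    on disk, never a half-written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"last_done": last_done}, f)
        f.flush()
        os.fsync(f.fileno())  # make the data durable before the rename
    os.replace(tmp_path, path)


def read_checkpoint(path: str) -> Optional[str]:
    """Return the last processed title, or None on a fresh start."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)["last_done"]
    except FileNotFoundError:
        return None
```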

catimages just stored the name of the last file processed and then continued from there on restart.

Instead of writing the skipped pages to a text file, write out the processed ones. In case the last one got deleted and the generator returns a different list, we have plenty of fallback possibilities: the second to last, the one before that, etc.
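A sketch of that fallback: keep a log of processed titles and, on restart, anchor on the most recent one that still appears in the regenerated list. `resume_after_processed` is a hypothetical helper; it materialises the listing, which is fine for a sketch but not for very large generators.

```python
from typing import Iterable, Iterator, List, Sequence


def resume_after_processed(titles: Iterable[str],
                           processed: Sequence[str]) -> Iterator[str]:
    """Return the titles after the newest processed one still present.

    processed is ordered oldest first. If the last processed page was
    deleted, fall back to the second to last, and so on.
    """
    listing: List[str] = list(titles)
    positions = {title: i for i, title in enumerate(listing)}
    for done in reversed(processed):
        if done in positions:
            return iter(listing[positions[done] + 1:])
    return iter(listing)  # no anchor found: start from the beginning
```

For example, if "C" was processed last and then deleted, the run resumes after "B", the next most recent anchor.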

@jayvdb and @DrTrigon, would you like to feature this project for GSoC/Outreachy May-Aug 2017? If yes, please add the tag Outreach-Programs-Projects.

@jayvdb: if you are interested, then I am willing to help of course... ;)

What about @AbdealiJK?

I haven't been actively developing Pywikibot, and preparing to mentor this particular project (which would fiddle deep in the belly of pywikibot) would require getting up to speed on a lot of changes made since I was very active.

srishakatux subscribed.

Removing the Possible-Tech-Projects tag as we are planning to kill it soon! This project does not seem to fit in the Outreach-Programs-Projects category in its current state, so I am not adding that tag right now!

Xqt triaged this task as Low priority. Sep 29 2018, 5:36 AM