
remove superfluous sleep between archiving each page
Closed, Resolved · Public

Description

The sleep in

try:
    archiver = PageArchiver(pg, a, salt, force)
    archiver.run()
    time.sleep(10)
except Exception:
    pywikibot.error(u'Error occurred while processing page %s' % pg)
    pywikibot.exception(tb=True)

appears to be of little use because

  1. There is no need to wait for user action, because the script is normally run periodically, not interactively.
  2. There is no need to limit API requests, because requests are sequentially emitted in this script.

This was suggested in https://gerrit.wikimedia.org/r/#/c/215871/.

Event Timeline

whym created this task. Jun 27 2015, 3:02 AM
whym raised the priority of this task to Needs Triage.
whym updated the task description.
whym added a project: Pywikibot-archivebot.py.
whym added subscribers: whym, jayvdb.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jun 27 2015, 3:02 AM
whym updated the task description. Jun 27 2015, 3:07 AM
whym set Security to None.
whym updated the task description.
jayvdb added a subscriber: Xqt. Jun 27 2015, 8:25 AM

Quickly looking at the other scripts, I see

  • quite a few sleeps are part of a server-error-retry cycle. Some of these are also dubious in core, as the http and api layers have already retried the request; but e.g. the flickr api code does sleep-retry, which is appropriate because it doesn't use the api layer.
  • weblinkchecker.py does some very small sleeps while spinning down the threads
  • checkimages.py has a sleep argument -time: to slow down the bot
  • replace.py has a sleep argument -sleep: to slow down the bot between regexes on each page? Quite odd: 9d415671b3
  • clean_sandbox.py and welcome.py have a sleep between bot runs, i.e. a delay before restarting the process again, which is effectively waiting for humans to do stuff
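
The sleep-retry pattern from the first bullet, for code that talks to a service directly and so gets no retries from the http or api layer, might look roughly like the sketch below. `call_with_retry`, its parameters, and the choice of `IOError` as the retriable error are assumptions for illustration, not pywikibot code.

```python
import time


def call_with_retry(request, retries=3, delay=5, wait=time.sleep):
    """Call ``request``; on IOError, sleep and retry up to ``retries`` times."""
    for attempt in range(retries + 1):
        try:
            return request()
        except IOError:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            wait(delay)  # pause before the next attempt
```

A sleep of this kind is tied to a failure, unlike the unconditional `time.sleep(10)` in archivebot, which runs after every successfully archived page.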
jayvdb renamed this task from remove superfluous sleep made during moving from one page to another to remove superfluous sleep between archiving each page. Dec 2 2015, 11:53 AM
Restricted Application added a subscriber: StudiesWorld. Dec 2 2015, 11:53 AM
murfel claimed this task. Dec 8 2015, 11:41 AM
murfel added a subscriber: murfel.

For now, sleep time can be specified through a CLI option. If this option is specified, is it better to ignore it or to output a deprecation notice? Or is it better to leave the CLI option as is but change its default value from 10 to 0?

jayvdb added a comment. Dec 8 2015, 5:54 PM

The archivebot doesn't currently have a sleep command line option, and I don't believe this script needs a custom command line option for sleeping. The sleep-related code in archivebot can be deleted.
There is a global option -put_throttle with which the user can slow down the bot if they want to.

whym added a comment. Dec 8 2015, 11:51 PM

I added the -sleep option to archivebot in https://gerrit.wikimedia.org/r/#/c/222727/ in July - whether or not it should go is up for discussion. (Sorry for not linking the change here.)

murfel removed murfel as the assignee of this task. Dec 9 2015, 5:19 AM

jayvdb, I think there is.

for v in if_arg_value(arg, '-sleep'):
    sleep = int(v)
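
One way the options raised in this thread could be combined, as a minimal sketch rather than the actual archivebot code: keep accepting -sleep, default it to 0, warn when it is given, and pause only for a positive value. `if_arg_value` here is a simplified stand-in for the helper of the same name in the snippet above; `parse_sleep` and `archive_all` are hypothetical names used only for illustration.

```python
import time
import warnings


def if_arg_value(arg, name):
    """Yield the value of ``arg`` if it has the form ``name:value``.

    Simplified stand-in for the archivebot helper of the same name.
    """
    if arg.startswith(name + ':'):
        yield arg[len(name) + 1:]


def parse_sleep(args, default=0):
    """Return the delay in seconds requested via -sleep, or ``default``."""
    sleep = default
    for arg in args:
        for v in if_arg_value(arg, '-sleep'):
            sleep = int(v)
            # Assumed wording; the thread only suggests -put_throttle
            # as the general way to slow the bot down.
            warnings.warn('-sleep is deprecated; use -put_throttle instead',
                          DeprecationWarning)
    return sleep


def archive_all(pages, process, sleep=0, wait=time.sleep):
    """Run ``process`` on each page, pausing only if ``sleep`` is positive."""
    for pg in pages:
        process(pg)
        if sleep > 0:
            wait(sleep)
```

With the default of 0 a periodic run no longer waits between pages, while anyone still passing -sleep:10 keeps the old behaviour and is pointed towards the global option.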