Timeout when updating complex pages
Open, Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1399/
Reported by: malafaya
Created on: 2012-01-17 00:22:50
Subject: Updating complex pages
Original description:
When updating complex pages, it's common to get a timeout, because the Wikimedia server does not process and return the page within the expected time. In such cases (when a timeout exception is thrown), my suggestion is that pywikipedia should try to fetch the page again and check whether there are any differences against the new page to be saved. If not, it should proceed rather than block indefinitely on such pages.


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1399
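
A minimal sketch of the behaviour suggested in the description, written against the core pywikibot API (save_with_verification is a hypothetical helper, and catching pywikibot's ServerError is an assumption; compat raises urllib2.HTTPError instead, as the traceback further down shows):

    import pywikibot
    from pywikibot.exceptions import ServerError

    def save_with_verification(page, new_text, summary, max_attempts=3):
        """Save new_text; after a server error, re-fetch the page to see
        whether the edit actually went through before retrying."""
        for _ in range(max_attempts):
            page.text = new_text
            try:
                page.save(summary=summary)
                return True
            except ServerError:
                # The server timed out, but the edit may still have landed.
                # Compare the live text with what we tried to save.
                fresh = pywikibot.Page(page.site, page.title())
                if fresh.get(force=True) == new_text:
                    return True  # edit is already live; stop retrying
        return False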

bzimport added a project: Pywikibot-network. Via Conduit, Nov 22 2014, 2:27 AM
bzimport added a subscriber: Unknown Object (????).
bzimport set Reference to bz55219.
Legoktm created this task. Via Legacy, Oct 5 2013, 4:45 AM
Legoktm added a comment. Via Conduit, Oct 5 2013, 4:45 AM

This is the way the bot works. It tries to put the page several times; the number of attempts is given by maxretries in (user_)config.py. Edit conflicts are detected (by the MediaWiki API) unless you are using your bot account for multiple edits on the same page at the same time.
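
For reference, that limit is a single setting in the configuration file (a sketch; the variable is named maxretries here, as in the comment above, while core calls it max_retries):

    # user-config.py
    # Number of times a failed page put is retried before giving up.
    maxretries = 5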

Legoktm added a comment. Via Conduit, Oct 5 2013, 4:45 AM
  • status: open --> pending
Legoktm added a comment. Via Conduit, Oct 5 2013, 4:45 AM

Hmmm, I'm not sure you understood. I'm not updating the page more than once simultaneously. It's just one bot run. As the page is a complicated one, the server does not respond in time (you can try [[Europa]] at pt.wiktionary). The bot then tries again, but obviously the same thing happens. The difference is that the page has already been updated on the first try, even though the server has not responded. In operations such as replace.py, where it's common to edit long pages, you end up in a long loop.

Legoktm added a comment. Via Conduit, Oct 5 2013, 4:45 AM
  • status: pending --> open
Legoktm added a comment. Via Conduit, Oct 5 2013, 4:45 AM

I'm talking about this error:

Updating page [[Sri Lanka]] via API
HTTPError: 504 Gateway Time-out

The page to be updated is quite big so the server does not reply on time.
1) Is there a way to increase the timeout? I believe this is controlled by the server, not the HTTP client...
2) The page was updated on the first try, but as the page is not refreshed between retries, the bot doesn't know that and will try to update it "forever".

Xqt added a comment. Via Conduit, Nov 11 2013, 6:23 AM
  • Bug 56884 has been marked as a duplicate of this bug.
Fae added a comment. Via Conduit, Nov 14 2013, 10:51 AM

Checking this morning with Faebot, 1.6% of get/put transactions failed out of a sample of more than 1,000. These were small category changes rather than file uploads or large page edits. I believe most failures have been on putting pages rather than getting them, but I have also seen page gets fail this way.

As everyone appears affected, not just API users, I have asked for feedback at the Village pump (http://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&diff=prev&oldid=109634734).

I am not convinced that this is a pywikipediabot-specific problem; it does not relate to any changes in pywikipediabot, which has never before had this problem at this frequency, so the bug report (1399) above may well be a dead end.

zhuyifei1999 added a comment. Via Conduit, Nov 14 2013, 10:53 AM

503 is also happening:

Sleeping for 7.9 seconds, 2013-11-13 11:20:55

Updating page [[File:Русский энциклопедический словарь Березина 4.2 077.jpg]] via API

Result: 503 Service Unavailable

Traceback (most recent call last):
(hidden)

File "(hidden)/pywikipedia/wikipedia.py", line 2242, in put
  sysop=sysop, botflag=botflag, maxTries=maxTries)
File "(hidden)/pywikipedia/wikipedia.py", line 2339, in _putPage
  back_response=True)
File "(hidden)/pywikipedia/pywikibot/support.py", line 121, in wrapper
  return method(*__args, **__kw)
File "(hidden)/pywikipedia/query.py", line 138, in GetData
  site.cookies(sysop=sysop))
File "(hidden)/pywikipedia/wikipedia.py", line 6977, in postForm
  cookies=cookies)
File "(hidden)/pywikipedia/wikipedia.py", line 7021, in postData
  f = MyURLopener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
  response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
  'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
  return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
  result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
  raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

urllib2.HTTPError: HTTP Error 503: Service Unavailable

bzimport added a comment. Via Conduit, Nov 14 2013, 11:09 AM

humayunmirza88 wrote:

The problem is still ongoing

bzimport added a comment. Via Conduit, Nov 18 2013, 7:56 PM

warnckew wrote:

I'd like to second this. When saving large complex pages, I frequently get 503 responses. As Daniel Schwen notes in bug 56884, it would be great to be able to tell Pywikibot to _not_ retry and instead manually check if the edit went through.

bzimport added a comment. Via Conduit, Nov 18 2013, 8:11 PM

warnckew wrote:

I patched my local copy of Pywikibot core, adding a max_retries parameter to editpage() to only allow it to attempt an edit once. No changes to other files appear necessary since Page.save() passes on any additional parameters. Should I propose that as a patch? If so, what format is preferred?
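
With a patch along those lines, a single-attempt save would look roughly like this (passing max_retries to save() relies on the locally patched editpage() described above; it is not part of the released API):

    import pywikibot

    site = pywikibot.Site('pt', 'wiktionary')
    page = pywikibot.Page(site, 'Europa')   # the complex page mentioned earlier
    page.text = page.text + '\n'            # some change to the large page
    # Page.save() forwards extra keyword arguments, so the patched
    # editpage() receives max_retries=1 and attempts the edit only once.
    page.save(summary='test edit', max_retries=1)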

valhallasw added a comment. Via Conduit, Nov 18 2013, 8:15 PM

If you could upload it to gerrit (either via git directly, or via the patch uploader at https://tools.wmflabs.org/gerrit-patch-uploader/ ), that would be really nice.

I'm a bit confused however, as data.api.Request seems to get max_retries from the config file. Does it get passed another value of max_retries somewhere? I can't find where that would be...

bzimport added a comment. Via Conduit, Nov 18 2013, 8:45 PM

warnckew wrote:

data.api.Request does kwargs.pop(), so if it gets instantiated with a max_retries parameter it will use that value, otherwise it reads the config parameter.

In my case I found that I can just set pywikibot.config.max_retries instead of passing it as a parameter to Page.save(). Arguably nicer than passing a parameter around, which requires some way of handling a default value. Sorry about not figuring that out earlier.
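
In other words, limiting retries for a whole run needs only one assignment (a sketch following the behaviour described above):

    import pywikibot

    # data.api.Request falls back to this value whenever it is not
    # handed an explicit max_retries, so this covers every save().
    pywikibot.config.max_retries = 1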

valhallasw added a comment. Via Conduit, Nov 18 2013, 8:49 PM

I'm still a bit confused by Daniel's comment:

Now pywikipediabot tries again by itself an apparently infinite amount of times
Despite having set max_retries to 2 in my user-config.py

but this does seem to work for me (at least: setting max_retries in user-config.py sets pywikibot.config.max_retries). Strange.

bzimport added a comment. Via Conduit, Nov 18 2013, 9:15 PM

daniel wrote:

Ahhrgh! I changed the max_retries setting in ./user-config.py but core reads ~/.pywikibot/user-config.py

Sorry. Will try again with the new setting.

Bawolff added a comment. Via Conduit, Dec 11 2013, 2:56 AM

On the wikimedia side see also bug 57026. (Not a dupe since Pywikipedia should also handle these situations gracefully.)

Ricordisamoa added a comment. Via Conduit, Apr 16 2014, 3:01 AM
  • Bug 55162 has been marked as a duplicate of this bug.
Ladsgroup added a comment. Via Conduit, Jul 24 2014, 12:36 PM
  • Bug 55179 has been marked as a duplicate of this bug.
Sipun added a subscriber: Sipun. Via Web, Wed, Jun 24, 7:52 AM
