API requests to commons frequently return 502
Closed, Resolved (Public)

Description

Author: daniel

Description:
Starting today, I've been seeing this in my bot logs:

HTTPError: 502 Bad Gateway

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server is down. Retrying in 1 minutes...

This morning there were so many errors that one of my bots (commons:user:QICbot) hit the retry limit and died mid-run. It is currently rerunning, but with errors on approximately every 5th request.


Version: unspecified
Severity: normal

Details

Reference
bz30201
bzimport set Reference to bz30201.
bzimport created this task.Aug 3 2011, 4:12 PM

daniel wrote:

Less frequently, I also get:

HTTPError: 504 Gateway Time-out

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server is down. Retrying in 1 minutes...

Also, it seems the errors only hit me when doing a page.put(), not during a page.get().

Rillke added a comment.Aug 3 2011, 4:49 PM

Using the API from JavaScript (JSON), I randomly get server error 504 and, less frequently, 502.

502 on action=edit
504 on both action=edit and action=query&prop=imageinfo|info|revisions|categories

Reedy added a comment.Aug 3 2011, 9:09 PM

I wonder if this has any relation (or vice versa) to the slow uploads that have been noticed...

saibotrash wrote:

A user reported "Tried with commonist tool and I got the following message: could not upload (requirement failed: unexpected response: HTTP/1.0 502 Bad Gateway)." at http://commons.wikimedia.org/wiki/Commons:Prototype_upload_wizard_feedback#Error_unknown

And, of course, the upload wizard also has strange upload errors (see the link).

And he is not the only one...

http://commons.wikimedia.org/wiki/Commons:Forum#unexpected_response:_HTTP.2F1.0_502_Bad_Gateway

Next time I will try to find out whether there are any details in the HTML error output.

Rillke added a comment.Aug 4 2011, 1:48 PM

For the 504-error:

Request: POST http://commons.wikimedia.org/w/api.php, from 91.198.174.40 via sq34.wikimedia.org (squid/2.7.STABLE9) to ()
Error: ERR_CANNOT_FORWARD, errno [No Error] at Thu, 04 Aug 2011 13:44:18 GMT

POST:
action=query&prop=imageinfo%7Cinfo%7Crevisions%7Ccategories&rvprop=timestamp%7Ccontent&intoken=&iiprop=url%7Csize&iiurlwidth=120&iiurlheight=120&titles=File%3AWikipedia.tamil.path.svg&clprop=hidden&cllimit=25&format=json

Response-Header
Server squid/2.7.STABLE9
Date Thu, 04 Aug 2011 13:44:18 GMT
Content-Type text/html
Content-Length 3003
X-Squid-Error ERR_CANNOT_FORWARD 0
X-Cache MISS from sq34.wikimedia.org, MISS from knsq30.knams.wikimedia.org, MISS from amssq37.esams.wikimedia.org
X-Cache-Lookup MISS from sq34.wikimedia.org:3128, MISS from knsq30.knams.wikimedia.org:3128, MISS from amssq37.esams.wikimedia.org:80
Connection close

Request-Header
Host commons.wikimedia.org
User-Agent Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0
Accept application/json, text/javascript, */*
Accept-Language de
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive
Content-Type application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With XMLHttpRequest
Referer http://commons.wikimedia.org/wiki/Commons_talk:Tools/Commonist
Content-Length 220
Cookie centralauth_User=Rillke; centralauth_Session=xxx; commonswiki_session=xxx; dismissSiteNotice=2.38; popTz=0; vector-nav-p-tb=true

Rillke added a comment.Aug 4 2011, 3:09 PM

For the 502-error:

Request: POST http://commons.wikimedia.org/w/api.php, from 77.184.171.69 via amssq32.esams.wikimedia.org (squid/2.7.STABLE9) to 91.198.174.40 (91.198.174.40)
Error: ERR_READ_ERROR, errno (104) Connection reset by peer at Thu, 04 Aug 2011 15:06:59 GMT

action=query&list=logevents&leprop=title%7Ctype%7Ctimestamp&letype=upload&leuser=Rillke&lelimit=50&lestart=2011-06-03T15%3A02%3A31Z&format=json

Response-Header
Server squid/2.7.STABLE9
Date Thu, 04 Aug 2011 15:06:59 GMT
Content-Type text/html
Content-Length 3054
X-Squid-Error ERR_READ_ERROR 104
X-Cache MISS from amssq32.esams.wikimedia.org
X-Cache-Lookup MISS from amssq32.esams.wikimedia.org:80
Connection close

Request-Header
Host commons.wikimedia.org
User-Agent Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0
Accept application/json, text/javascript, */*
Accept-Language de
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive
Content-Type application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With XMLHttpRequest
Referer http://commons.wikimedia.org/wiki/User:Rillke/AjaxMassDelete.js
Content-Length 143
Cookie centralauth_User=Rillke; centralauth_Session=xxx; commonswiki_session=xxx; dismissSiteNotice=2.38; popTz=0; vector-nav-p-tb=true
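
The two dumps above differ mainly in the X-Squid-Error header: ERR_CANNOT_FORWARD 0 for the 504 and ERR_READ_ERROR 104 for the 502. A client that wants to log this detail alongside the HTTP status could split the header value as in the minimal sketch below. The function name parse_squid_error is hypothetical, and the "NAME errno" layout is assumed only from the two values shown in this report.

```python
def parse_squid_error(header_value):
    """Split an X-Squid-Error value such as 'ERR_READ_ERROR 104'.

    Returns (error_name, errno), with errno as an int, or None when the
    numeric part is missing. The layout is assumed from the two header
    values quoted in this report.
    """
    name, _, rest = header_value.strip().partition(" ")
    rest = rest.strip()
    errno = int(rest) if rest.isdigit() else None
    return name, errno


if __name__ == "__main__":
    for value in ("ERR_CANNOT_FORWARD 0", "ERR_READ_ERROR 104"):
        print(parse_squid_error(value))
```

Here errno 104 corresponds to the "Connection reset by peer" message quoted in the 502 error page.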

Reedy added a comment.Aug 4 2011, 3:29 PM

It would seem that all of you are Europe based.

Do we know if anyone who is elsewhere (ie would be hitting PMTPA) is experiencing the issue?

neilk wrote:

Reedy: I've tried to replicate this here several times and I can't. Hard to prove a negative though.

Reedy added a comment.Aug 4 2011, 10:28 PM

(In reply to comment #9)

Reedy: I've tried to replicate this here several times and I can't. Hard to
prove a negative though.

Indeed, watching the squid logs there seems to be some here and there.

Some of the mentioned machines (in Tampa) have recently been upgraded.

I'm not sure whether it's coincidental that the errors started after these upgrades; it could well be.

I have logged an RT ticket and CC'd Peter: http://rt.wikimedia.org/Ticket/Display.html?id=1263

Lowering priority since this has moved to Ops.

Reedy added a comment.Aug 5 2011, 12:06 PM

Ops have made a couple of changes in the last few hours.

Can anyone who has been able to reproduce this, test again and find out if it's still happening?

Thanks!

ralf wrote:

Yes, it is still happening!

ralf wrote:

One photo works, but more than one does not.

daniel wrote:

Yep, I can confirm that my bots are still getting plenty of 502s on page.put() in pywikipediabot. Doesn't look like anything has changed yet.

Rillke added a comment.Aug 5 2011, 3:05 PM

Yup, using the API with JavaScript throws error 502 and 504 on every 5th request.

Reedy added a comment.Aug 5 2011, 3:57 PM

How about now? (Unfortunately we're having issues reproducing it, so asking is the simplest way.) It seems one of the API Apache application servers was badly out of sync and is in the process of being fixed, but it won't be hit at the moment.

daniel wrote:

Does none of the admins have toolserver account?! Or any account on a european computer?
Looks better now. I'll keep testing.

Reedy added a comment.Aug 5 2011, 4:09 PM

(In reply to comment #18)

Does none of the admins have toolserver account?! Or any account on a european
computer?
Looks better now. I'll keep testing.

We have both. I tried numerous requests yesterday with AWB, and encountered no errors.

daniel wrote:

Nope, sorry. 502 is back. The earlier test must have been a fluke (it was a short bot run that went through without errors).

daniel wrote:

The frequency of 502 errors is down to about 1 in 10 requests failing. Looks
like something is improving after all.
Can anyone else confirm this?

No, still no improvement for me.

neilk wrote:

Side note, I have changed the following messages or pages on Commons to point to Commons:Upload until the issues are resolved.

MediaWiki:Sitenotice
Commons:Upload - commented out {{UploadWizard}} template
MediaWiki:Upload-url
MediaWiki:Upload-url/en

afeldman wrote:

I was able to track down a few examples in the kennisnet udplog stream.

knsq23.knams.wikimedia.org 3654410 2011-08-05T21:00:08.774 6676 213.221.6.148 TCP_MISS/502 3330 POST http://uk.wikipedia.org/w/api.php CARP/91.198.174.40 text/html - - PythonWikipediaBot/1.0

knsq23.knams.wikimedia.org 3272304 2011-08-05T20:43:08.138 4681 217.187.129.179 TCP_MISS/502 3432 POST http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Xqt&action=submit CARP/91.198.174.40 text/html http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Xqt&action=edit&section=56 - Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64;%20rv:5.0)%20Gecko/20100101%20Firefox/5.0

All POSTs, all getting hashed via CARP to the backend squid on knsq30. A drive is failing on knsq30 (91.198.174.40) and there are possibly other problems: load is many times higher than on all other squids. I removed it from the frontend.conf and, since deploying, have not seen any more 502s.
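
To spot this kind of pattern across many log lines, the 502s can be grouped by request method and backend peer. The following is a minimal sketch, not the tooling actually used here. The function name count_502_by_peer is hypothetical, and the field positions (status at index 5, method at index 7, peer at index 9 after splitting on whitespace) are inferred only from the two sample lines above.

```python
from collections import Counter


def count_502_by_peer(lines):
    """Count 502 responses per (method, peer) pair.

    Assumes whitespace-separated fields laid out like the two sample
    lines above: index 5 is the squid status (e.g. TCP_MISS/502),
    index 7 the HTTP method, index 9 the peer (e.g. CARP/91.198.174.40).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip lines too short to match the assumed layout
        status, method, peer = fields[5], fields[7], fields[9]
        if status.endswith("/502"):
            counts[(method, peer)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "knsq23.knams.wikimedia.org 3654410 2011-08-05T21:00:08.774 6676 "
        "213.221.6.148 TCP_MISS/502 3330 POST "
        "http://uk.wikipedia.org/w/api.php CARP/91.198.174.40 text/html "
        "- - PythonWikipediaBot/1.0",
    ]
    for key, n in count_502_by_peer(sample).most_common():
        print(key, n)
```

A single peer dominating the counts, as 91.198.174.40 does in the examples above, points at one backend rather than at the API servers in general.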

Now I don't get the 502 anymore, but instead timeouts:

Uploading file to commons:commons via API....
<urlopen error timed out>

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 8 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 16 minutes...
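
The retry delays in this log double each time (1, 2, 4, 8, 16 minutes). For anyone writing their own client against the API, a comparable retry loop might look like the sketch below. This is not pywikipediabot's actual code. The names backoff_delays and fetch_with_retry are hypothetical, and the choice to retry only on 502, 504 and connection-level errors is an assumption based on the errors reported in this thread.

```python
import time
import urllib.error
import urllib.request


def backoff_delays(first_minutes=1, factor=2, max_retries=5):
    """Yield retry delays in minutes, multiplying by factor each time."""
    delay = first_minutes
    for _ in range(max_retries):
        yield delay
        delay *= factor


def fetch_with_retry(url, data=None, opener=urllib.request.urlopen,
                     sleep=time.sleep, max_retries=5):
    """Call opener(url, data), retrying on 502/504 and connection errors.

    Other HTTP errors are re-raised immediately. Once the retries are
    used up, the last error is re-raised.
    """
    delays = backoff_delays(max_retries=max_retries)
    while True:
        try:
            return opener(url, data)
        except urllib.error.HTTPError as err:
            # HTTPError is a subclass of URLError, so it is handled first.
            if err.code not in (502, 504):
                raise
            last_error = err
        except urllib.error.URLError as err:
            # Covers timeouts and other connection-level failures.
            last_error = err
        delay = next(delays, None)
        if delay is None:
            raise last_error
        sleep(delay * 60)


if __name__ == "__main__":
    print(list(backoff_delays()))
```

Passing opener and sleep as parameters keeps the loop testable without real network calls or real waiting.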

neilk wrote:

Made Special:UploadWizard the default uploader again on Commons (see comment #23).

(In reply to comment #23)

Side note, I have changed the following messages or pages on Commons to point
to Commons:Upload until the issues are resolved.

MediaWiki:Sitenotice
Commons:Upload - commented out {{UploadWizard}} template
MediaWiki:Upload-url
MediaWiki:Upload-url/en

Switched these back to use UploadWizard.
