API requests to commons frequently return 502
Closed, Resolved, Public

Description

Author: daniel

Description:
Starting today I've been seeing this in my bot logs

HTTPError: 502 Bad Gateway

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server is down. Retrying in 1 minutes...

This morning there were so many errors that one of my bots (commons:user:QICbot) hit the retry limit and died mid-run. It is currently rerunning, but with errors on approximately every 5th request.


Version: unspecified
Severity: normal

bzimport set Reference to bz30201.
bzimport created this task.Via LegacyAug 3 2011, 4:12 PM
bzimport added a comment.Via ConduitAug 3 2011, 4:20 PM

daniel wrote:

Less frequently I also get:

HTTPError: 504 Gateway Time-out

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server is down. Retrying in 1 minutes...

Also, it seems the errors only hit me when doing a page.put(), not during a page.get().

Rillke added a comment.Via ConduitAug 3 2011, 4:49 PM

Using the API by JavaScript (JSON), I randomly get Server error 504 and less frequently 502.

502 on action=edit
504 on both action=edit and action=query&prop=imageinfo|info|revisions|categories

Reedy added a comment.Via ConduitAug 3 2011, 9:09 PM

I wonder if this has any relation (or vice versa) to the slow uploads that have been noticed...

bzimport added a comment.Via ConduitAug 4 2011, 12:09 AM

saibotrash wrote:

A user reported "Tried with commonist tool and I got the following message: could not upload (requirement failed: unexpected response: HTTP/1.0 502 Bad Gateway)." at http://commons.wikimedia.org/wiki/Commons:Prototype_upload_wizard_feedback#Error_unknown

And, of course, the upload wizard also has strange upload errors (see the link).

Rillke added a comment.Via ConduitAug 4 2011, 10:08 AM

And he is not the only one...

http://commons.wikimedia.org/wiki/Commons:Forum#unexpected_response:_HTTP.2F1.0_502_Bad_Gateway

Next time I will try to find out whether there are any details in the HTML error output.

Rillke added a comment.Via ConduitAug 4 2011, 1:48 PM

For the 504-error:

Request: POST http://commons.wikimedia.org/w/api.php, from 91.198.174.40 via sq34.wikimedia.org (squid/2.7.STABLE9) to ()
Error: ERR_CANNOT_FORWARD, errno [No Error] at Thu, 04 Aug 2011 13:44:18 GMT

POST:
action=query&prop=imageinfo%7Cinfo%7Crevisions%7Ccategories&rvprop=timestamp%7Ccontent&intoken=&iiprop=url%7Csize&iiurlwidth=120&iiurlheight=120&titles=File%3AWikipedia.tamil.path.svg&clprop=hidden&cllimit=25&format=json

Response-Header
Server squid/2.7.STABLE9
Date Thu, 04 Aug 2011 13:44:18 GMT
Content-Type text/html
Content-Length 3003
X-Squid-Error ERR_CANNOT_FORWARD 0
X-Cache MISS from sq34.wikimedia.org, MISS from knsq30.knams.wikimedia.org, MISS from amssq37.esams.wikimedia.org
X-Cache-Lookup MISS from sq34.wikimedia.org:3128, MISS from knsq30.knams.wikimedia.org:3128, MISS from amssq37.esams.wikimedia.org:80
Connection close

Request-Header
Host commons.wikimedia.org
User-Agent Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0
Accept application/json, text/javascript, */*
Accept-Language de
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive
Content-Type application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With XMLHttpRequest
Referer http://commons.wikimedia.org/wiki/Commons_talk:Tools/Commonist
Content-Length 220
Cookie centralauth_User=Rillke; centralauth_Session=xxx; commonswiki_session=xxx; dismissSiteNotice=2.38; popTz=0; vector-nav-p-tb=true

Rillke added a comment.Via ConduitAug 4 2011, 3:09 PM

For the 502-error:

Request: POST http://commons.wikimedia.org/w/api.php, from 77.184.171.69 via amssq32.esams.wikimedia.org (squid/2.7.STABLE9) to 91.198.174.40 (91.198.174.40)
Error: ERR_READ_ERROR, errno (104) Connection reset by peer at Thu, 04 Aug 2011 15:06:59 GMT

action=query&list=logevents&leprop=title%7Ctype%7Ctimestamp&letype=upload&leuser=Rillke&lelimit=50&lestart=2011-06-03T15%3A02%3A31Z&format=json

Response-Header
Server squid/2.7.STABLE9
Date Thu, 04 Aug 2011 15:06:59 GMT
Content-Type text/html
Content-Length 3054
X-Squid-Error ERR_READ_ERROR 104
X-Cache MISS from amssq32.esams.wikimedia.org
X-Cache-Lookup MISS from amssq32.esams.wikimedia.org:80
Connection close

Request-Header
Host commons.wikimedia.org
User-Agent Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0
Accept application/json, text/javascript, */*
Accept-Language de
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive
Content-Type application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With XMLHttpRequest
Referer http://commons.wikimedia.org/wiki/User:Rillke/AjaxMassDelete.js
Content-Length 143
Cookie centralauth_User=Rillke; centralauth_Session=xxx; commonswiki_session=xxx; dismissSiteNotice=2.38; popTz=0; vector-nav-p-tb=true
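
For anyone comparing more of these error pages: the two dumps above differ mainly in the X-Squid-Error value and in the X-Cache chain. Below is a minimal Python sketch for pulling those apart. The function names are made up for illustration, and the format is only inferred from the two header dumps in this report, so treat it as an assumption rather than a description of what squid guarantees.

```python
def parse_x_cache(value):
    """Split an X-Cache header into (result, host) pairs, one per cache hop.

    Assumes the "RESULT from host, RESULT from host" layout seen in the dumps above.
    """
    hops = []
    for part in value.split(","):
        result, _, host = part.strip().partition(" from ")
        hops.append((result, host))
    return hops


def parse_squid_error(value):
    """Split an X-Squid-Error header such as "ERR_READ_ERROR 104" into (name, errno)."""
    name, _, errno = value.strip().partition(" ")
    return name, int(errno)


if __name__ == "__main__":
    # Header values copied from the 504 and 502 dumps in this report.
    chain = ("MISS from sq34.wikimedia.org, MISS from knsq30.knams.wikimedia.org, "
             "MISS from amssq37.esams.wikimedia.org")
    for result, host in parse_x_cache(chain):
        print(result, host)
    print(parse_squid_error("ERR_READ_ERROR 104"))
```

Listing the hops this way makes it easier to see whether the failing requests keep passing through the same cache host.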

Reedy added a comment.Via ConduitAug 4 2011, 3:29 PM

It would seem that all of you are Europe based.

Do we know if anyone elsewhere (i.e. anyone who would be hitting PMTPA) is experiencing the issue?

bzimport added a comment.Via ConduitAug 4 2011, 10:24 PM

neilk wrote:

Reedy: I've tried to replicate this here several times and I can't. Hard to prove a negative though.

Reedy added a comment.Via ConduitAug 4 2011, 10:28 PM

(In reply to comment #9)

Reedy: I've tried to replicate this here several times and I can't. Hard to
prove a negative though.

Indeed. Watching the squid logs, there seem to be some here and there.

Some of the mentioned machines (in Tampa) have been recently upgraded.

I'm not sure whether it's coincidental that the errors started after these upgrades; it could quite likely be so.

I have logged an RT ticket and CC'd Peter: http://rt.wikimedia.org/Ticket/Display.html?id=1263

MarkAHershberger added a comment.Via ConduitAug 5 2011, 11:11 AM

lowering priority since this has moved to Ops.

Reedy added a comment.Via ConduitAug 5 2011, 12:06 PM

Ops have made a couple of changes in the last few hours.

Can anyone who has been able to reproduce this, test again and find out if it's still happening?

Thanks!

bzimport added a comment.Via ConduitAug 5 2011, 2:15 PM

ralf wrote:

Yes, it is still happening!!

bzimport added a comment.Via ConduitAug 5 2011, 2:46 PM

ralf wrote:

One photo works, but more than one does not.

bzimport added a comment.Via ConduitAug 5 2011, 2:51 PM

daniel wrote:

Yep, I can confirm that my bots are still getting plenty of 502s on page.put() in pywikipediabot. Doesn't look like anything has changed yet.

Rillke added a comment.Via ConduitAug 5 2011, 3:05 PM

Yup, using the API with JavaScript throws error 502 and 504 on every 5th request.

Reedy added a comment.Via ConduitAug 5 2011, 3:57 PM

How about now? (Unfortunately we're having issues reproducing it, so asking is the simplest way.) It seems one of the API Apache application servers was very out of sync and is in the process of being fixed, but it won't be hit at the moment.

bzimport added a comment.Via ConduitAug 5 2011, 4:08 PM

daniel wrote:

Do none of the admins have a toolserver account?! Or any account on a European computer?
Looks better now. I'll keep testing.

Reedy added a comment.Via ConduitAug 5 2011, 4:09 PM

(In reply to comment #18)

Do none of the admins have a toolserver account?! Or any account on a European
computer?
Looks better now. I'll keep testing.

We have both. I tried numerous requests yesterday with AWB, and encountered no errors.

bzimport added a comment.Via ConduitAug 5 2011, 4:12 PM

daniel wrote:

Nope, sorry. 502 is back. The test before must have been a fluke (it was a short bot run that went through without errors).

bzimport added a comment.Via ConduitAug 5 2011, 4:18 PM

daniel wrote:

The frequency of 502 errors is down to about 1 in 10 requests failing. Looks
like something is improving after all.
Can anyone else confirm this?

Prolineserver added a comment.Via ConduitAug 5 2011, 4:30 PM

No, still no improvement for me.

bzimport added a comment.Via ConduitAug 5 2011, 5:00 PM

neilk wrote:

Side note, I have changed the following messages or pages on Commons to point to Commons:Upload until the issues are resolved.

MediaWiki:Sitenotice
Commons:Upload - commented out {{UploadWizard}} template
MediaWiki:Upload-url
MediaWiki:Upload-url/en

Rillke added a comment.Via ConduitAug 5 2011, 7:13 PM

Still not better: 2 new entries (bug reports) in

http://commons.wikimedia.org/wiki/MediaWiki_talk:AjaxQuickDelete.js

bzimport added a comment.Via ConduitAug 5 2011, 9:06 PM

afeldman wrote:

I was able to track down a few examples in the kennisnet udplog stream.

knsq23.knams.wikimedia.org 3654410 2011-08-05T21:00:08.774 6676 213.221.6.148 TCP_MISS/502 3330 POST http://uk.wikipedia.org/w/api.php CARP/91.198.174.40 text/html - - PythonWikipediaBot/1.0

knsq23.knams.wikimedia.org 3272304 2011-08-05T20:43:08.138 4681 217.187.129.179 TCP_MISS/502 3432 POST http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Xqt&action=submit CARP/91.198.174.40 text/html http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Xqt&action=edit&section=56 - Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64;%20rv:5.0)%20Gecko/20100101%20Firefox/5.0

All POSTs, all getting hashed via CARP to the backend squid on knsq30. A drive is failing on knsq30 (91.198.174.40) and there are possibly other problems: its load is many times higher than that of all other squids. I removed it from the frontend.conf and, since deploying, have not seen any more 502s.
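
For reference, this kind of tally can be reproduced from a saved chunk of the log with a short Python sketch like the one below. The field positions are an assumption read off the two sample lines above (whitespace-separated, squid result/status in the sixth field, hierarchy/peer in the tenth), not an official description of the udplog format, and the function names are hypothetical.

```python
from collections import Counter


def parse_line(line):
    """Parse one whitespace-separated log line into a small dict.

    Field positions are assumed from the two sample lines in this report.
    """
    fields = line.split()
    return {
        "frontend": fields[0],
        "http_status": int(fields[5].split("/")[1]),  # e.g. "TCP_MISS/502" -> 502
        "method": fields[7],
        "url": fields[8],
        "backend": fields[9].split("/")[1],           # e.g. "CARP/91.198.174.40"
    }


def count_502_by_backend(lines):
    """Count 502 responses per backend address."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        entry = parse_line(line)
        if entry["http_status"] == 502:
            counts[entry["backend"]] += 1
    return counts


# The two sample lines quoted above, unchanged.
SAMPLE = [
    "knsq23.knams.wikimedia.org 3654410 2011-08-05T21:00:08.774 6676 213.221.6.148 "
    "TCP_MISS/502 3330 POST http://uk.wikipedia.org/w/api.php CARP/91.198.174.40 "
    "text/html - - PythonWikipediaBot/1.0",
    "knsq23.knams.wikimedia.org 3272304 2011-08-05T20:43:08.138 4681 217.187.129.179 "
    "TCP_MISS/502 3432 POST "
    "http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Xqt&action=submit "
    "CARP/91.198.174.40 text/html "
    "http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Xqt&action=edit&section=56 "
    "- Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64;%20rv:5.0)%20Gecko/20100101%20Firefox/5.0",
]

if __name__ == "__main__":
    print(count_502_by_backend(SAMPLE))
```

Grouping by the backend field is what makes a single misbehaving backend stand out, since the client addresses and frontends vary.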

Prolineserver added a comment.Via ConduitAug 6 2011, 5:25 AM

Now I don't get the 502 anymore, but instead timeouts:

Uploading file to commons:commons via API....
<urlopen error timed out>

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 8 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 16 minutes...

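
The waits in that log double each time (1, 2, 4, 8, 16 minutes). For bot authors who want similar behaviour in their own scripts, here is a minimal Python sketch of retrying with a doubling delay on 502/504 and on connection errors. It is not pywikipediabot's actual code: the function names, the 30-second timeout, and the injectable `sleep` and `opener` parameters are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

RETRYABLE_STATUS = {502, 504}  # the two gateway errors reported in this thread


def backoff_delays(max_retries, base_minutes=1):
    """Yield the wait before each retry, doubling every time: 1, 2, 4, 8, ..."""
    for attempt in range(max_retries):
        yield base_minutes * 2 ** attempt


def fetch_with_retry(url, data=None, max_retries=5,
                     sleep=time.sleep, opener=urllib.request.urlopen):
    """Fetch url, retrying on 502/504 and on connection errors.

    sleep and opener are injectable so the function can be exercised offline.
    """
    for delay in backoff_delays(max_retries):
        try:
            with opener(url, data=data, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # HTTPError is a subclass of URLError, so it must be caught first.
            if err.code not in RETRYABLE_STATUS:
                raise
        except urllib.error.URLError:
            pass  # e.g. <urlopen error timed out>
        sleep(delay * 60)  # delays are in minutes, sleep() takes seconds
    raise RuntimeError("giving up on %s after %d attempts" % (url, max_retries))


if __name__ == "__main__":
    print(list(backoff_delays(5)))
```

Capping the number of retries matters here: with doubling waits, a bot that never gives up would sit idle for hours during an outage like this one.
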
bzimport added a comment.Via ConduitAug 8 2011, 7:12 PM

neilk wrote:

Made Special:UploadWizard the default uploader again on Commons (see comment #23).

(In reply to comment #23)

Side note, I have changed the following messages or pages on Commons to point
to Commons:Upload until the issues are resolved.

MediaWiki:Sitenotice
Commons:Upload - commented out {{UploadWizard}} template
MediaWiki:Upload-url
MediaWiki:Upload-url/en

Switched these back to use UploadWizard.
