
VE is not loading on Beta Cluster, getting 503s
Closed, ResolvedPublic

Description

VE is not loading on Beta Cluster, getting 503s

Event Timeline

Ryasmeen created this task. Jun 28 2018, 8:40 PM
Restricted Application added a subscriber: Aklapper. Jun 28 2018, 8:40 PM
Ryasmeen renamed this task from VE not loading on Beta Cluster, getting 503s to VE is not loading on Beta Cluster, getting 503s. Jun 28 2018, 8:41 PM
Ryasmeen updated the task description.

Looks like RB is timing out trying to connect to parsoid:

krenair@deployment-cache-text04:~$ curl http://deployment-restbase01.deployment-prep.eqiad.wmflabs:7231/en.wikipedia.beta.wmflabs.org/v1/page/html/14thjulyFF
{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/internal_http_error","method":"get","detail":"Error: ESOCKETTIMEDOUT","uri":"http://deployment-parsoid09.deployment-prep.eqiad.wmflabs:8000/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/14thjulyFF/112456"}
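When triaging an error body like the one above, it can help to pull out just the failure reason and the upstream URI that RESTBase was waiting on. The following is a minimal sketch, not part of the original investigation: `json_field` is a hypothetical helper, and it assumes a flat JSON object whose string values contain no escaped quotes (a proper JSON parser would be more robust).

```shell
# Extract a top-level string field from a flat JSON object read on stdin.
# Hypothetical helper; assumes values contain no escaped double quotes.
json_field() {
  # $1 = field name
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Error body copied from the curl output above.
response='{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/internal_http_error","method":"get","detail":"Error: ESOCKETTIMEDOUT","uri":"http://deployment-parsoid09.deployment-prep.eqiad.wmflabs:8000/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/14thjulyFF/112456"}'

detail=$(printf '%s\n' "$response" | json_field detail)
upstream=$(printf '%s\n' "$response" | json_field uri)

echo "RESTBase reported: $detail"
echo "Upstream request:  $upstream"
```

The extracted `uri` is the Parsoid request that timed out, so it is the natural next URL to try directly.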
Krenair added a comment. Edited Jun 28 2018, 11:54 PM

Tried restarting parsoid service on deployment-parsoid09 and then getting the URI above.
Based on tail -f /srv/log/parsoid/main.log | grep -v ChangePropagation it did try wt2html for that page. I haven't managed to get it to repeat that, or do it for other pages.
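For a one-off look rather than a live `tail -f`, the same filter can be applied to the existing log and narrowed to one page title. This is a rough sketch only: `recent_requests` is a hypothetical helper, and it assumes the page title appears literally in the relevant log lines, which the thread does not confirm.

```shell
# Print the last few log lines that mention a page title,
# skipping ChangePropagation entries as in the command above.
# $1 = log file, $2 = page title, $3 = max lines (optional, default 20)
recent_requests() {
  grep -v ChangePropagation "$1" | grep -F -- "$2" | tail -n "${3:-20}"
}

# Example, using the path and title mentioned in this task:
# recent_requests /srv/log/parsoid/main.log 14thjulyFF
```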

Deskana triaged this task as High priority. Jun 29 2018, 1:12 PM
cscott added a subscriber: cscott. Jun 29 2018, 3:51 PM

Confirmed that Parsoid is on b068bb51d29e294a4f4a875ae829cca8cf314205 in both prod and beta.
beta:

deployment-tin$ curl http://deployment-parsoid09.deployment-prep.eqiad.wmflabs:8000/_version
{"name":"parsoid","version":"0.9.0","sha":"b068bb51d29e294a4f4a875ae829cca8cf314205"}

and prod:

deployment:~$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; curl "http://$wtp:8000/_version"; echo; done;
Querying wtp1025.eqiad.wmnet: {"name":"parsoid","version":"0.9.0","sha":"b068bb51d29e294a4f4a875ae829cca8cf314205"}
[...etc...]
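The comparison above was done by eye. A small sketch of doing it mechanically follows, using the two `_version` responses quoted in this comment as literal input. `version_sha` is a hypothetical helper and assumes the `sha` value is a plain lowercase hex string.

```shell
# Pull the "sha" value out of a /_version response read on stdin.
# Hypothetical helper; assumes a lowercase hex commit hash.
version_sha() {
  sed -n 's/.*"sha":"\([0-9a-f]*\)".*/\1/p'
}

# Responses copied from the beta and prod output above.
beta_json='{"name":"parsoid","version":"0.9.0","sha":"b068bb51d29e294a4f4a875ae829cca8cf314205"}'
prod_json='{"name":"parsoid","version":"0.9.0","sha":"b068bb51d29e294a4f4a875ae829cca8cf314205"}'

beta_sha=$(printf '%s\n' "$beta_json" | version_sha)
prod_sha=$(printf '%s\n' "$prod_json" | version_sha)

if [ "$beta_sha" = "$prod_sha" ]; then
  echo "beta and prod run the same commit: $beta_sha"
else
  echo "commit mismatch: beta=$beta_sha prod=$prod_sha"
fi
```

In practice the two JSON strings would come from the `curl .../_version` calls shown above rather than being pasted in.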

Tried restarting parsoid service on deployment-parsoid09 and then getting the URI above.
Based on tail -f /srv/log/parsoid/main.log | grep -v ChangePropagation it did try wt2html for that page. I haven't managed to get it to repeat that, or do it for other pages.

That could indicate that the workers aren't clearing the task queues fast enough, but I'm not sure. If the queues were full, requests would fail hard.

Requests that get handled directly by the server (i.e. 302s/404s) were always replied to promptly. I restarted the service and, at least for the moment, it is restored.
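One way to make the contrast between prompt replies and hanging requests visible is to probe URLs with a hard time limit and classify the outcome. This is a sketch under assumptions, not what was actually run here: `classify_probe` and `probe` are hypothetical names, and the 10-second default is arbitrary. It relies on curl's documented behaviour of exiting with code 28 when `--max-time` is exceeded.

```shell
# Map a curl exit code and HTTP status to a short verdict.
# $1 = curl exit code, $2 = HTTP status reported by curl (000 if none)
classify_probe() {
  if [ "$1" -eq 28 ]; then
    # curl exit code 28: the operation timed out
    echo "timed out"
  elif [ "$1" -ne 0 ]; then
    echo "curl error $1"
  elif [ "$2" -ge 500 ]; then
    echo "server error $2"
  else
    echo "answered $2"
  fi
}

# Request a URL with a hard time limit and print the verdict.
# $1 = URL, $2 = time limit in seconds (optional, default 10)
probe() {
  status=$(curl -s -o /dev/null --max-time "${2:-10}" -w '%{http_code}' "$1")
  classify_probe "$?" "$status"
}

# Example: probe "http://deployment-parsoid09.deployment-prep.eqiad.wmflabs:8000/_version" 5
```

A 302 or 404 counts as "answered" here, since the point is only whether the server replied within the limit.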

Thanks for looking into this!

Seems to be fixed now?

Deskana closed this task as Resolved. Jul 2 2018, 7:54 PM
Deskana claimed this task.

I'll take it.

Restricted Application added a project: User-Ryasmeen. Jul 2 2018, 7:54 PM