VE is not loading on Beta Cluster, getting 503s
Description
Related Objects
- Mentioned In
  - T214099: Stress test Parsoid's HTTP API
  - T204624: Parsoid is misbehaving in Beta cluster
  - T198348: Quibble CI jobs time out after 30min due to instance stalling at "npm install parse" step
- Mentioned Here
  - rGPARb068bb51d29e: Ensure Parsoid doesn't crash on unimplemented/invalid language conversions
Event Timeline
Looks like RESTBase (RB) is timing out trying to connect to Parsoid:
krenair@deployment-cache-text04:~$ curl http://deployment-restbase01.deployment-prep.eqiad.wmflabs:7231/en.wikipedia.beta.wmflabs.org/v1/page/html/14thjulyFF
{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/internal_http_error","method":"get","detail":"Error: ESOCKETTIMEDOUT","uri":"http://deployment-parsoid09.deployment-prep.eqiad.wmflabs:8000/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/14thjulyFF/112456"}
Tried restarting the parsoid service on deployment-parsoid09 and then fetching the URI above.
Based on the output of tail -f /srv/log/parsoid/main.log | grep -v ChangePropagation, it did attempt wt2html for that page. I haven't managed to get it to do that again, either for this page or for any other.
Confirmed that Parsoid is on b068bb51d29e294a4f4a875ae829cca8cf314205 in both prod and beta.
beta:
deployment-tin$ curl http://deployment-parsoid09.deployment-prep.eqiad.wmflabs:8000/_version
{"name":"parsoid","version":"0.9.0","sha":"b068bb51d29e294a4f4a875ae829cca8cf314205"}
and prod:
deployment:~$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; curl "http://$wtp:8000/_version"; echo; done;
Querying wtp1025.eqiad.wmnet: {"name":"parsoid","version":"0.9.0","sha":"b068bb51d29e294a4f4a875ae829cca8cf314205"}
[...etc...]
That could indicate that the task queues aren't being drained fast enough by the workers, but I'm not sure. If they were full, they would fail hard.
Requests handled directly by the server (i.e. 302s/404s) always got a prompt reply. I restarted the service and, at least for now, it has recovered.
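The prod/beta comparison above was done by reading each host's /_version output by eye. Below is a minimal sketch of automating that check. It assumes a POSIX shell with curl and sed, and it assumes the compact JSON shown in the responses above (no whitespace around the colon). The function names extract_sha, sha_matches and check_hosts are hypothetical helpers, not part of Parsoid or any Wikimedia tooling. Port 8000 is taken from the commands above.

```shell
#!/bin/sh
# Sketch only: compare the "sha" reported by each Parsoid host's /_version
# endpoint against an expected commit. Assumes POSIX sh, curl and sed, and
# compact JSON such as {"name":"parsoid","version":"0.9.0","sha":"..."}.
# The helper names below are hypothetical, not existing tooling.

# Print the value of the "sha" field from a /_version JSON body
# (prints nothing if the field is absent).
extract_sha() {
  printf '%s\n' "$1" | sed -n 's/.*"sha":"\([0-9a-f]*\)".*/\1/p'
}

# Succeed (exit status 0) if the JSON body reports the expected sha.
sha_matches() {
  [ "$(extract_sha "$1")" = "$2" ]
}

# Usage: check_hosts EXPECTED_SHA HOST...
# Queries http://HOST:8000/_version on each host and prints OK or MISMATCH.
check_hosts() {
  expected=$1
  shift
  for host in "$@"; do
    # --max-time keeps a hung host from stalling the whole loop.
    body=$(curl -s --max-time 10 "http://$host:8000/_version")
    if sha_matches "$body" "$expected"; then
      echo "$host: OK"
    else
      echo "$host: MISMATCH (got '$(extract_sha "$body")')"
    fi
  done
}
```

For example, check_hosts b068bb51d29e294a4f4a875ae829cca8cf314205 $(grep wtp /etc/dsh/group/parsoid) would repeat the prod loop above with a pass/fail line per host. The sed-based parsing is deliberately simple; a real JSON parser would be more robust if the response format changes.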