Investigate 500s from batch request failures
Closed, Resolved, Public

Description

From @ssastry,

https://grafana.wikimedia.org/d/000000042/parsoid-http-status-codes?panelId=1&fullscreen&orgId=1&from=now-6M&to=now

The baseline HTTP 500 error rate is usually due to MediaWiki API timeouts, timeouts on large pages (more common), or other uncommon crashers (which we usually catch, file bugs for, and fix).

That periodicity is usually because the same failing pages are regularly crawled (for example, by Google) or edited (for example, wiki pages that bots and editors use as "databases" or lists).

But as of https://www.mediawiki.org/wiki/Parsoid/Deployments/2018#Wednesday,_Nov._7,_2018_around_1:54_pm_PT:_Deployed_970751a that number spiked. Since that baseline bump has persisted until now, we should look at Kibana and see which HTTP 500 requests don't fit the timeout or large-page pattern.

Event Timeline

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.02.07/parsoid?id=AWjIsGyznlBds_JjA_-S&_g=h@44136fa
http://localhost:8000/fr.wikipedia.org/v3/page/html/Utilisateur%3AChico75%2Fpages_avec_pourcent/156246032

It seems to fail consistently with:

<p class="text-muted"><code>
  PHP fatal error: <br/>
  request has exceeded memory limit</code></p></div>
</html>

That page wants the page props for 30k links.

Although we're breaking this up into chunks of 500, it looks like we're still sending it all over in one large request:
https://github.com/wikimedia/parsoid/blob/master/lib/mw/Batcher.js#L387-L388
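To illustrate the difference between chunking the list and actually issuing one request per chunk, here is a minimal sketch. It is not the code in Batcher.js; `chunk`, `requestInChunks`, and `sendRequest` are hypothetical names used only for this example, and `sendRequest` stands in for whatever function performs the real API call.

```javascript
'use strict';

// Split `items` into consecutive arrays of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Issue one request per chunk, sequentially, and concatenate the results.
// `sendRequest` is a hypothetical async function that takes an array of
// titles and resolves to an array of results for those titles.
async function requestInChunks(titles, size, sendRequest) {
  const results = [];
  for (const part of chunk(titles, size)) {
    const partial = await sendRequest(part);
    results.push(...partial);
  }
  return results;
}
```

With 30,000 links and a chunk size of 500, this would issue 60 separate requests rather than one request carrying all 30,000 titles, so no single request has to hold the page props for the whole page.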

Change 488999 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Reduce the batch size for pageprop requests

https://gerrit.wikimedia.org/r/488999

Change 488999 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Reduce the batch size for pageprop requests

https://gerrit.wikimedia.org/r/488999

ssastry triaged this task as Medium priority. Jun 10 2019, 8:40 PM
ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.
ssastry claimed this task.

I am going to mark this resolved (rather than declined, since we seem to have deployed a few patches against this task), given that Parsoid/JS is no longer in use.