
Investigate 500s from batch request failures
Open, Medium, Public

Description

From @ssastry,

https://grafana.wikimedia.org/d/000000042/parsoid-http-status-codes?panelId=1&fullscreen&orgId=1&from=now-6M&to=now
The baseline HTTP 500 error rate is usually caused by MediaWiki API timeouts, timeouts on large pages (more common), or other uncommon crashers (which we usually catch, file bugs for, and fix).
The periodicity is usually because the same failing pages are regularly crawled (e.g., by Google) or edited (e.g., wiki pages that bots and editors use as "databases" or lists).
But as of https://www.mediawiki.org/wiki/Parsoid/Deployments/2018#Wednesday,_Nov._7,_2018_around_1:54_pm_PT:_Deployed_970751a, that number spiked. Since the higher baseline has persisted until now, we should look at Kibana to see which HTTP 500 requests don't fit the timeout or large-page pattern.
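As a rough sketch of that triage step, the JavaScript below filters exported 500 log entries down to the ones that match neither the timeout nor the large-page pattern. The entry shape (`{ status, msg }`), the function names `classify500` and `unexplained500s`, and the regexes in `KNOWN_CAUSES` are all hypothetical; Parsoid's actual log fields and messages would need to be checked in Kibana first.

```javascript
'use strict';

// Hypothetical patterns for the "expected" causes of HTTP 500s.
// These regexes are illustrative guesses, not Parsoid's real log messages.
const KNOWN_CAUSES = [
  { cause: 'timeout', pattern: /time(d)?[ -]?out/i },
  { cause: 'large-page', pattern: /too large|exceeds? .*size/i },
];

// Return the known cause of a log entry, or null if it fits no pattern.
function classify500(entry) {
  for (const { cause, pattern } of KNOWN_CAUSES) {
    if (pattern.test(entry.msg)) {
      return cause;
    }
  }
  return null;
}

// Keep only the 500s that match no known cause; these are the ones
// worth investigating by hand.
function unexplained500s(entries) {
  return entries.filter((e) => e.status === 500 && classify500(e) === null);
}
```

Run against a handful of entries, a timeout message and a "too large" message are classified and dropped, while something like "PHP fatal error: request has exceeded memory limit" is left over for manual review.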

Details

Related Gerrit Patches:
mediawiki/services/parsoid (master): Reduce the batch size for pageprop requests

Event Timeline

Arlolra created this task. Feb 7 2019, 5:30 PM
Restricted Application added a subscriber: Aklapper. Feb 7 2019, 5:30 PM

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.02.07/parsoid?id=AWjIsGyznlBds_JjA_-S&_g=h@44136fa
http://localhost:8000/fr.wikipedia.org/v3/page/html/Utilisateur%3AChico75%2Fpages_avec_pourcent/156246032

It seems to fail consistently with:

<p class="text-muted"><code>
  PHP fatal error: <br/>
  request has exceeded memory limit</code></p></div>
</html>

That page requests the page props for 30k links.

Although we break this up into chunks of 500, it looks like we send them all in one large request:
https://github.com/wikimedia/parsoid/blob/master/lib/mw/Batcher.js#L387-L388
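The failure mode above can be illustrated with a minimal JavaScript sketch. It is not the code in `Batcher.js`: `chunk`, `buildSingleRequest`, `buildCappedRequests`, and the `maxChunksPerRequest` parameter are hypothetical names, and a "request" is modeled as a plain array of chunks rather than a real HTTP call. It shows why splitting titles into chunks of 500 does not help on its own if every chunk still ends up in the same request body.

```javascript
'use strict';

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Problem pattern: titles are chunked, but all chunks go into one request,
// so the server still handles every title in a single request.
function buildSingleRequest(titles, chunkSize) {
  return [chunk(titles, chunkSize)]; // one request holding every chunk
}

// Capped pattern (hypothetical): each request carries at most
// `maxChunksPerRequest` chunks, bounding the titles handled per request.
function buildCappedRequests(titles, chunkSize, maxChunksPerRequest) {
  return chunk(chunk(titles, chunkSize), maxChunksPerRequest);
}
```

For a page with 30,000 links, `chunk(titles, 500)` yields 60 chunks. `buildSingleRequest` puts all 60 into one request, whereas `buildCappedRequests(titles, 500, 1)` produces 60 requests of 500 titles each. The trade-off is more round trips in exchange for a bounded amount of work per request on the MediaWiki side, which presumably keeps each request under the PHP memory limit. The merged patch below reduces the batch size for pageprop requests; its exact mechanism is not shown in this task.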

Change 488999 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Reduce the batch size for pageprop requests

https://gerrit.wikimedia.org/r/488999

Change 488999 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Reduce the batch size for pageprop requests

https://gerrit.wikimedia.org/r/488999

ssastry triaged this task as Medium priority. Jun 10 2019, 8:40 PM
ssastry moved this task from Backlog to Crashers on the Parsoid board.