Investigate 500s from batch request failures
Closed, Resolved · Public

Assigned To

Authored By

	Arlolra
	Feb 7 2019, 5:30 PM

Description

From @ssastry,

https://grafana.wikimedia.org/d/000000042/parsoid-http-status-codes?panelId=1&fullscreen&orgId=1&from=now-6M&to=now

The baseline HTTP 500 error rate is usually due to MediaWiki API timeouts, timeouts on large pages (more common), or other uncommon crashers (which we usually catch, file bugs for, and fix).

That periodicity is usually because the same failing pages are regularly crawled (e.g. by Google) or edited (e.g. wiki pages that bots and editors use as "databases" or lists).

But as of https://www.mediawiki.org/wiki/Parsoid/Deployments/2018#Wednesday,_Nov._7,_2018_around_1:54_pm_PT:_Deployed_970751a that number spiked. Since that baseline bump has persisted until now, we should look at Kibana and see which HTTP 500 requests don't fit the timeout or large-page pattern.

Details

	Subject	Repo	Branch	Lines +/-
	Reduce the batch size for pageprop requests	mediawiki/services/parsoid	master	+14 -12


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		ssastry	T215537 Investigate 500s from batch request failures
		Resolved		cscott	T215110 ParsoidBatchAPI timeout on frwikisource due to Score extension

Event Timeline

Arlolra created this task. Feb 7 2019, 5:30 PM

Restricted Application added a subscriber: Aklapper. Feb 7 2019, 5:30 PM

Arlolra added a subtask: T215110: ParsoidBatchAPI timeout on frwikisource due to Score extension. Feb 7 2019, 5:30 PM

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.02.07/parsoid?id=AWjIsGyznlBds_JjA_-S&_g=h@44136fa
http://localhost:8000/fr.wikipedia.org/v3/page/html/Utilisateur%3AChico75%2Fpages_avec_pourcent/156246032

Seems to consistently fail with,

<p class="text-muted"><code>
  PHP fatal error: <br/>
  request has exceeded memory limit</code></p></div>
</html>

That page wants the page props for 30k links.

Although we're breaking this up into chunks of 500, it looks like we're sending it all over in one large request,
https://github.com/wikimedia/parsoid/blob/master/lib/mw/Batcher.js#L387-L388
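
As a rough illustration of the difference between chunking the data and chunking the requests, here is a minimal sketch. It is not the code in `Batcher.js` or in the patch for this task; `chunk`, `requestInChunks`, and the `sendRequest` callback are hypothetical names made up for the example. The idea is that each chunk goes out as its own request, so no single request has to carry all 30k titles.

```javascript
// Hypothetical helper: split `items` into consecutive arrays of at most
// `size` elements. Not part of Parsoid.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical driver: send one request per chunk, one after another,
// instead of bundling every chunk into a single large request.
// `sendRequest` is an assumed callback that takes an array of titles and
// returns a Promise resolving to an array of results.
async function requestInChunks(titles, size, sendRequest) {
  const results = [];
  for (const batch of chunk(titles, size)) {
    // Await each batch so only `size` titles are in flight at a time.
    results.push(...await sendRequest(batch));
  }
  return results;
}
```

With 30,000 titles and a chunk size of 500, this would issue 60 requests. Sending them sequentially trades latency for a bounded per-request payload; whether that is the right trade-off here is an assumption of the sketch.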

Change 488999 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Reduce the batch size for pageprop requests

https://gerrit.wikimedia.org/r/488999

gerritbot added a project: Patch-For-Review. Feb 7 2019, 7:24 PM

Change 488999 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Reduce the batch size for pageprop requests

https://gerrit.wikimedia.org/r/488999

Mentioned in SAL (#wikimedia-operations) [2019-02-11T21:32:55Z] <arlolra> Updated Parsoid to rGPARb4b9603ed153 (T208901, T215537, T213468, T215638)

Stashbot mentioned this in T213468: Parsoid section IDs don't correspond to PHP section IDs when headings are transcluded. Feb 11 2019, 9:32 PM

Stashbot mentioned this in T215638: List tokens use special-cased "bullets" property instead of stuffing it in attribs like other tokens.

ssastry triaged this task as Medium priority. Jun 10 2019, 8:40 PM

ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.

I am going to mark this resolved (rather than declined, since we seem to have deployed a few patches against this task) given that Parsoid/JS is no longer in use.

cscott closed subtask T215110: ParsoidBatchAPI timeout on frwikisource due to Score extension as Resolved. Apr 17 2020, 4:23 PM
