
Parsoid timing out or failing when trying to parse specific user page
Closed, ResolvedPublic

Description

Parsoid load went up significantly this morning in eqiad (the cluster that serves live traffic). I traced the issue to Parsoid's inability to parse most revisions of a specific talk page on cebwiki, which often results in timeouts or in the worker dying.

All of this seems to be caused by REST-API-Crawler-Google/1.0, which is trying to parse all revisions of the page.

An extract from parsoid logs:

"Timed out processing: cebwiki/Gumagamit:Lsjbot/Kartrutor2?oldid=12301712"
"worker 25444 died (1), restarting."
"Timed out processing: cebwiki/Gumagamit:Lsjbot/Kartrutor2?oldid=12301924"
"worker 25274 died (1), restarting."
"Timed out processing: cebwiki/Gumagamit:Lsjbot/Kartrutor2?oldid=12301844"
"worker 25434 died (1), restarting."
"Timed out processing: cebwiki/Gumagamit:Lsjbot/Kartrutor2?oldid=12301924"
"worker 25464 died (1), restarting."
"Timed out processing: cebwiki/Gumagamit:Lsjbot/Kartrutor2?oldid=12301963"
"worker 25133 died (1), restarting."

The page is long and makes extensive use of a Lua module, https://ceb.wikipedia.org/wiki/Module:KML

While the cluster load is more or less under control, this is causing workers to die, and thus in-flight requests from real users might fail.

This should thus be treated with the highest priority.

Event Timeline

Joe renamed this task from Parsoid unable to parse specific user page to Parsoid timing out or failing when trying to parse specific user page.Jan 18 2017, 8:53 AM
Joe claimed this task.
Joe added a project: User-Joe.

Isolating a single request, I see that most of the time is spent executing

v8::internal::VisitWeakList<v8::internal::JSFunction>

and that parsing, even when successful, takes ~180 seconds.

I am now trying to determine why the worker dies.

Strace gives little more information, besides the fact that for each of these pages Parsoid makes hundreds of preprocessing requests to the MW API. Maybe some recursion limit is being reached?
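For context, a minimal sketch of the kind of preprocessing call involved. This is illustrative only, not Parsoid's actual code: the `preprocessUrl` helper is an assumption, though `action=expandtemplates` is a real MediaWiki action API module used for template expansion.

```javascript
// Sketch (assumption, not Parsoid code): build the kind of MediaWiki
// action API request issued to expand a transclusion during preprocessing.
// A page with many transclusions triggers many such round trips.
function preprocessUrl(apiBase, wikitext, title) {
  const params = new URLSearchParams({
    action: 'expandtemplates', // server-side template/Lua expansion
    prop: 'wikitext',          // return the expanded wikitext
    title: title,              // page context for the expansion
    text: wikitext,            // fragment to expand, e.g. '{{KML}}'
    format: 'json',
  });
  return `${apiBase}?${params}`;
}
```

Hundreds of these sequential-ish round trips per parse would by itself account for a large share of the ~180s wall-clock time observed above.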

mobrovac subscribed.

The request limit in Parsoid is set to 110s, after which the worker commits suicide.

I will blacklist this specific title in RESTBase for now.
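A minimal sketch of how a per-request time limit like the 110s one can work in a Node.js worker. This is an assumption for illustration, not Parsoid's actual implementation; `withTimeout`, `handleParse`, and the exit-on-timeout handling are hypothetical names.

```javascript
// Sketch (assumption, not Parsoid code) of a hard per-request deadline.
const REQUEST_TIMEOUT_MS = 110 * 1000;

// Race the parse promise against a timer; reject if the timer wins.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('request timeout')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Treating a timeout as fatal and exiting non-zero would produce
// exactly the "worker NNNNN died (1), restarting." log lines above,
// since the cluster master restarts dead workers.
function handleParse(parsePromise) {
  return withTimeout(parsePromise, REQUEST_TIMEOUT_MS).catch((err) => {
    console.error('Timed out processing request:', err.message);
    process.exit(1);
  });
}
```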

The PHP parser also gives up with lots of errors like this on the page:

...
S08W039 Lua error: too many expensive function calls.
...

Anyway, it looks like Parsoid is missing some resource limit / state needed to detect this scenario. That said, bot-driven pages (which are basically proxies for a database table) are usually the ones that give Parsoid trouble.
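The "too many expensive function calls" error above comes from Scribunto's expensive-function budget in the PHP parser (configured by `$wgExpensiveParserFunctionLimit`). A sketch of the same budget idea, as the kind of resource limit Parsoid could apply; the names and the default limit of 100 are assumptions for illustration:

```javascript
// Sketch (assumption): a Scribunto-style budget. Each "expensive"
// operation charges the counter; exceeding the budget aborts the parse
// with the same error the PHP parser emits on this page.
function makeExpensiveGuard(limit = 100) {
  let used = 0;
  return function charge() {
    if (++used > limit) {
      throw new Error('too many expensive function calls');
    }
  };
}
```

With such a guard, a pathological page fails fast with a bounded error instead of tying up a worker until the 110s deadline kills it.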

> I will blacklist this specific title in RESTBase for now.

Marco, is this something that Ops could do in case of fire? If so, is the procedure written down somewhere?

@elukey apparently this needs a code deploy, which means accepting a pull request on GitHub (sic), where not everyone from Ops has the ability to merge a PR (I do, as I'm an admin of the wikimedia GitHub org, but YMMV). You then need to check that into the Gerrit-based deploy repo, and RESTBase is then deployed with some Ansible recipe (sic, again) instead of scap3 or Trebuchet.

So we definitely need someone from the Services team to do it.

The correct way to handle this would, of course, be to allow Ops to control at least part of the blacklist via Puppet, or at least to standardize the deployment process.

I remember there was talk of moving RESTBase to scap3, but I don't think that has happened yet.

Joe moved this task from Backlog to Doing on the User-Joe board.