
Parsoid is misbehaving in Beta cluster
Closed, Resolved · Public

Description

Can't load VE (VisualEditor) on Beta cluster for any page; it fails with error 500.

Details

Related Gerrit Patches:
mediawiki/services/change-propagation/deploy (master): Decrease beta cluster concurrency to 1.

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 17 2018, 10:59 PM
Pchelolo added a subscriber: Pchelolo.

It seems that Parsoid in the beta cluster always times out because the MW API is not configured properly: both mwApiServer and defaultAPIProxyURI are empty. However, I'm not sure that's actually incorrect, since Parsoid in beta is configured quite differently from Parsoid in production.

After I restarted Parsoid in beta in an attempt to switch on trace logging, it actually started returning results for test pages. For some reason, though, Parsoid in beta is extremely slow, even for very simple pages. Not sure what it is doing.

Thanks @Pchelolo for looking into this!

Krenair renamed this task from "Can't load VE on Beta cluster" to "Parsoid is misbehaving in Beta cluster". Sep 18 2018, 10:28 AM
Krenair added a subscriber: Krenair.
cscott added a subscriber: cscott. Sep 18 2018, 3:22 PM

The last Parsoid deploy to beta was Wednesday or Thursday of last week. Assuming this problem started recently, it's probably not caused by a code or configuration change on the Parsoid end.

This is probably the same thing that caused T198421, cropping up again.

> The last Parsoid deploy to beta was Wed or Thursday of last week.

According to the logs, it was on Wednesday.

> After restarting Parsoid in beta as an attempt to switch on trace logging it actually started to return the results for test pages,

As in T198421#4326131, I think the workers are being overwhelmed. A 503 would indicate that we're hitting maxConcurrentCalls, as seen here:
https://github.com/wikimedia/parsoid/blob/master/lib/api/apiUtils.js#L179

The default is 5 times the number of workers (3), i.e. 15 concurrent calls, which isn't much:
https://github.com/wikimedia/parsoid/blob/master/lib/config/ParsoidConfig.js#L57
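To illustrate why a small cap turns a burst of requests into errors, here is a minimal Python sketch of a concurrency-limited request gate. It is not Parsoid's code (the real logic is the JavaScript linked above); the ConcurrencyLimiter class and handle function are hypothetical, and the "5 calls per worker" default is taken from the comment above.

```python
import threading


class ConcurrencyLimiter:
    """Hypothetical admission gate: once max_concurrent_calls requests are
    in flight, new requests are rejected instead of queued."""

    def __init__(self, num_workers: int, calls_per_worker: int = 5) -> None:
        # Assumed default from the discussion: 5 calls per worker.
        self.max_concurrent_calls = calls_per_worker * num_workers
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self._in_flight >= self.max_concurrent_calls:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1


def handle(limiter: ConcurrencyLimiter, render):
    """Return (status, body); 503 when the concurrency cap is reached."""
    if not limiter.try_acquire():
        return 503, "Too many concurrent requests"
    try:
        return 200, render()
    finally:
        limiter.release()
```

With 3 workers the cap is 15, so a flood of template reparses can occupy every slot and leave interactive requests (such as VE loads) with errors until the backlog drains.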

Looking at the request logs around the time this was filed, I see

{"name":"parsoid","hostname":"deployment-parsoid09","pid":24,"level":30,"logType":"info","wiki":"enwiki","title":"Template:Other_people5","oldId":51305,"reqId":"831c03bc-11db-4c1f-86cc-296ca3a0c95c","userAgent":"ChangePropagation/WMF","msg":"completed wt2html in 650ms","longMsg":"completed wt2html in 650ms","levelPath":"info","time":"2018-09-17T23:12:07.442Z","v":0}

and a lot of other reparses of templates.
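One way to quantify such a flood is to scan the Parsoid request log. The sketch below assumes one JSON object per line with the fields visible in the entry above (userAgent, title, msg); the summarize_reparses helper is hypothetical, not an existing tool.

```python
import json
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches messages such as "completed wt2html in 650ms".
WT2HTML = re.compile(r"completed wt2html in (\d+)ms")


def summarize_reparses(
    lines: Iterable[str], user_agent: str = "ChangePropagation/WMF"
) -> Tuple[Counter, int]:
    """Count completed wt2html parses per title for one user agent and
    sum the reported parse times in milliseconds."""
    per_title: Counter = Counter()
    total_ms = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON log entries
        if entry.get("userAgent") != user_agent:
            continue
        match = WT2HTML.match(str(entry.get("msg", "")))
        if match is None:
            continue
        per_title[entry.get("title", "?")] += 1
        total_ms += int(match.group(1))
    return per_title, total_ms
```

Sorting per_title.most_common() for a window around the filing time would show whether ChangeProp-triggered reparses dominate the load.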

There was this change to Template:Documentation, which probably set off ChangeProp, triggering reparses of the many pages that transclude it:
https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Template%3ADocumentation&type=revision&diff=384325&oldid=350068

That flood has ended and Parsoid is responsive again.

Change 461181 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/deploy@master] Decrease beta cluster concurrency to 1.

https://gerrit.wikimedia.org/r/461181

Change 461181 merged by Ppchelko:
[mediawiki/services/change-propagation/deploy@master] Decrease beta cluster concurrency to 1.

https://gerrit.wikimedia.org/r/461181

Pchelolo closed this task as Resolved. Sep 18 2018, 7:44 PM
Pchelolo claimed this task.

VE works properly in Beta now. After decreasing the change-prop concurrency level, I made some edits to test templates there that are transcluded in a fair number of pages. As expected, the resulting re-renders were paced, and VE kept working throughout the experiment.

I consider this done and am resolving it; please reopen if it happens again.
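The pacing observed here can be modelled as a bounded worker pool: one template edit fans out into many re-render requests, and the concurrency setting caps how many reach Parsoid at once. A minimal Python sketch, assuming a hypothetical rerender_all helper with asyncio.sleep standing in for the Parsoid call; ChangePropagation itself is a Node.js service and its real configuration is not shown here.

```python
import asyncio
from typing import List, Sequence, Tuple


async def rerender_all(
    titles: Sequence[str], concurrency: int, render_seconds: float = 0.01
) -> Tuple[List[str], int]:
    """Hypothetical model of change-prop re-rendering transcluding pages,
    with at most `concurrency` requests in flight at once.
    Returns the processed titles and the peak number in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def rerender(title: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(render_seconds)  # stand-in for a wt2html request
            in_flight -= 1
        return title

    done = await asyncio.gather(*(rerender(t) for t in titles))
    return list(done), peak
```

With concurrency 1 the re-renders run strictly one after another, so Parsoid never sees more than one change-prop request at a time and keeps capacity free for interactive traffic, at the cost of the backlog taking longer to clear.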

Restricted Application added a project: User-Ryasmeen. Sep 18 2018, 7:44 PM