504 timeout with Parsoid service and Visual Editor
Open, Needs Triage, Public

Description

Open https://en.wikipedia.org/wiki/The_World%27s_Billionaires, click "edit source" next to the 2020 heading, add some text, then try to switch to Visual Editor. The result is "Error contacting the Parsoid/RESTBase server (HTTP 504)".
After clicking "try again", the switch to Visual Editor succeeds, but the added text is lost.

Event Timeline

Arlolra added a subscriber: Arlolra.

Maybe this has to do with the page being semi-protected?

Same problem at https://en.wikipedia.org/wiki/List_of_largest_power_stations (not a semi-protected page). Maybe page length is relevant?

ppelberg added subscribers: Esanders, matmarex, ppelberg.

Same problem at https://en.wikipedia.org/wiki/List_of_largest_power_stations (not a semi-protected page). Maybe page length is relevant?

Just chatted with @matmarex + @Esanders about this and they suspect it has something to do with the length of the page as @Jklamo suggested above.

@Arlolra, three resulting questions for you:

  1. Do you think the length of the page could explain the error JKlamo is experiencing?
  2. Do you think this ticket is a duplicate of T244609?
  3. If the answer you come up with to "1." ends up being "yes" what might be involved with fixing this issue?

Do you think the length of the page could explain the error JKlamo is experiencing?

Yes, insofar as page length is a proxy for page complexity and parse time.

Do you think this ticket is a duplicate of T244609?

Yup

If the answer you come up with to "1." ends up being "yes" what might be involved with fixing this issue?

Well, the first thing to note is that what this task and T244609 have in common is that both ask Parsoid to parse wikitext, which is something we normally think of only RESTBase as doing.

On OfficeWiki, there's no RESTBase in the middle because it's a private wiki. Here, when switching from source to visual editing, Parsoid has to parse the edited wikitext since it's different from what's been cached.

Normally, VE's interaction with Parsoid is pretty fast. Either it's pulling something from RESTBase, which is cached, or it's asking for serialization, which is fast enough.

The first thing to check is the timeout in $wgVirtualRestConfig, which is 360s,
https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L2561

There's a bug, though: that setting just gets ignored and we fall back to $wgHTTPTimeout, which is 25s,
https://github.com/wikimedia/mediawiki/blob/master/includes/DefaultSettings.php#L9616

Ok, great, but most of the pages above parse well under that. See the OfficeWiki page parsing in 10s,
https://phabricator.wikimedia.org/T244609#6034307

So, the timeout isn't coming from the curl setting on the VirtualRESTServiceClient. Curiously, the timeout happens at 8s on OfficeWiki,
https://phabricator.wikimedia.org/T244609#5995719

When RESTBase queries Parsoid, it uses parsoid-async,
https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/restbase.pp#L86-L89

which has a timeout of 120s, no problem there,
https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/services_proxy/envoy.yaml#L190-L198

However, parsoid-php only has 8s to respond,
https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/services_proxy/envoy.yaml#L6-L14
which is probably where OfficeWiki is timing out.

Further, if you test a few of the enwiki pages listed above, the timeout is 10s, not 8s. This is because the parse request in these cases is proxied through RESTBase, which queries Parsoid via parsoid-async but nevertheless has only 10s of its own to respond to the VirtualRestClient,
https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/services_proxy/envoy.yaml#L60-L66
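
To summarize the reasoning above, here is a rough Python sketch of how the timeouts stack up. It is only a model, not code from any of the linked repositories. The `Hop` class, the helper names, and the hop labels (in particular "envoy: RESTBase listener") are made up for illustration. The numbers are the ones quoted above, and the 15s parse time in the second call is a hypothetical value. The assumption is simply that a request fails at whichever hop on its path has the shortest timeout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One layer on the request path and how long it waits, in seconds."""
    name: str
    timeout_s: float


def limiting_hop(path):
    """Return the hop whose timeout fires first on a request path."""
    return min(path, key=lambda hop: hop.timeout_s)


def outcome(path, parse_time_s):
    """Describe what the client sees for a parse taking parse_time_s seconds."""
    hop = limiting_hop(path)
    if parse_time_s <= hop.timeout_s:
        return f"ok after {parse_time_s}s"
    return f"timeout after {hop.timeout_s}s at {hop.name}"


# The client ends up with the 25s $wgHTTPTimeout fallback, not the configured 360s.
CLIENT = Hop("VirtualRESTServiceClient", 25)

# Private wiki (OfficeWiki): the client talks to parsoid-php directly.
DIRECT_PATH = [CLIENT, Hop("envoy: parsoid-php", 8)]

# Public wiki (enwiki): the request is proxied through RESTBase.
RESTBASE_PATH = [
    CLIENT,
    Hop("envoy: RESTBase listener", 10),  # hypothetical label
    Hop("envoy: parsoid-async", 120),
]

print(outcome(DIRECT_PATH, 10))    # the OfficeWiki page that parses in 10s
print(outcome(RESTBASE_PATH, 15))  # hypothetical 15s parse on enwiki
```

Under this model, neither the 25s client fallback nor the 120s parsoid-async timeout is ever the limiting factor. Raising the 8s and 10s values, or routing around them, is what would change the observed behavior.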

Good sleuthing! Didn't realize envoy and parsoid-async timeouts were involved in the picture. That will let us proceed with T244609 where I had landed at the wrong place assuming the problem was in Parsoid's PEG.

Change 699425 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/puppet@production] Bump envoy timeout for parsoid-php

https://gerrit.wikimedia.org/r/699425

Change 699434 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/mediawiki-config@master] Use restbase-for-services for VE's VirtualRestClient calls

https://gerrit.wikimedia.org/r/699434

Well, the first thing to note is...

Good sleuthing!

+1, @ssastry – we appreciate this quick and thorough response, @Arlolra!

Gerrit is leading me to assume you all (Parsing) have code review covered for this.

Please let us know if you would value our help with QA.

Change 700077 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/mediawiki-config@master] Switch to using parsoid-async for direct VirtualRestClient connects

https://gerrit.wikimedia.org/r/700077