504 timeout with Parsoid service and Visual Editor
Closed, Resolved, Public

Description

Open https://en.wikipedia.org/wiki/The_World%27s_Billionaires, click "edit source" on the 2020 section heading, add some text, then try to switch to Visual Editor. The result is "Error contacting the Parsoid/RESTBase server (HTTP 504)".
After clicking "try again" the switch to Visual Editor is successful, but the added text is lost.

Event Timeline

Arlolra subscribed.

Maybe this has to do with the page being semi-protected?

Same problem at https://en.wikipedia.org/wiki/List_of_largest_power_stations (not a semi-protected page). Maybe page length is relevant?

Just chatted with @matmarex + @Esanders about this and they suspect it has something to do with the length of the page as @Jklamo suggested above.

@Arlolra, three resulting questions for you:

  1. Do you think the length of the page could explain the error JKlamo is experiencing?
  2. Do you think this ticket is a duplicate of T244609?
  3. If the answer you come up with to "1." ends up being "yes" what might be involved with fixing this issue?

Do you think the length of the page could explain the error JKlamo is experiencing?

Yes, insofar as page length is a proxy for page complexity and parse time.

Do you think this ticket is a duplicate of T244609?

Yup

If the answer you come up with to "1." ends up being "yes" what might be involved with fixing this issue?

Well, the first thing to note is what this task and T244609 have in common: in both, Parsoid is being asked to parse wikitext, a request we normally think of only RESTBase as making.

On OfficeWiki, there's no RESTBase in the middle because it's a private wiki. Here, when switching from source to visual editing, Parsoid has to parse the edited wikitext since it's different from what's been cached.

Normally, VE's interaction with Parsoid is pretty fast. Either it's pulling something from RESTBase, which is cached, or it's asking for serialization, which is fast enough.

The first thing to check is the timeout in $wgVirtualRestConfig, which is 360s,
https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L2561

There's a bug, though: that value just gets ignored and we fall back to the $wgHTTPTimeout of 25s,
https://github.com/wikimedia/mediawiki/blob/master/includes/DefaultSettings.php#L9616
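
To make the fallback concrete, here is a minimal Python sketch of the behaviour described above. It is not MediaWiki's code: the function names and the dictionary shape are hypothetical, and only the two values (360 and 25) come from this thread.

```python
# Illustrative model only; none of these names exist in MediaWiki.
HTTP_TIMEOUT_DEFAULT = 25  # stands in for $wgHTTPTimeout, as quoted in this thread


def timeout_as_observed(service_config: dict) -> int:
    """Models the reported bug: the per-service timeout is never consulted."""
    return HTTP_TIMEOUT_DEFAULT


def timeout_as_intended(service_config: dict) -> int:
    """Models the expected behaviour: a per-service timeout overrides the default."""
    return service_config.get("timeout", HTTP_TIMEOUT_DEFAULT)


virtual_rest_config = {"timeout": 360}  # value quoted for $wgVirtualRestConfig
print(timeout_as_observed(virtual_rest_config))  # 25
print(timeout_as_intended(virtual_rest_config))  # 360
```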

Ok, great, but most of the pages above parse well under that. See the OfficeWiki page parsing in 10s,
https://phabricator.wikimedia.org/T244609#6034307

So, the timeout isn't coming from the curl setting on the VirtualRESTServiceClient. Curiously, the timeout happens at 8s on OfficeWiki,
https://phabricator.wikimedia.org/T244609#5995719

When RESTBase queries Parsoid, it uses parsoid-async,
https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/restbase.pp#L86-L89

which has a timeout of 120s, no problem there,
https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/services_proxy/envoy.yaml#L190-L198

However, parsoid-php only has 8s to respond,
https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/services_proxy/envoy.yaml#L6-L14
which is probably where OfficeWiki is timing out.

Further, if you test a few of the pages listed above for enwiki, the timeout is 10s, not 8s. This is because the parse request in these cases is proxied through RESTBase, which, although it queries Parsoid via parsoid-async, nevertheless has its own timeout of 10s to respond to the VirtualRestClient,
https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/services_proxy/envoy.yaml#L60-L66
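
The chain of timeouts can be summarised in a short Python sketch. It is a simplified model, not production configuration: the hop labels and the `first_to_fire` helper are made up for illustration, all timers are assumed to start at the same moment, and network overhead is ignored. The timeout values (25s, 10s, 120s, 8s) are the ones quoted in this thread.

```python
from typing import NamedTuple, Optional


class Hop(NamedTuple):
    name: str         # hypothetical label for a layer in the request path
    timeout_s: float  # timeout quoted in this thread for that layer


def first_to_fire(hops: list, parse_time_s: float) -> Optional[Hop]:
    """Return the hop whose timeout expires first, or None if the parse finishes in time."""
    tightest = min(hops, key=lambda hop: hop.timeout_s)
    return tightest if parse_time_s > tightest.timeout_s else None


# enwiki: VirtualRestClient -> RESTBase -> parsoid-async
ENWIKI_PATH = [
    Hop("$wgHTTPTimeout (curl)", 25),
    Hop("envoy: restbase", 10),
    Hop("envoy: parsoid-async", 120),
]

# OfficeWiki: VirtualRestClient -> parsoid-php, with no RESTBase in the middle
OFFICEWIKI_PATH = [
    Hop("$wgHTTPTimeout (curl)", 25),
    Hop("envoy: parsoid-php", 8),
]

print(first_to_fire(OFFICEWIKI_PATH, parse_time_s=10))
print(first_to_fire(ENWIKI_PATH, parse_time_s=11))
```

In this model the tightest hop on each path decides the outcome, which matches the 8s and 10s cutoffs observed above even though the curl timeout is 25s.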

Good sleuthing! Didn't realize envoy and parsoid-async timeouts were involved in the picture. That will let us proceed with T244609 where I had landed at the wrong place assuming the problem was in Parsoid's PEG.

Change 699425 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/puppet@production] Bump envoy timeout for parsoid-php

https://gerrit.wikimedia.org/r/699425

Change 699434 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/mediawiki-config@master] Use restbase-for-services for VE's VirtualRestClient calls

https://gerrit.wikimedia.org/r/699434

Well, the first thing to note is...

Good sleuthing!

+1, @ssastry – we appreciate this quick and thorough response, @Arlolra!

Gerrit is leading me to assume you all (Parsing) have code review covered for this.

Please let us know if you would value our help with QA.

Change 700077 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/mediawiki-config@master] Switch to using parsoid-async for direct VirtualRestClient connects

https://gerrit.wikimedia.org/r/700077

Change 699425 merged by Legoktm:

[operations/puppet@production] Bump envoy timeout for parsoid-php

https://gerrit.wikimedia.org/r/699425

Gerrit is leading me to assume you all (Parsing) have code review covered for this.

Yes, thanks, I'll work with ops to get the patches deployed

@Legoktm just merged the fix for OfficeWiki and, after a puppet run, that seems to have been resolved,
https://office.wikimedia.org/w/index.php?title=Contact_list&type=revision&diff=295727&oldid=295632&diffmode=source

Change 701172 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/puppet@production] Bump envoy timeout for restbase

https://gerrit.wikimedia.org/r/701172

Change 701172 merged by Legoktm:

[operations/puppet@production] Bump envoy timeout for restbase

https://gerrit.wikimedia.org/r/701172

Change 700077 abandoned by Arlolra:

[operations/mediawiki-config@master] Switch to using parsoid-async for direct VirtualRestClient connects

Reason:

https://gerrit.wikimedia.org/r/700077

Please let us know if you would value our help with QA.

With the restbase envoy timeout now bumped, I tested the above cases,

https://en.wikipedia.org/wiki/The_World%27s_Billionaires
https://en.wikipedia.org/wiki/List_of_largest_power_stations

These two parse in just over 10s, well within the $wgHTTPTimeout, and so are fixed by the new restbase envoy timeout of 30s.

https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory

This one, however, takes closer to 30s for Parsoid to parse and hits the 25s $wgHTTPTimeout. The evidence for this is that you see the "Parsoid/RESTBase server: (curl error: 28) Timeout was reached" message instead of the "(HTTP 504)" one.
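
The difference between the two error messages can be sketched as follows. This is a rough Python model under stated assumptions: `observed_outcome` is a hypothetical helper, both timers are assumed to start together, and the parse times used below (11s and 29s) are illustrative stand-ins for "just over 10s" and "closer to 30s". Curl error 28 is libcurl's operation-timed-out code, and HTTP 504 is Gateway Timeout.

```python
def observed_outcome(parse_time_s: float,
                     client_timeout_s: float,
                     proxy_timeout_s: float) -> str:
    """Guess which error the editor would surface for a given parse time.

    client_timeout_s models the curl timeout ($wgHTTPTimeout);
    proxy_timeout_s models the envoy timeout in front of restbase.
    """
    tightest = min(client_timeout_s, proxy_timeout_s)
    if parse_time_s <= tightest:
        return "ok"
    # Whichever timer is shorter fires first and determines the message.
    if client_timeout_s <= proxy_timeout_s:
        return "curl error 28"
    return "HTTP 504"


# Before the bump: restbase envoy timeout of 10s
print(observed_outcome(11, client_timeout_s=25, proxy_timeout_s=10))  # HTTP 504
# After the bump to 30s
print(observed_outcome(11, client_timeout_s=25, proxy_timeout_s=30))  # ok
print(observed_outcome(29, client_timeout_s=25, proxy_timeout_s=30))  # curl error 28
```

With the proxy timeout above the client timeout, the 504 can no longer be the first failure on this path, so any remaining timeout shows up as the curl error.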

Is 25s good enough or should you be able to switch from source to visual editing on any page restbase is serving?

If the latter, see the abandoned patches above that switch to using parsoid-async and restbase-for-services endpoints, which have appropriate timeouts for that. But note that I filed T285445 for maybe consolidating all these services.

Also, keep in mind that the timeout in $wgVirtualRestConfig is meaningless and should either be removed as distracting or fixed in MediaWiki to do what it's supposed to do and override $wgHTTPTimeout,
https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L2557

Change 699434 abandoned by Arlolra:

[operations/mediawiki-config@master] Use restbase-for-services for VE's VirtualRestClient calls

Reason:

https://gerrit.wikimedia.org/r/699434

Also, keep in mind that the timeout in $wgVirtualRestConfig is meaningless and should either be removed as distracting or fixed in MediaWiki to do what it's supposed to do and override $wgHTTPTimeout,

I filed T285478 for that

Arlolra claimed this task.