Page MenuHomePhabricator

Post switchover to Parsoid/PHP on private wikis, VE edit saves on officewiki responded with a HTTP 412
Closed, ResolvedPublic

Description

We attempted a switchover of officewiki from Parsoid/JS to Parsoid/PHP.

However, saves from VE edits weren't going through. They failed with a 'Error contacting the Parsoid/RESTBase server (HTTP 412)' error. This error is returned by the REST API checks.

Need to investigate what the incompatibility is this configuration.

Event Timeline

ssastry triaged this task as High priority.Nov 21 2019, 6:18 PM
ssastry created this task.

@Pchelolo's investigation in T236837#5621491 is probably relevant here. RESTBase was passing through the header there and stopped doing so with his fix. But, in the private wiki scenario, VE's header is triggering the 412 now.

@matmarex, @Esanders It looks like in the Parsoid/PHP and Mediawiki REST API scenario, VE and other clients should send the if-unmodified-since header conditionally based on whether the target wiki has its content cached or not.

The if-unmodified-since in RESTBase is used for an optimization, but it almost never kicks in for external requests, mostly it's relevant for change-prop. So I think we can just remove this header from the client request

Given that this is triggered on a save, and since the parsed HTML doesn't change for that revision (guaranteed by stashing), the if-unmodified-since is trivially satisfied always. Edit conflict resolution mechanism outside this process kicks in for real edit conflicts anyway.

Alternatively,

  1. RESTBase sends Parsoid the parse/storage timestamp of the HTML, and Parsoid uses it to implement the getLastModified() ( https://github.com/wikimedia/mediawiki/blob/master/includes/Rest/Handler.php#L191-L203 ) in the ParsoidHandler
  2. Parsoid sets the timestamp in a meta-tag in the <head> section and assuming VE posts the full doc, and not just the <body>, Parsoid extracts it from there and implements getLastModifed()

Solution #1 is probably needed in the long-term anyway. But for now, it seems safe for VE to remove it based on what @Pchelolo is observing above about its usage in RESTBase. Solution #2 doesn't work because clients may strip out the header (VE sometimes requests only the <body>)

Ignore all that noise. VE doesn't send us the if-unmodified-since header at all. git grep -i 'if-unmodified-since' in the VE repo gives me zero hits. I went down the wrong rabbit hole.

[ NOTE from June 15, 2022: I noted this in T303370, but in my original comment below, I made the incorrect claim that VE doesn't send the etag but in the header dump I posted, the etag is right there with the if-match header!! There is still useful info here, so I am not going around messing with the comment. ]

Okay, I dug around a bit on my local wiki with VE, and I found the following problems:

{"Host":["localhost"],"if-match":["W\\/\\"3657\\/bd1cafd8-0ca9-11ea-98b2-02422bea911b\\""],"accept":["text\\/html; charset=utf-8; profile=\\"https:\\/\\/www.mediawiki.org\\/wiki\\/Specs\\/HTML\\/2.0.0\\""],"accept-language":["en"],"user-agent":["VisualEditor-MediaWiki\\/1.35.0-alpha"],"api-user-agent":["VisualEditor-MediaWiki\\/1.35.0-alpha"],"Content-Length":["100759"],"Content-Type":["multipart\\/form-data; boundary=------------------------d7dd0c4d32758482"],"Expect":["100-continue"]}

So, two things:

  1. With RESTBase in the middle, all this is handled by RESTBase for us. But, the above investigation also means that RESTBase doesn't implement header validation or there are bugs in that implementation.
  2. Parsoid/JS never implemented REST header validation whereas Parsoid/PHP implicitly implements these validations.

It makes sense to fix all these issues. But, given that Parsoid/JS never implemented header validation, for the immediate term, we can disable that in Parsoid/PHP as well. The REST API framework provides us this ability ( https://github.com/wikimedia/mediawiki/blob/master/includes/Rest/Handler.php#L111-L133 ) by stating that this may be overridden and enabling it via a public method.

@tstarling, can you review this to verify that my analysis is correct? I'll meanwhile go ahead and override the precondition check method to disable it in Parsoid/PHP.

Yes, looks correct to me. Ignoring an If-Match header is not compliant with RFC 7232: "An origin server that receives an If-Match header field MUST evaluate the condition prior to performing the method." If If-Match support is really needed for correctness, a strong ETag should be used, since the only difference between a strong and a weak ETag is that a strong ETag may pass If-Match.

Change 552364 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Unbreak the VE-without-RESTBase scenario: Disable header checks

https://gerrit.wikimedia.org/r/552364

Change 552364 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Unbreak the VE-without-RESTBase scenario: Disable header checks

https://gerrit.wikimedia.org/r/552364

ssastry claimed this task.

If the next deploy doesn't fix it, we can reopen this.

The issue originates in I5edfa8aa54dedaec418579bbef37aa040170d09b, which sais: ETag harmonisation Add custom weak ETag to page requests in the format used by RESTBase.

EDIT: this appears to not be quite true: the above patch only spreads the issue by emulating the ETags observed when loading HTML e.g. https://en.wikipedia.org/api/rest_v1/page/html/Earth: we are seeing weak etags here, even though RESTbase emits strong ETags (the ETags generated by ParsoidHandler are ignored by RESTbase). The weak marker is supposedly added by Varnish when applying on-the-fly compression.

Change 807539 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: Allow weak ETags to match strong ETags.

https://gerrit.wikimedia.org/r/807539

Change 807539 merged by jenkins-bot:

[mediawiki/core@master] REST: Allow weak ETags to match strong ETags.

https://gerrit.wikimedia.org/r/807539

This comment was removed by daniel.

@daniel Did you mean to post this on T333402?

Yes, thank you!