Page MenuHomePhabricator

Do not ignore HTTP conditionals in transform REST endpoint.
Closed, ResolvedPublic

Description

ParsoidHandler currently overrides checkPreconditions() to disable HTTP conditionals such as If-Match and If-Non-Match. This method should be deleted to restore the default behavior.

However, we cannot currently start using If-Match headers, since Visual Editor presently sends "weak" eTags, which per the HTTP spec will never match If-Match (they are for use with If-None-Match only). See @ssastry on T238849#5683035.

The reason is that VE is receiving a weak eTag when it loads the page's HTTP. The weak etags originate in two places:
Firstly, Varnish (apparently?!) "weakens" the strong ETag it receives from RESTbase (presumably when applying on-the-fly compression). Secondly, generated by the ParsoidHandler base class, which was implemented to emulate the ETags observed when loading HTML from RESTbase (via Varnish). Both are incorrect: if we want to be able to use the eTag to protect a modifying operation such as an edit, it must be a strong etag. Even for reading, we want the client to get fresh data when the rendering of the page changes in any way.

Approach

Since the issue originates in two places, it needs to be addressed in both:

Firstly, the ParsoidHandler base class should emit strong ETags. This is easily done, but it will not help in WMF production, when responses are looped through RESTbase and Varnish: RESTbase ignores the ETag it receives from the parsoid extension endpoint and generates a new one. Varnish then weakens that tag on the way out.

Secondly, Varnish shouldn'e weaken ETags, for any reason. Ideally, we would use Tranfer-Encoding for on-the-fly compression, which would preserve ETags. This may however not be possible to do, because Transfer-Encoding is not universally supported in clients.

A possible solution would be to change the Visual Editor client to not send the If-Match header, at least not if the ETag is weak. Since we are currently ignoring the If-Match header entirely, this would not make things any worse. And the client side code would be more clear about what semantics it can expect from the server. This doesn't work, because in WMF production, sending If-Match headers with a weak hash does work, since RESTbase allows weak tags to match (contrary to the HTTP spec). We rely on this behavior in production.

One way to work around this would be to have VE supply the etag as part of the payload, rather than using the If-Match header. RESTbase supports embedded etags in a HTML-meta tag using property="mw:TimeUuid".

Further, we could have a "quirk" setting in the REST framework that allows weak etags to match in an If-Match header for specific endpoints.

Context

  1. If-Match and If-None-Match serve very different purposes: If-None-Match is the modern version of If-Modified-Since, and can be used in GET requests to only load a resource if a (significant) update is available. If-Match however is used to protect PUT/POST/DELETE requests with check-and-set semantics: it means the server will nto perform the requested action on the resource if it is no longer exactly the resource that the client was aware of.
  2. A client should never generate or modify an etag. The etag always comes from the etag header of an HTTP response from the server, and is sent back to that same server later.
  3. An eTag cannot be converted between weak and strong. Whether it is weak or strong is inherent in the way the tag is computed: it either guarantees byte-by-byte equality, or it doesn't. If it doesn't, it is interpreted to represent semantic equivalence in the intended usae context of the resources.
  4. See (somewhat outdated) T303370 for the parsoid part of things.
  5. HTTP considers different Content-Encodings of the same resources to be different entities which must have different ETags. See https://en.wikipedia.org/wiki/HTTP_ETag#ETag_generation and https://www.rfc-editor.org/rfc/rfc9110.html#name-example-entity-tags-varying.
  6. HTTP servers and proxies often use Content-Encoding instead of the more appropriate Transfer-Encoding because Content-Encoding is better supported by clients. See discussion at https://stackapps.com/a/3655 and <https://en.wikipedia.org/wiki/HTTP_compression and https://stackoverflow.com/questions/11641923/transfer-encoding-gzip-vs-content-encoding-gzip>.
  7. In theory, proxies that apply a Content-Encoding (not Transfer-Encoding) on the fly must assign a new ETag, and must return status 214 ("Warning: transformation applied"). The latter appears to be uncommon though.

Data

Response headers received from https://en.wikipedia.org/api/rest_v1/page/html/User:Arlolra%2Fsandbox (via RESTbase and Varnish, with irrelevant stuff removed):

HTTP/2 200 OK
content-type: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.4.0"
cache-control: s-maxage=1209600, max-age=0, must-revalidate
access-control-expose-headers: etag
content-location: https://en.wikipedia.org/api/rest_v1/page/html/User%3AArlolra%2Fsandbox
server: restbase1027
date: Fri, 17 Jun 2022 09:32:36 GMT
etag: W/"1080007918/fed67090-af9f-11ec-a9b7-25dd8b9c1c83"
content-encoding: gzip
vary: Accept, Accept-Encoding
age: 15
x-cache: cp3058 miss, cp3058 hit/2
x-cache-status: hit-front
server-timing: cache;desc="hit-front", host;desc="cp3058"
accept-ranges: bytes
content-length: 721

Note the weak ETag and content-encoding: gzip.

Event Timeline

daniel added a subscriber: DAlangi_WMF.
This comment was removed by daniel.
daniel renamed this task from Visual Editor should send strong eTags in If-Match headers to Do not ignore HTTP conditionals in transform REST endpoint..Jun 15 2022, 5:47 PM
daniel updated the task description. (Show Details)

Change 805886 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Parsoid API: emit strong eTags

https://gerrit.wikimedia.org/r/805886

The issue originates in I5edfa8aa54dedaec418579bbef37aa040170d09b, which sais: ETag harmonisation Add custom weak ETag to page requests in the format used by RESTBase.

That patch was written to address T235462 and T235460, but neither of them asked for a weak eTag. Making it weak seems to have been a mistake, which caused T238849, which in turn led to checkConditionals() to be overwritten as a work-around.

On Slack, @Arlolra pointed out that it may be Varnish adding the W/ prefix to "weaken" the etag. Per the varnish docs, this might indeed be the case: https://varnish-cache.org/docs/6.0/users-guide/compression.html

For discussion of this fact elsewhere, see https://github.com/wikimedia/restbase/pull/1163#issue-465287958 and https://github.com/wikimedia/restbase/pull/968#issuecomment-373152546

So let me see if I'm getting this right: VE gets HTML via Varnish and RESTbase. It's getting an ETag generated by RESTbase, and "weakened" by Varnish.

It uses this ETag in an If-Match header sent to the transform endpoint, which would always fail because If-Match never works for weak ETags. To work around this, we disable processing of HTTP conditionals for the transform endpoint (which is against the HTTP spec).

This leads me to the following questions:

  • Do I understand correctly that the ETag generated by ParsoidHandler never makes it to VE when it's looped through RESTbase? That is, fixing the ETag generated by ParsoidHandler will help for third parties, but not for WMF?
  • Does Varnish "weaken" the ETag in order to apply Content-Encoding, or for some other reason? Can this be changed?
  • If we are ignoring the If-Match header anyway, can we just stop VE from sending it? The we wouldn't need the server side workaround. VE could be smart and still send If-Match if it has a strong ETag, and know not to send it if the ETag is weak.

Change 806413 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] Require the page endpoint to return strong Etags

https://gerrit.wikimedia.org/r/806413

Change 805886 merged by jenkins-bot:

[mediawiki/core@master] Parsoid API: emit strong eTags

https://gerrit.wikimedia.org/r/805886

Change 807539 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: Allow weak ETags to match strong ETags.

https://gerrit.wikimedia.org/r/807539

Change 807601 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] Add tests for if-match header handling.

https://gerrit.wikimedia.org/r/807601

Change 807539 merged by jenkins-bot:

[mediawiki/core@master] REST: Allow weak ETags to match strong ETags.

https://gerrit.wikimedia.org/r/807539

Further, we could have a "quirk" setting in the REST framework that allows weak etags to match in an If-Match header for specific endpoints.

Going with this solution for now. It's against the spec, but should be safe anyway: the backend knows that the etag is "strong", because it generated it in the first place. So it can ignore the fact that a "weak" marker got added along the way.

Change 806413 abandoned by Daniel Kinzler:

[mediawiki/services/parsoid@master] Require the page endpoint to return strong Etags

Reason:

Per Arlo

https://gerrit.wikimedia.org/r/806413

Change 815250 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] TransformHandler: do not ignore ETags.

https://gerrit.wikimedia.org/r/815250

Change 807601 abandoned by Daniel Kinzler:

[mediawiki/services/parsoid@master] Add tests for if-match header handling.

Reason:

see I59d7df3908dfea609dcc5725399136f75de9eb74

https://gerrit.wikimedia.org/r/807601

Change 820409 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ParsoidHandler: do not emit etag for wt2html

https://gerrit.wikimedia.org/r/820409

Change 820409 merged by jenkins-bot:

[mediawiki/core@master] ParsoidHandler: do not emit etag for wt2html

https://gerrit.wikimedia.org/r/820409

Change 815250 merged by jenkins-bot:

[mediawiki/services/parsoid@master] TransformHandler: do not ignore ETags.

https://gerrit.wikimedia.org/r/815250

Change 827533 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a20

https://gerrit.wikimedia.org/r/827533

Change 827533 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a20

https://gerrit.wikimedia.org/r/827533