Do not ignore HTTP conditionals in transform REST endpoint.
Closed, ResolvedPublic
Actions

Description

ParsoidHandler currently overrides checkPreconditions() to disable HTTP conditionals such as If-Match and If-Non-Match. This method should be deleted to restore the default behavior.

However, we cannot currently start using If-Match headers, since Visual Editor presently sends "weak" eTags, which per the HTTP spec will never match If-Match (they are for use with If-None-Match only). See @ssastry on T238849#5683035.

The reason is that VE is receiving a weak eTag when it loads the page's HTTP. The weak etags originate in two places:
Firstly, Varnish (apparently?!) "weakens" the strong ETag it receives from RESTbase (presumably when applying on-the-fly compression). Secondly, generated by the ParsoidHandler base class, which was implemented to emulate the ETags observed when loading HTML from RESTbase (via Varnish). Both are incorrect: if we want to be able to use the eTag to protect a modifying operation such as an edit, it must be a strong etag. Even for reading, we want the client to get fresh data when the rendering of the page changes in any way.

Approach

Since the issue originates in two places, it needs to be addressed in both:

Firstly, the ParsoidHandler base class should emit strong ETags. This is easily done, but it will not help in WMF production, when responses are looped through RESTbase and Varnish: RESTbase ignores the ETag it receives from the parsoid extension endpoint and generates a new one. Varnish then weakens that tag on the way out.

Secondly, Varnish shouldn'e weaken ETags, for any reason. Ideally, we would use Tranfer-Encoding for on-the-fly compression, which would preserve ETags. This may however not be possible to do, because Transfer-Encoding is not universally supported in clients.

A possible solution would be to change the Visual Editor client to not send the If-Match header, at least not if the ETag is weak. Since we are currently ignoring the If-Match header entirely, this would not make things any worse. And the client side code would be more clear about what semantics it can expect from the server. This doesn't work, because in WMF production, sending If-Match headers with a weak hash does work, since RESTbase allows weak tags to match (contrary to the HTTP spec). We rely on this behavior in production.

One way to work around this would be to have VE supply the etag as part of the payload, rather than using the If-Match header. RESTbase supports embedded etags in a HTML-meta tag using property="mw:TimeUuid".

Further, we could have a "quirk" setting in the REST framework that allows weak etags to match in an If-Match header for specific endpoints.

Context

If-Match and If-None-Match serve very different purposes: If-None-Match is the modern version of If-Modified-Since, and can be used in GET requests to only load a resource if a (significant) update is available. If-Match however is used to protect PUT/POST/DELETE requests with check-and-set semantics: it means the server will nto perform the requested action on the resource if it is no longer exactly the resource that the client was aware of.
A client should never generate or modify an etag. The etag always comes from the etag header of an HTTP response from the server, and is sent back to that same server later.
An eTag cannot be converted between weak and strong. Whether it is weak or strong is inherent in the way the tag is computed: it either guarantees byte-by-byte equality, or it doesn't. If it doesn't, it is interpreted to represent semantic equivalence in the intended usae context of the resources.
See (somewhat outdated) T303370 for the parsoid part of things.
HTTP considers different Content-Encodings of the same resources to be different entities which must have different ETags. See https://en.wikipedia.org/wiki/HTTP_ETag#ETag_generation and https://www.rfc-editor.org/rfc/rfc9110.html#name-example-entity-tags-varying.
HTTP servers and proxies often use Content-Encoding instead of the more appropriate Transfer-Encoding because Content-Encoding is better supported by clients. See discussion at https://stackapps.com/a/3655 and <https://en.wikipedia.org/wiki/HTTP_compression and https://stackoverflow.com/questions/11641923/transfer-encoding-gzip-vs-content-encoding-gzip>.
In theory, proxies that apply a Content-Encoding (not Transfer-Encoding) on the fly must assign a new ETag, and must return status 214 ("Warning: transformation applied"). The latter appears to be uncommon though.

Data

Response headers received from https://en.wikipedia.org/api/rest_v1/page/html/User:Arlolra%2Fsandbox (via RESTbase and Varnish, with irrelevant stuff removed):

HTTP/2 200 OK
content-type: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.4.0"
cache-control: s-maxage=1209600, max-age=0, must-revalidate
access-control-expose-headers: etag
content-location: https://en.wikipedia.org/api/rest_v1/page/html/User%3AArlolra%2Fsandbox
server: restbase1027
date: Fri, 17 Jun 2022 09:32:36 GMT
etag: W/"1080007918/fed67090-af9f-11ec-a9b7-25dd8b9c1c83"
content-encoding: gzip
vary: Accept, Accept-Encoding
age: 15
x-cache: cp3058 miss, cp3058 hit/2
x-cache-status: hit-front
server-timing: cache;desc="hit-front", host;desc="cp3058"
accept-ranges: bytes
content-length: 721

Note the weak ETag and content-encoding: gzip.

Details

Subject	Repo	Branch	Lines +/-
Bump parsoid to 0.16.0-a20	mediawiki/vendor	master	+413 -145
TransformHandler: do not ignore ETags.	mediawiki/services/parsoid	master	+11 -12
ParsoidHandler: do not emit etag for wt2html	mediawiki/core	master	+3 -5
Add tests for if-match header handling.	mediawiki/services/parsoid	master	+33 -0
Require the page endpoint to return strong Etags	mediawiki/services/parsoid	master	+2 -2
REST: Allow weak ETags to match strong ETags.	mediawiki/core	master	+59 -27
Parsoid API: emit strong eTags	mediawiki/core	master	+7 -3

Customize query in gerrit

Related Objects
Search...

Status	Assigned	Task
Stalled	None	T324931 Clean up open RESTBase related tickets
In Progress	None	T262315 <CORE TECHNOLOGY> API Migration & RESTBase Sunset
Open	None	T264669 Move VE API from RESTBase into core.
Resolved	daniel	T310710 Do not ignore HTTP conditionals in transform REST endpoint.
Resolved	None	T310904 Find out if Varnish is messing with ETags, and what to do about it.

Event Timeline

daniel created this task.Jun 15 2022, 2:16 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 15 2022, 2:16 PM

daniel updated the task description. (Show Details)Jun 15 2022, 2:17 PM

daniel removed DAlangi_WMF as the assignee of this task.Jun 15 2022, 2:33 PM

daniel added a subscriber: DAlangi_WMF.

daniel updated the task description. (Show Details)Jun 15 2022, 2:38 PM

daniel updated the task description. (Show Details)Jun 15 2022, 2:52 PM

daniel added a comment.Jun 15 2022, 3:31 PM

This comment was removed by daniel.

daniel renamed this task from Visual Editor should send strong eTags in If-Match headers to Do not ignore HTTP conditionals in transform REST endpoint..Jun 15 2022, 5:47 PM

daniel updated the task description. (Show Details)

Change 805886 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Parsoid API: emit strong eTags

https://gerrit.wikimedia.org/r/805886

gerritbot added a project: Patch-For-Review.Jun 15 2022, 6:49 PM

daniel moved this task from Unsorted pile to Doing on the Platform Team Workboards (MW Expedition) board.Jun 15 2022, 6:56 PM

The issue originates in I5edfa8aa54dedaec418579bbef37aa040170d09b, which sais: ETag harmonisation Add custom weak ETag to page requests in the format used by RESTBase.

That patch was written to address T235462 and T235460, but neither of them asked for a weak eTag. Making it weak seems to have been a mistake, which caused T238849, which in turn led to checkConditionals() to be overwritten as a work-around.

ppelberg moved this task from To Triage to Triaged on the VisualEditor board.Jun 16 2022, 3:28 PM

On Slack, @Arlolra pointed out that it may be Varnish adding the W/ prefix to "weaken" the etag. Per the varnish docs, this might indeed be the case: https://varnish-cache.org/docs/6.0/users-guide/compression.html

For discussion of this fact elsewhere, see https://github.com/wikimedia/restbase/pull/1163#issue-465287958 and https://github.com/wikimedia/restbase/pull/968#issuecomment-373152546

So let me see if I'm getting this right: VE gets HTML via Varnish and RESTbase. It's getting an ETag generated by RESTbase, and "weakened" by Varnish.

It uses this ETag in an If-Match header sent to the transform endpoint, which would always fail because If-Match never works for weak ETags. To work around this, we disable processing of HTTP conditionals for the transform endpoint (which is against the HTTP spec).

This leads me to the following questions:

Do I understand correctly that the ETag generated by ParsoidHandler never makes it to VE when it's looped through RESTbase? That is, fixing the ETag generated by ParsoidHandler will help for third parties, but not for WMF?
Does Varnish "weaken" the ETag in order to apply Content-Encoding, or for some other reason? Can this be changed?
If we are ignoring the If-Match header anyway, can we just stop VE from sending it? The we wouldn't need the server side workaround. VE could be smart and still send If-Match if it has a strong ETag, and know not to send it if the ETag is weak.

daniel updated the task description. (Show Details)Jun 17 2022, 9:32 AM

daniel updated the task description. (Show Details)Jun 17 2022, 9:41 AM

daniel updated the task description. (Show Details)Jun 17 2022, 9:45 AM

Change 806413 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] Require the page endpoint to return strong Etags

https://gerrit.wikimedia.org/r/806413

daniel mentioned this in T310464: Make use of stashed data-parsoid mapping for html-to-wikitext transformation.Jun 17 2022, 1:37 PM

daniel updated the task description. (Show Details)Jun 17 2022, 1:43 PM

daniel mentioned this in T128525: Strip old metadata from old Parsoid content <head>: mw:TimeUuid, user, comment.Jun 17 2022, 1:48 PM

daniel mentioned this in T310904: Find out if Varnish is messing with ETags, and what to do about it..Jun 17 2022, 4:20 PM

Change 805886 merged by jenkins-bot:

[mediawiki/core@master] Parsoid API: emit strong eTags

https://gerrit.wikimedia.org/r/805886

ReleaseTaggerBot added a project: MW-1.39-notes (1.39.0-wmf.17; 2022-06-20).Jun 17 2022, 7:00 PM

daniel claimed this task.Jun 22 2022, 8:41 AM

Change 807539 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: Allow weak ETags to match strong ETags.

https://gerrit.wikimedia.org/r/807539

Change 807601 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] Add tests for if-match header handling.

https://gerrit.wikimedia.org/r/807601

Change 807539 merged by jenkins-bot:

[mediawiki/core@master] REST: Allow weak ETags to match strong ETags.

https://gerrit.wikimedia.org/r/807539

Further, we could have a "quirk" setting in the REST framework that allows weak etags to match in an If-Match header for specific endpoints.

Going with this solution for now. It's against the spec, but should be safe anyway: the backend knows that the etag is "strong", because it generated it in the first place. So it can ignore the fact that a "weak" marker got added along the way.

daniel updated the task description. (Show Details)Jun 28 2022, 10:48 AM

daniel updated the task description. (Show Details)Jun 28 2022, 10:59 AM

Change 806413 abandoned by Daniel Kinzler:

[mediawiki/services/parsoid@master] Require the page endpoint to return strong Etags

Reason:

Per Arlo

https://gerrit.wikimedia.org/r/806413

daniel closed subtask T310904: Find out if Varnish is messing with ETags, and what to do about it. as Resolved.Jul 15 2022, 12:43 PM

Change 815250 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] TransformHandler: do not ignore ETags.

https://gerrit.wikimedia.org/r/815250

Change 807601 abandoned by Daniel Kinzler: