ParsoidHandler currently overrides checkPreconditions() to disable HTTP conditionals such as If-Match and If-Non-Match. This method should be deleted to restore the default behavior.
However, we cannot currently start using If-Match headers, since Visual Editor presently sends "weak" eTags, which per the HTTP spec will never match If-Match (they are for use with If-None-Match only). See @ssastry on T238849#5683035.
The reason is that VE is receiving a weak eTag when it loads the page's HTTP. The weak etags originate in two places:
Firstly, Varnish (apparently?!) "weakens" the strong ETag it receives from RESTbase (presumably when applying on-the-fly compression). Secondly, generated by the ParsoidHandler base class, which was implemented to emulate the ETags observed when loading HTML from RESTbase (via Varnish). Both are incorrect: if we want to be able to use the eTag to protect a modifying operation such as an edit, it must be a strong etag. Even for reading, we want the client to get fresh data when the rendering of the page changes in any way.
Approach
Since the issue originates in two places, it needs to be addressed in both:
Firstly, the ParsoidHandler base class should emit strong ETags. This is easily done, but it will not help in WMF production, when responses are looped through RESTbase and Varnish: RESTbase ignores the ETag it receives from the parsoid extension endpoint and generates a new one. Varnish then weakens that tag on the way out.
Secondly, Varnish shouldn'e weaken ETags, for any reason. Ideally, we would use Tranfer-Encoding for on-the-fly compression, which would preserve ETags. This may however not be possible to do, because Transfer-Encoding is not universally supported in clients.
A possible solution would be to change the Visual Editor client to not send the If-Match header, at least not if the ETag is weak. Since we are currently ignoring the If-Match header entirely, this would not make things any worse. And the client side code would be more clear about what semantics it can expect from the server. This doesn't work, because in WMF production, sending If-Match headers with a weak hash does work, since RESTbase allows weak tags to match (contrary to the HTTP spec). We rely on this behavior in production.
One way to work around this would be to have VE supply the etag as part of the payload, rather than using the If-Match header. RESTbase supports embedded etags in a HTML-meta tag using property="mw:TimeUuid".
Further, we could have a "quirk" setting in the REST framework that allows weak etags to match in an If-Match header for specific endpoints.
Context
- If-Match and If-None-Match serve very different purposes: If-None-Match is the modern version of If-Modified-Since, and can be used in GET requests to only load a resource if a (significant) update is available. If-Match however is used to protect PUT/POST/DELETE requests with check-and-set semantics: it means the server will nto perform the requested action on the resource if it is no longer exactly the resource that the client was aware of.
- A client should never generate or modify an etag. The etag always comes from the etag header of an HTTP response from the server, and is sent back to that same server later.
- An eTag cannot be converted between weak and strong. Whether it is weak or strong is inherent in the way the tag is computed: it either guarantees byte-by-byte equality, or it doesn't. If it doesn't, it is interpreted to represent semantic equivalence in the intended usae context of the resources.
- See (somewhat outdated) T303370 for the parsoid part of things.
- HTTP considers different Content-Encodings of the same resources to be different entities which must have different ETags. See https://en.wikipedia.org/wiki/HTTP_ETag#ETag_generation and https://www.rfc-editor.org/rfc/rfc9110.html#name-example-entity-tags-varying.
- HTTP servers and proxies often use Content-Encoding instead of the more appropriate Transfer-Encoding because Content-Encoding is better supported by clients. See discussion at https://stackapps.com/a/3655 and <https://en.wikipedia.org/wiki/HTTP_compression and https://stackoverflow.com/questions/11641923/transfer-encoding-gzip-vs-content-encoding-gzip>.
- In theory, proxies that apply a Content-Encoding (not Transfer-Encoding) on the fly must assign a new ETag, and must return status 214 ("Warning: transformation applied"). The latter appears to be uncommon though.
Data
Response headers received from https://en.wikipedia.org/api/rest_v1/page/html/User:Arlolra%2Fsandbox (via RESTbase and Varnish, with irrelevant stuff removed):
HTTP/2 200 OK content-type: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.4.0" cache-control: s-maxage=1209600, max-age=0, must-revalidate access-control-expose-headers: etag content-location: https://en.wikipedia.org/api/rest_v1/page/html/User%3AArlolra%2Fsandbox server: restbase1027 date: Fri, 17 Jun 2022 09:32:36 GMT etag: W/"1080007918/fed67090-af9f-11ec-a9b7-25dd8b9c1c83" content-encoding: gzip vary: Accept, Accept-Encoding age: 15 x-cache: cp3058 miss, cp3058 hit/2 x-cache-status: hit-front server-timing: cache;desc="hit-front", host;desc="cp3058" accept-ranges: bytes content-length: 721
Note the weak ETag and content-encoding: gzip.