
Stashing: revid mismatch between URI and Etag
Closed, Resolved, Public

Description

In certain cases (yet to be determined), stashed transform requests have a mismatch between the revision ID in the ETag and the one in the URI parameter. In all of these cases, the request is sent to /transform/html/to/wikitext/{title}/{revision}, but the ETag is 0/{timeuuid}/stash. This suggests that during a previous call RESTBase set the revision ID to 0 instead of the actual value, even though these requests are not for new pages.
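To make the mismatch concrete, here is a minimal Python sketch that splits a stash ETag into its revision, time UUID and `stash` suffix, then compares that revision with the one taken from the URI. The names `StashETag`, `parse_stash_etag` and `revision_mismatch` are hypothetical, and the `<revid>/<timeuuid>/stash` layout is inferred from the ETags quoted in this task, not taken from RESTBase's source.

```python
from typing import NamedTuple, Optional


class StashETag(NamedTuple):
    """Fields of a stash ETag of the (assumed) form <revid>/<timeuuid>/stash."""
    revision: str
    tid: str
    suffix: str


def parse_stash_etag(header: str) -> Optional[StashETag]:
    """Parse an ETag or If-Match value such as W/"0/<timeuuid>/stash".

    Returns None when the value does not look like a stash ETag.
    """
    value = header.strip()
    if value.startswith("W/"):  # drop the weak-validator prefix
        value = value[2:]
    value = value.strip('"')  # drop the surrounding quotes
    parts = value.split("/")
    if len(parts) != 3 or parts[2] != "stash":
        return None
    return StashETag(*parts)


def revision_mismatch(uri_revision: str, if_match: str) -> bool:
    """True when the ETag's revision differs from the {revision} URI parameter."""
    etag = parse_stash_etag(if_match)
    return etag is not None and etag.revision != uri_revision


hdr = 'W/"0/d14208f0-eec3-11e9-9a9b-fd292f60a221/stash"'
print(revision_mismatch("120200307", hdr))  # True
```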

An example request is for /es.wikipedia.org/v1/transform/html/to/wikitext/Mar_de_amor/120200307 sent with the If-Match: W/"0/d14208f0-eec3-11e9-9a9b-fd292f60a221/stash" header. RESTBase responds with a 404 because it can't find the stashed content with revid 120200307:

cassandra@cqlsh:wikipedia_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU> select key,headers,tid from data where "_domain"='es.wikipedia.org' and key='Mar_de_amor:120200307:d14208f0-eec3-11e9-9a9b-fd292f60a221';

 key | headers | tid
-----+---------+-----

(0 rows)

However, the content is there, just stored under revid 0:

cassandra@cqlsh:wikipedia_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU> select key,headers,tid from data where "_domain"='es.wikipedia.org' and key='Mar_de_amor:0:d14208f0-eec3-11e9-9a9b-fd292f60a221';

 key                                                | headers                                                                                       | tid
----------------------------------------------------+-----------------------------------------------------------------------------------------------+--------------------------------------
 Mar_de_amor:0:d14208f0-eec3-11e9-9a9b-fd292f60a221 | {"etag":"\"0/d14208f0-eec3-11e9-9a9b-fd292f60a221/stash\"","content-type":"application/json"} | d283fed0-eec3-11e9-9a9b-fd292f60a221

(1 rows)
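The two queries above differ only in the revision segment of the key, which appears to follow a `<title>:<revid>:<tid>` layout. The sketch below models that lookup with an in-memory dict standing in for the `data` table; `stash_key`, `lookup` and the dict itself are hypothetical illustrations of why a lookup built from the URI revision misses the stored row and ends in a 404.

```python
from typing import Dict, Optional

TID = "d14208f0-eec3-11e9-9a9b-fd292f60a221"


def stash_key(title: str, revision: str, tid: str) -> str:
    """Build a key in the <title>:<revid>:<tid> layout seen in the cqlsh output."""
    return f"{title}:{revision}:{tid}"


# Hypothetical in-memory stand-in for the es.wikipedia.org row in the `data` table.
stash: Dict[str, Dict[str, str]] = {
    stash_key("Mar_de_amor", "0", TID): {
        "etag": f'"0/{TID}/stash"',
        "content-type": "application/json",
    },
}


def lookup(title: str, revision: str, tid: str) -> Optional[Dict[str, str]]:
    """Return the stored headers for the key, or None (the 404 case)."""
    return stash.get(stash_key(title, revision, tid))


print(lookup("Mar_de_amor", "120200307", TID))      # None: keyed by the URI revision
print(lookup("Mar_de_amor", "0", TID) is not None)  # True: keyed by the ETag revision
```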

Event Timeline

mobrovac created this task.

Yup, confirmed: the problem is RESTBase not honouring its own ETag. I did a thorough investigation, and the problem arises when clients call /transform/wikitext/to/html/{title} followed by /transform/html/to/wikitext/{title}/{revision}. Instead of trusting the ETag, RESTBase uses the {revision} parameter, which may or may not be correct. Because the wt2html -> html2wt transform chain should be idempotent, RESTBase should use the ETag rather than the provided revision, which should serve only as a fall-back mechanism.
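The rule described above (trust the ETag, fall back to {revision}) can be sketched as a small resolver. This is a hedged Python illustration, not the deployed RESTBase change (RESTBase itself is JavaScript); `resolve_revision` and the `_STASH_ETAG` pattern are hypothetical, and the pattern assumes the `<revid>/<timeuuid>/stash` ETag layout seen in this task.

```python
import re
from typing import Optional

# Assumed layout: optional weak prefix W/, optional quotes, "<revid>/<timeuuid>/stash".
_STASH_ETAG = re.compile(r'^(?:W/)?"?(?P<rev>\d+)/[0-9a-f-]+/stash"?$')


def resolve_revision(uri_revision: str, if_match: Optional[str]) -> str:
    """Pick the revision used to find stashed content.

    The revision embedded in a stash ETag wins; the {revision} URI
    parameter is used only when no stash ETag can be parsed.
    """
    if if_match:
        m = _STASH_ETAG.match(if_match.strip())
        if m:
            return m.group("rev")
    return uri_revision


hdr = 'W/"0/d14208f0-eec3-11e9-9a9b-fd292f60a221/stash"'
print(resolve_revision("120200307", hdr))   # 0
print(resolve_revision("120200307", None))  # 120200307
```

Preferring the ETag keeps the html2wt lookup keyed exactly as the preceding wt2html call stored it, so a client-supplied {revision} that disagrees can no longer cause a miss.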

Mentioned in SAL (#wikimedia-operations) [2019-10-16T03:35:37Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465

Mentioned in SAL (#wikimedia-operations) [2019-10-16T03:49:14Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465 (duration: 13m 37s)