Something odd is going on. Under circumstances we have not yet identified, RESTBase sometimes does not store the HTML and data-parsoid in the stash bucket even though stash=true is passed to it. This was first noticed in T233127, where VE receives a 404 from RESTBase when POSTing to transform/html/to/wikitext. It happens around 150 times per day.
We added logging in both VE and RB to clarify the situation, but that has only raised more questions. After debugging this for a while, I'm now convinced the problem is somewhere in RESTBase and/or Cassandra.
Here is what has been confirmed so far, which rules these out as causes:
- VE always requests content with stash=true
- VE sends the appropriate If-Match header when POSTing
- If RB responds with a 404 for html2wt, the content is not present in Cassandra under title:rev:tid in the stash bucket
- When RB gets the HTML, for some reason it returns the ETag of the full response instead of the one under which it stores it in the stash bucket, but, if the logs are to be trusted, these ETags never differ.
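To make the title:rev:tid lookup above concrete, here is a minimal sketch of how a stash key could be derived from an ETag. It assumes the ETag has the shape `"rev/tid"` with an optional `/suffix` and an optional weak `W/` prefix; that shape, and the names `parse_etag` and `stash_key`, are assumptions for illustration and are not taken from the RESTBase code.

```python
import re
from typing import NamedTuple


class StashRef(NamedTuple):
    rev: int
    tid: str


# Assumed ETag shape: "rev/tid" or "rev/tid/suffix", optionally weak (W/).
ETAG_RE = re.compile(r'^(?:W/)?"(\d+)/([0-9a-f-]{36})(?:/[^"]*)?"$')


def parse_etag(etag: str) -> StashRef:
    """Split an ETag of the assumed form into revision and TID."""
    match = ETAG_RE.match(etag.strip())
    if match is None:
        raise ValueError(f"unexpected ETag format: {etag!r}")
    return StashRef(rev=int(match.group(1)), tid=match.group(2))


def stash_key(title: str, ref: StashRef) -> str:
    """Build the title:rev:tid key used to look up stashed content."""
    return f"{title}:{ref.rev}:{ref.tid}"


if __name__ == "__main__":
    ref = parse_etag('"920237028/36a81d00-e9cf-11e9-ade6-3b8b9ff9d123"')
    print(stash_key("User:Mobrovac-WMF/sandbox", ref))
```

Running the same parsing over both the ETag returned to VE and the key actually written to the stash bucket would be one way to double-check that the two never differ.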
Here's an illustrative example for TID [96ca91a0-e9d3-11e9-b260-611ef1df68a5](https://logstash.wikimedia.org/goto/20c95dd0d3b062cdbb5fca06b2b33dfa):
- VE requests Draft:Sandbox, to which RB replies with a 200 with the appropriate headers
- 8 seconds later, VE POSTs to transform the HTML, but RB responds with a 404 because it can't find the stashed content
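The two-step exchange in this example can be sketched as a pair of requests. This is only an approximation for reproduction attempts: the base URL, the JSON body with an `html` field, and the helper names `html_request` and `html2wt_request` are assumptions, and VE's real request may be encoded differently. The code only builds the requests; nothing is sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed public REST base URL for the wiki under test.
BASE = "https://en.wikipedia.org/api/rest_v1"


def html_request(title: str) -> Request:
    """Step 1: ask RB for the HTML and request that it be stashed."""
    return Request(f"{BASE}/page/html/{quote(title, safe='')}?stash=true")


def html2wt_request(title: str, rev: int, etag: str, html: str) -> Request:
    """Step 2: POST the edited HTML back, echoing the ETag from step 1."""
    url = f"{BASE}/transform/html/to/wikitext/{quote(title, safe='')}/{rev}"
    body = json.dumps({"html": html}).encode("utf-8")
    headers = {"Content-Type": "application/json", "If-Match": etag}
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # The revision and ETag here are placeholders, not values from the logs.
    req = html2wt_request("Draft:Sandbox", 1, '"1/00000000-0000-1000-8000-000000000000"', "<p>x</p>")
    print(req.get_method(), req.full_url)
```

Sending these with `urllib.request.urlopen` and logging the ETag from the first response next to the status of the second would mirror what the VE and RB logs show for the failing case.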
While trying to reproduce the error myself (without success), I noticed something very odd. I edited my user sandbox twice in a row with VE, saved both revisions successfully, and then started a third edit. However, when I looked in Cassandra, the first stash was no longer there:
    cassandra@cqlsh:enwiki_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU> select key, tid, ttl(value) from data where "_domain" = 'en.wikipedia.org' and key = 'User:Mobrovac-WMF/sandbox:920061681:d43dc630-e904-11e9-8729-8b9cbd78a8be';

     key | tid | ttl(value)
    -----+-----+------------

    (0 rows)

    cassandra@cqlsh:enwiki_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU> select key, tid, ttl(value) from data where "_domain" = 'en.wikipedia.org' and key = 'User:Mobrovac-WMF/sandbox:920237028:36a81d00-e9cf-11e9-ade6-3b8b9ff9d123';

     key                                                                      | tid                                  | ttl(value)
    --------------------------------------------------------------------------+--------------------------------------+------------
     User:Mobrovac-WMF/sandbox:920237028:36a81d00-e9cf-11e9-ade6-3b8b9ff9d123 | 380eb1e0-e9cf-11e9-a24d-d79af64c7434 |      84190

    (1 rows)
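Since the TIDs are time-based (version 1) UUIDs, their embedded timestamps can be decoded to see when each stash entry was created. The sketch below uses only the Python standard library; the helper name `tid_time` is mine, and treating the TID timestamp as the stash write time is an assumption.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100 ns intervals from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def tid_time(tid: str) -> datetime:
    """Return the UTC creation time embedded in a version 1 UUID."""
    parsed = uuid.UUID(tid)
    if parsed.version != 1:
        raise ValueError(f"not a time-based UUID: {tid}")
    # parsed.time is in 100 ns units; convert to whole microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


if __name__ == "__main__":
    first = tid_time("d43dc630-e904-11e9-8729-8b9cbd78a8be")
    second = tid_time("36a81d00-e9cf-11e9-ade6-3b8b9ff9d123")
    print("first stash :", first.isoformat())
    print("second stash:", second.isoformat())
    print("gap         :", second - first)
```

Comparing the gap between two TIDs with the TTL that Cassandra reports for a surviving row is a quick way to check whether plain expiry could account for a missing entry.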