On July 01 at approximately 12:55 AM UTC, several Cassandra nodes died because of the usual tombstones issue. However, the root cause of this was very unusual.
First, a rerender of the https://ru.wikipedia.org/wiki/Портал%3AГерпетология page came in, probably because of a transclusion update. This page uses flagged revisions, so according to MediaWiki the latest revision of the page is 68650450, while RESTBase also has revision 85886331 in storage.
After this rerender, hundreds of events like these were emitted by RESTBase:
{ "meta": { "domain": "ru.wikipedia.org", "dt": "2017-07-01T09:45:26.813Z", "id": "02ee94c5-5e42-11e7-8769-b51cc5d64d49", "request_id": "01b614bc-5e42-11e7-8cc8-835a59932741", "schema_uri": "resource_change/1", "topic": "resource_change", "uri": "http://ru.wikipedia.org/api/rest_v1/page/html/%D0%9F%D0%BE%D1%80%D1%82%D0%B0%D0%BB%3A%D0%93%D0%B5%D1%80%D0%BF%D0%B5%D1%82%D0%BE%D0%BB%D0%BE%D0%B3%D0%B8%D1%8F/68650450" }, "tags": [ "restbase" ] }
{ "meta": { "domain": "ru.wikipedia.org", "dt": "2017-07-01T09:45:26.827Z", "id": "02f0a545-5e42-11e7-b23d-1eb168871649", "request_id": "01b936b1-5e42-11e7-9acf-d81c366b200a", "schema_uri": "resource_change/1", "topic": "resource_change", "uri": "http://ru.wikipedia.org/api/rest_v1/page/html/%D0%9F%D0%BE%D1%80%D1%82%D0%B0%D0%BB%3A%D0%93%D0%B5%D1%80%D0%BF%D0%B5%D1%82%D0%BE%D0%BB%D0%BE%D0%B3%D0%B8%D1%8F" }, "tags": [ "restbase" ] }
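To see how many of these events hit each URI, a dump of them can be tallied with a few lines of Python. This is only a rough sketch: it assumes the events were captured as JSON objects concatenated in one text blob (as pasted above), and the helper names `iter_json_objects` and `count_events_by_uri` are made up for illustration, not part of RESTBase or ChangeProp.

```python
import json
from collections import Counter


def iter_json_objects(text):
    """Yield each JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_events_by_uri(text):
    """Count resource_change events per meta.uri (hypothetical helper)."""
    return Counter(event["meta"]["uri"] for event in iter_json_objects(text))


if __name__ == "__main__":
    # Placeholder data in the same shape as the events above, not real logs.
    sample = (
        '{ "meta": { "uri": "http://example.org/page/html/Title/1" } } '
        '{ "meta": { "uri": "http://example.org/page/html/Title" } } '
        '{ "meta": { "uri": "http://example.org/page/html/Title/1" } }'
    )
    for uri, count in count_events_by_uri(sample).most_common():
        print(count, uri)
```

Grouping by `meta.request_id` instead of `meta.uri` would work the same way and might help show whether the events come from one request or from many.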
These events should only be emitted when a new render of the revision, different from the previously stored render, was saved. Although Cassandra storage has about a thousand renders of that revision, their TIDs do not align with the incident timing, and they all look like legitimate results of template rerenders (why they were not removed by the revision retention policy is a big separate question of its own).
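For reference, the intended condition can be sketched roughly as follows. This is not the actual RESTBase code: the function names are hypothetical, comparing SHA-256 digests is just one possible way to decide that two renders differ, and treating a missing stored render as a change is an assumption.

```python
import hashlib
from typing import Optional


def render_digest(html: str) -> str:
    """Digest of a render body, used only to compare two renders."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def should_emit_change_event(stored_html: Optional[str], new_html: str) -> bool:
    """Return True only if the new render differs from the stored one.

    Hypothetical sketch of the expected behaviour, not RESTBase's logic.
    """
    if stored_html is None:
        # Assumption: nothing stored yet counts as a change.
        return True
    return render_digest(stored_html) != render_digest(new_html)
```

If the emit path behaved like this, an identical rerender would produce no event at all, so hundreds of events for one revision would imply either hundreds of distinct renders or a comparison against the wrong stored render.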
All of these events were picked up by ChangeProp and triggered mobile content updates, which in turn came back to RESTBase, probably somehow causing even more html-change events to be emitted. Eventually, the Cassandra nodes responsible for this partition died.
After almost a day of investigating this, I still have not identified the root cause. The issue is pretty serious, though: if the condition that caused it happens again, it can bring down the whole RESTBase cluster.