The revised k-r-v storage algorithm only evaluates renders for deletion when new renders are stored, and only those that were replaced by another render at least one TTL ago are candidates for deletion. Likewise, revisions are only evaluated for deletion when a new revision is stored, and only if the corresponding renders were superseded at least one TTL in the past. This means that even in the best case there will always be at least four renders stored (two revisions with two renders each), and in a significant number of cases, many more.
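Roughly, the candidate-evaluation step performed on write can be sketched as follows. This is a hypothetical Python illustration of the rule described above, not the actual storage module; the function name, data layout, and TTL value are assumptions made for the example:

```python
from datetime import datetime, timedelta

TTL = timedelta(seconds=86400)  # assumed 24-hour TTL, matching the example below

def deletion_candidates(stored, new_render_time):
    """Return renders that may be deleted when a new render is written.

    `stored` is a list of (revision, render, written_at) tuples. A render is a
    candidate only if it was superseded by a newer render of the same revision
    at least one TTL before the new write.
    """
    by_revision = {}
    for revision, render, written_at in stored:
        by_revision.setdefault(revision, []).append((render, written_at))

    candidates = []
    for revision, renders in by_revision.items():
        renders.sort(key=lambda r: r[1])
        # every render except the newest was superseded by the render written after it
        for (render, _), (_, superseded_at) in zip(renders, renders[1:]):
            if new_render_time - superseded_at >= TTL:
                candidates.append((revision, render))
    return candidates
```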
For example: Let's assume a TTL of 24 hours (86400 seconds). Imagine render 0 of revision A is stored for a new title.
revision | render | timestamp |
---|---|---|
A | 0 | 2018-07-01T00:00:00 |
Subsequently, render 1 of revision A is stored. Render 1 supersedes render 0, making render 0 a candidate for deletion TTL seconds from the time render 1 is written, but deletion is only evaluated once a subsequent render is stored.
revision | render | timestamp |
---|---|---|
A | 0 | 2018-07-01T00:00:00 |
A | 1 | 2018-07-03T00:00:00 |
Finally, render 2 is stored, and if TTL seconds or more have elapsed since the writing of render 1, then render 0 can be deleted.
revision | render | timestamp |
---|---|---|
A | 1 | 2018-07-03T00:00:00 |
A | 2 | 2018-07-05T00:00:00 |
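Applying the sketch above to this example (again, purely illustrative): render 0 was superseded by render 1 on 2018-07-03, and render 2 arrives two days later, which exceeds the 24-hour TTL, so render 0 becomes a deletion candidate.

```python
stored = [
    ("A", 0, datetime(2018, 7, 1)),
    ("A", 1, datetime(2018, 7, 3)),
]
# Render 2 arrives on 2018-07-05; render 0 was superseded by render 1 two days
# (>= TTL) earlier, so it is now a candidate for deletion.
print(deletion_candidates(stored, datetime(2018, 7, 5)))  # [('A', 0)]
```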
Again, this is the best-case scenario. Remember, the same is true of revisions, and when combined with sub-TTL edits and/or re-renders, the number of records persisted at any one time can be significant. Of course, over-stored records of this nature continue to be candidates for deletion, but only upon future writes (imagine a scenario where a flurry of edits for a title occurs within the span of one TTL, followed by a period of relative quiet lasting weeks or months).
This is distinctly different from the "leakage" we experienced in T192689: Unchecked storage growth when Cassandra TTLs on the indices were set too low; this overstorage is a property of the system (even if undesirable). We understood this property to exist when working through the design, but what we failed to appreciate at the time is the difficulty of quantifying the adjusted utilization; the amount of overstore is a function of the storage workload (the distribution of edits, re-renders, and document sizes), which itself isn't well understood at this time.
What we do know at this time is that on-disk utilization (including any savings from compression, etc.) is at least 2x, and since utilization continues to grow linearly, we can assume that once quiescent, the multiplier will be well above that.
[Graph: on-disk utilization, last 90 days (eqiad)]
It seems unlikely at this time that it will be worthwhile to equip the cluster with enough storage to accommodate this, so we should begin evaluating alternate means of bounding retention.