
RESTBase k-r-v storage overcommit
Closed, Invalid (Public)

Description

The revised k-r-v storage algorithm only evaluates renders for deletion when a new render is stored, and only those that were superseded by a newer render at least one TTL ago are candidates for deletion. Likewise, revisions are only evaluated for deletion when a new revision is stored, and only if the corresponding renders were superseded one TTL or more in the past. This means that, even in the best case, there will always be at least four renders stored (two revisions with two renders each), and in a significant number of cases many more.
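
To make the rule concrete, here is a minimal sketch in Python of the eligibility check described above; the names and data layout are illustrative assumptions, not the RESTBase implementation, and the check only ever runs as a side effect of a new write:

```python
from datetime import datetime, timedelta

TTL = timedelta(seconds=86400)  # 24 hours, matching the example below


def deletion_candidates(renders, now):
    """renders: list of (render_id, written_at) tuples, ordered oldest to newest.

    A render is a candidate only if the render that superseded it was written
    at least one TTL ago. The newest render is never a candidate.
    """
    candidates = []
    for i, (render_id, _written_at) in enumerate(renders[:-1]):
        superseded_at = renders[i + 1][1]  # when the next render replaced this one
        if now - superseded_at >= TTL:
            candidates.append(render_id)
    return candidates
```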

For example: Let's assume a TTL of 24 hours (86400 seconds). Imagine render 0 of revision A is stored for a new title.

| revision | render | timestamp |
| --- | --- | --- |
| A | 0 | 2018-07-01T00:00:00 |

Subsequently, render 1 of revision A is stored. Render 1 supersedes render 0, making render 0 a candidate for deletion TTL seconds after render 1 is written, though the deletion will only be evaluated when another render is stored.

| revision | render | timestamp |
| --- | --- | --- |
| A | 0 | 2018-07-01T00:00:00 |
| A | 1 | 2018-07-03T00:00:00 |

Finally, render 2 is stored, and if TTL seconds or more have elapsed since render 1 was written, render 0 can be deleted.

| revision | render | timestamp |
| --- | --- | --- |
| A | 0 | 2018-07-01T00:00:00 |
| A | 1 | 2018-07-03T00:00:00 |
| A | 2 | 2018-07-05T00:00:00 |
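
Plugging the example timestamps into the sketch above (reusing deletion_candidates and TTL from it) shows why only render 0 becomes deletable at this point:

```python
renders = [
    ("A/0", datetime(2018, 7, 1)),
    ("A/1", datetime(2018, 7, 3)),
    ("A/2", datetime(2018, 7, 5)),
]

# Evaluated when render 2 is written: render 0 was superseded by render 1 on
# 2018-07-03, more than one TTL (24h) before 2018-07-05, so it is a candidate;
# render 1 was only just superseded, so it is retained.
print(deletion_candidates(renders, now=datetime(2018, 7, 5)))  # ['A/0']
```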

Again, this is the best-case scenario. Remember, the same is true of revisions, and when combined with sub-TTL edits and/or re-renders, the number of records persisted at any one time can be significant. Of course, over-stored records of this nature continue to be candidates for deletion, but only upon future writes (imagine a scenario where a flurry of edits to a title occurs within the span of one TTL, followed by a period of relative quiet lasting weeks or months).

This is distinctly different from the "leakage" we experienced in T192689: Unchecked storage growth when Cassandra TTLs on the indices were set too low; this overstorage is a property of the system (even if an undesirable one). We understood this property to exist when working through the design, but what we failed to appreciate at the time is the difficulty of quantifying the adjusted utilization; the amount of overstorage is a function of the storage workload (the distribution of edits, re-renders, and document sizes), which itself isn't well understood at this time.

What we do know at this time is that on-disk utilization (including any savings from compression, etc.) is at least 2x, and since utilization continues to grow linearly, we can assume that once the dataset becomes quiescent, the multiplier will settle somewhere well above that.

[Screenshot: Grafana Cassandra dashboard, last 90 days (eqiad)]

It seems unlikely at this time that it will be worthwhile to equip the cluster with enough storage to accommodate this, so we should begin evaluating ways of bounding retention through alternate means.

Event Timeline

https://github.com/wikimedia/restbase/pull/1039 has been committed to issue a revision delete when a new render is stored. This trades some additional write amplification, in the form of redundantly issued deletes, for more aggressive culling.
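
Roughly, the change folds the revision-culling check into the render write path as well. The following is a hedged sketch of that idea only, using hypothetical storage helpers (and the TTL from the earlier sketch), not the actual patch:

```python
def on_render_write(storage, title, revision, render, now):
    """Hypothetical write path: storing a render also re-evaluates old
    revisions, instead of waiting for the next revision write."""
    storage.put_render(title, revision, render, now)

    # Existing behaviour: cull renders of this revision that were superseded
    # at least one TTL ago.
    storage.delete_renders_superseded_before(title, revision, now - TTL)

    # New behaviour: also issue a revision delete. The delete may be redundant
    # on many writes (write amplification), but stale revisions no longer
    # linger until the next edit arrives.
    storage.delete_revisions_superseded_before(title, now - TTL)
```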

Here is a recent histogram of ages for wikipedia_T_mobile__ng_remaining:

| Weeks | Count |
| --- | --- |
| 0 | 16862224 |
| 1 | 7493038 |
| 2 | 5652134 |
| 3 | 4790434 |
| 4 | 4649020 |
| 5 | 3351847 |
| 6 | 1746823 |
| 7 | 2371460 |
| 8 | 1385656 |
| 9 | 1562058 |
| 10 | 1346352 |
| 11 | 762808 |
| 12 | 1030605 |
| 13 | 809189 |
| 14 | 637257 |
| 15 | 703937 |
| 16 | 906143 |
| 17 | 1386464 |
| 18 | 539699 |
| 19 | 587454 |
| 20 | 396045 |
| 21 | 311160 |
| 22 | 301049 |
| 23 | 355252 |
| 24 | 596923 |
| 25 | 929272 |
| 26 | 770811 |
| 27 | 152369 |
| 28 | 114598 |
| 29 | 154434 |
| 30 | 142607 |
| 31 | 189614 |
| 32 | 57555 |
| 33 | 60892 |
| 34 | 73269 |
| 35 | 89850 |
| 36 | 90977 |
| 37 | 34092 |
| 38 | 38136 |
| 39 | 81419 |
| 40 | 169210 |
| 41 | 93017 |
| 42 | 54875 |
| 43 | 7805 |
| 44 | 2351 |
| 45 | 559 |
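
For reference, a week-bucketed age histogram like the one above can be derived by bucketing each row's write timestamp into whole weeks; a minimal sketch, with the actual table and column names omitted since they are not shown here:

```python
from collections import Counter

SECONDS_PER_WEEK = 7 * 24 * 3600


def age_histogram(write_timestamps, now):
    """Return {weeks_old: row_count}, bucketing row ages into whole weeks."""
    buckets = Counter()
    for ts in write_timestamps:
        age_weeks = int((now - ts).total_seconds()) // SECONDS_PER_WEEK
        buckets[age_weeks] += 1
    return dict(sorted(buckets.items()))
```
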
Pchelolo subscribed.

We don't use k-r-v storage anymore, so this task is invalid.