
High RESTBase storage utilization
Closed, Resolved · Public

Description

Cassandra storage utilization for RESTBase is getting quite high; at the time of this writing, the cluster-wide average is ~60%.

Past efforts to cull revisions were meant to delete everything older than December 2015, yet there is evidence that a significant number of older records still exist. Since deleting everything older than December 2015 should be uncontroversial, I propose we treat this as the lowest-hanging fruit and begin there. And because it will take some time for tombstones to work their way through the compaction pipeline, I propose we start as soon as possible.

Event Timeline

Eevans created this task. Jul 11 2016, 8:19 PM
Restricted Application added subscribers: Zppix, Aklapper. Jul 11 2016, 8:19 PM
Eevans triaged this task as High priority. Jul 11 2016, 8:19 PM
Eevans moved this task from Backlog to In-Progress on the Cassandra board. Jul 12 2016, 8:17 PM

Mentioned in SAL [2016-07-12T20:18:27Z] <urandom> Start revision culling script for local_group_wikipedia_T_parsoid_html, from restbase1009.eqiad.wmnet : T140008

Eevans added a comment. Aug 2 2016, 4:35 PM

Current status:

Job | Script | Started
local_group_wikipedia_T_parsoid_html | thin_out_key_rev_value_data.js | 2016-07-12T20:13:05+0000
local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ | thin_out_key_rev_value_data.js | 2016-07-22T23:21:26+0000
local_group_wikipedia_T_parsoid_section_offsets | thin_out_key_rev_value_data.js | 2016-08-01T16:18:55+0000
local_group_wikimedia_T_parsoid_{html, dataW4ULtxs1oMqJ, section_offsets} | thin_out_parsoid.js | 2016-07-25T18:08:07+0000

Eevans added a comment. Aug 3 2016, 3:12 PM

Current status:

Job | Script | Started
local_group_wikipedia_T_parsoid_html | thin_out_key_rev_value_data.js | 2016-07-12T20:13:05+0000
local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ | thin_out_key_rev_value_data.js | 2016-07-22T23:21:26+0000
local_group_wikipedia_T_parsoid_section_offsets | thin_out_key_rev_value_data.js | 2016-08-01T16:18:55+0000
local_group_wikimedia_T_parsoid_{html, dataW4ULtxs1oMqJ, section_offsets} | thin_out_parsoid.js | 2016-07-25T18:08:07+0000

Eevans added a comment. Aug 3 2016, 9:56 PM

Something interesting of note: The droppable tombstones ratio across the cluster is very high. Here are the values for the Wikipedia tables we've been culling recently:

restbase1007.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8510595071552199
restbase1007.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.830688494731746
restbase1007.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8323713085090594
restbase1007.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8634687791121277
restbase1007.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.859455520697765
restbase1007.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8591528755005924
restbase1007.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8304493840848477
restbase1007.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8399081298659838
restbase1007.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8324697268408491
restbase1010.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8207528208687649
restbase1010.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8308683539510479
restbase1010.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8434910867326589
restbase1010.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8414586603010478
restbase1010.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8525526179476033
restbase1010.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8525587027287187
restbase1010.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8253027029539672
restbase1010.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8311300234211876
restbase1010.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8348782146840138
restbase1011.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.823063762966097
restbase1011.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8484828309837151
restbase1011.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8226767404756636
restbase1011.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.847507581062547
restbase1011.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8607570420616993
restbase1011.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8531914183886888
restbase1011.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8266245349695364
restbase1011.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8371372144094608
restbase1011.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8334346620740313
restbase1008.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.848703447491683
restbase1008.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8604303773412872
restbase1008.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8582590424278107
restbase1008.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8630787905399421
restbase1008.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8477858315245481
restbase1008.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8581802701623951
restbase1008.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8273120466011535
restbase1008.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8299582201954786
restbase1008.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8314556639278731
restbase1012.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8510597248617231
restbase1012.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.854734052683109
restbase1012.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8588386361414955
restbase1012.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8671140384469482
restbase1012.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8709861833971166
restbase1012.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8618488215549217
restbase1012.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.845132365699308
restbase1012.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8267040304198009
restbase1012.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8323929993557118
restbase1013.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.843822690102959
restbase1013.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8394919291017904
restbase1013.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8651812346315976
restbase1013.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8555488691112866
restbase1013.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8643430895450983
restbase1013.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8484497227526021
restbase1013.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8209255475413948
restbase1013.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.836521469215144
restbase1013.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8381996815161774
restbase1009.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.845589863742052
restbase1009.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8555763308567401
restbase1009.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8392833263490613
restbase1009.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8695618838880024
restbase1009.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8898060224084281
restbase1009.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8569221699192938
restbase1009.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8432138127368003
restbase1009.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8464512380389568
restbase1009.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8416672589550867
restbase1014.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8379379661599992
restbase1014.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8424116550052999
restbase1014.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8431137030439143
restbase1014.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8640524724169203
restbase1014.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8857791182501861
restbase1014.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8821456473109804
restbase1014.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8489632573173771
restbase1014.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8363279908711002
restbase1014.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8365843338067919
restbase1015.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.830663222934134
restbase1015.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8333206738004157
restbase1015.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8291757421905597
restbase1015.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8704337399947567
restbase1015.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8728194106586296
restbase1015.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8624769287517878
restbase1015.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8379053729064754
restbase1015.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.831474775378054
restbase1015.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8461562501983481
restbase2003.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8566400279482441
restbase2003.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8643549509501731
restbase2003.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8580077643211967
restbase2003.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.848179604431367
restbase2003.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.84651942248862
restbase2003.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8634998623233892
restbase2003.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8427077717126733
restbase2003.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8386710936755165
restbase2003.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8333688084486561
restbase2004.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8482675150191257
restbase2004.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8562195733226167
restbase2004.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8545458190205087
restbase2004.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8577184905859835
restbase2004.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8491775634204538
restbase2004.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8562044833509123
restbase2004.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8348174028739376
restbase2004.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.847125162934577
restbase2004.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8306382685257819
restbase2008.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8622595592210536
restbase2008.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8602014360122091
restbase2008.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8574668947459001
restbase2008.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8480357050928972
restbase2008.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8543512477231281
restbase2008.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8383449650612326
restbase2008.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8379045702839132
restbase2008.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8370956155801388
restbase2008.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8319544376651181
restbase2001.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8449160348562379
restbase2001.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8643755005602403
restbase2001.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8547360837948231
restbase2001.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8625175560248685
restbase2001.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8495571412056702
restbase2001.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8462254526141322
restbase2001.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8446781708122834
restbase2001.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8290920031341087
restbase2001.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8334695911373671
restbase2002.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8549267109852547
restbase2002.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8500925333133053
restbase2002.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.85221526107431
restbase2002.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8499125538436272
restbase2002.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.846708685605267
restbase2002.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8524183438588635
restbase2002.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8286336222657182
restbase2002.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8349823097204827
restbase2002.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8398608721235203
restbase2007.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8418367883674756
restbase2007.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8605183365479759
restbase2007.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8660630977782993
restbase2007.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8356909925745338
restbase2007.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8486790672354004
restbase2007.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.862090449936858
restbase2007.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8404849191724056
restbase2007.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8212643052646388
restbase2007.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8359222808106171
restbase2005.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8522336274967827
restbase2005.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8503869654559936
restbase2005.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8582165798939602
restbase2005.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8341762071848025
restbase2005.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8599327144901775
restbase2005.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8533380186235563
restbase2005.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8286965758910636
restbase2005.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8373252097698859
restbase2005.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8370106279356303
restbase2006.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.850333710699277
restbase2006.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8438259604311418
restbase2006.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.845969884388787
restbase2006.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8485288403499031
restbase2006.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8614403000416461
restbase2006.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.847283044258451
restbase2006.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8353420827423197
restbase2006.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8394468725945651
restbase2006.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8408105390798652
restbase2009.codfw.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.833135415444506
restbase2009.codfw.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.8408995343102604
restbase2009.codfw.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8585637326031916
restbase2009.codfw.wmnet: a: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8223770665077595
restbase2009.codfw.wmnet: b: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8473241145779351
restbase2009.codfw.wmnet: c: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ: 0.8375401955489872
restbase2009.codfw.wmnet: a: local_group_wikipedia_T_parsoid_section_offsets: 0.8244399637480944
restbase2009.codfw.wmnet: b: local_group_wikipedia_T_parsoid_section_offsets: 0.8279509781980064
restbase2009.codfw.wmnet: c: local_group_wikipedia_T_parsoid_section_offsets: 0.8219795105413132

I'd be inclined to believe that recent efforts to cull revisions fully explain this, but the ratios seem to be equally high for tables that predate the start of culling and for those that are much newer (as new as today, and tables that are the product of numerous recent compactions).

Additionally, the handful of other tables I've sampled (local_group_wikipedia_T_mobileapps_lead, local_group_wikipedia_T_mobileapps_remaining, local_group_wikivoyage_T_parsoid_html, and local_group_wiktionary_T_parsoid_html, for example) are all over 0.75.
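
To compare tables at a glance, the per-instance values listed above can be averaged per table. The following is a minimal sketch, assuming each input line ends in `instance: table: ratio` as in the listing; the function name `mean_ratio_by_table` is hypothetical, and the three sample lines are copied from the restbase1007 entries.

```python
from collections import defaultdict
from statistics import mean


def mean_ratio_by_table(lines):
    """Average the droppable tombstone ratio per table.

    Each line is expected to look like 'host: instance: table: ratio'.
    """
    ratios = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last three separators are used.
        _host, _instance, table, ratio = line.rsplit(": ", 3)
        ratios[table].append(float(ratio))
    return {table: mean(values) for table, values in ratios.items()}


# Three values copied from the listing (restbase1007, parsoid_html).
sample = [
    "restbase1007.eqiad.wmnet: a: local_group_wikipedia_T_parsoid_html: 0.8510595071552199",
    "restbase1007.eqiad.wmnet: b: local_group_wikipedia_T_parsoid_html: 0.830688494731746",
    "restbase1007.eqiad.wmnet: c: local_group_wikipedia_T_parsoid_html: 0.8323713085090594",
]

for table, avg in mean_ratio_by_table(sample).items():
    print(f"{table}: {avg:.3f}")
```

Feeding the full listing through the same function would give one average per table, which makes it easier to see whether any table stands out.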

I suspect that the use of revision retention policies, and row fragments that are distributed across many tables (the so-called overlapping tables problem), are preventing tombstone GC during compaction.

This investigation continues; more to come...

Mentioned in SAL [2016-08-08T16:53:25Z] <urandom> T140008: Starting major compaction, restbase2001-a.codfw.wmnet

Eevans added a comment. Aug 9 2016, 9:50 PM

As an experiment, I selected 9 files "at random" from a parsoid html directory.

-rw-r--r-- 1 cassandra cassandra  17G Jul 20 07:14 la-144417-big-Data.db
-rw-r--r-- 1 cassandra cassandra 137M Jul 20 04:43 la-144419-big-Data.db
-rw-r--r-- 1 cassandra cassandra 364M Jul 20 09:21 la-144479-big-Data.db
-rw-r--r-- 1 cassandra cassandra  61G Jul 26 21:44 la-146542-big-Data.db
-rw-r--r-- 1 cassandra cassandra  51M Jul 28 17:50 la-147332-big-Data.db
-rw-r--r-- 1 cassandra cassandra  55M Jul 29 06:21 la-147499-big-Data.db
-rw-r--r-- 1 cassandra cassandra  27M Jul 31 07:00 la-148186-big-Data.db
-rw-r--r-- 1 cassandra cassandra 5.0G Jul 31 20:05 la-148280-big-Data.db
-rw-r--r-- 1 cassandra cassandra  20G Jul 31 22:15 la-148319-big-Data.db

These files totaled ~104G.

la-144417-big-Data.db 0.88
la-144419-big-Data.db 0.80
la-144479-big-Data.db 0.81
la-146542-big-Data.db 0.73
la-147332-big-Data.db 0.85
la-147499-big-Data.db 0.80
la-148186-big-Data.db 1.00
la-148280-big-Data.db 0.80
la-148319-big-Data.db 0.87

They had an average droppable tombstone ratio of ~0.84.
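
The comment does not say how the per-file ratios were collected. One plausible way, sketched below, is to read the `Estimated droppable tombstones` line printed by Cassandra's `sstablemetadata` tool. This is an assumption, not the method confirmed here: the helper names and the sample text are hypothetical, and the exact output layout may differ between Cassandra versions.

```python
import re
import subprocess

# Line printed by sstablemetadata, e.g. "Estimated droppable tombstones: 0.88"
_RATIO_RE = re.compile(
    r"^Estimated droppable tombstones:\s*([0-9.eE+-]+)\s*$", re.MULTILINE
)


def parse_droppable_ratio(metadata_text):
    """Return the droppable tombstone ratio from sstablemetadata output, or None."""
    match = _RATIO_RE.search(metadata_text)
    return float(match.group(1)) if match else None


def droppable_ratio(data_file):
    """Run sstablemetadata on one Data.db file (assumes the tool is on PATH)."""
    output = subprocess.run(
        ["sstablemetadata", data_file],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_droppable_ratio(output)


# Hypothetical excerpt of tool output, for illustration only.
sample_output = "SSTable: la-144417-big\nEstimated droppable tombstones: 0.88\n"
print(parse_droppable_ratio(sample_output))
```

Keeping the parsing separate from the subprocess call means the parser can be checked against captured output without access to a Cassandra node.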

After a user-defined compaction of all 9 files...

-rw-r--r-- 1 cassandra cassandra 91G Aug  6 02:12 la-150011-big-Data.db

...a reduction of ~13% in file size, and a final droppable tombstone ratio of 0.787.
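
The figures for this experiment can be rechecked with a few lines of arithmetic. This is a rough sketch: the sizes are the rounded values printed by `ls -h`, so small differences from the quoted numbers are expected, and `parse_size` is a hypothetical helper.

```python
from statistics import mean

# (size as printed by ls -h, droppable tombstone ratio) for the 9 input files
inputs = [
    ("17G", 0.88), ("137M", 0.80), ("364M", 0.81),
    ("61G", 0.73), ("51M", 0.85), ("55M", 0.80),
    ("27M", 1.00), ("5.0G", 0.80), ("20G", 0.87),
]

UNITS = {"M": 1 / 1024, "G": 1.0}  # conversion factors to G


def parse_size(text):
    """Convert an ls -h style size such as '137M' or '5.0G' to G."""
    return float(text[:-1]) * UNITS[text[-1]]


total_before = sum(parse_size(size) for size, _ in inputs)
total_after = 91.0  # la-150011-big-Data.db after compaction, in G (rounded)
reduction = 1 - total_after / total_before
avg_ratio = mean(ratio for _, ratio in inputs)

print(f"before: {total_before:.1f}G, reduction: {reduction:.1%}, mean ratio: {avg_ratio:.2f}")
```

Note that the mean here is unweighted; weighting each ratio by file size would give more influence to the 61G and 20G files.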

Eevans added a comment. Edited Aug 10 2016, 5:01 PM

As another experiment, I performed a major compaction on a parsoid html data table.

File sizes before:

-rw-r--r-- 1 cassandra cassandra  20583501477 Jul  9 07:08 la-140530-big-Data.db
-rw-r--r-- 1 cassandra cassandra   3313474579 Jul  9 02:02 la-140532-big-Data.db
-rw-r--r-- 1 cassandra cassandra  62330495337 Jul  9 14:59 la-140536-big-Data.db
-rw-r--r-- 1 cassandra cassandra   9725838494 Jul  9 04:56 la-140545-big-Data.db
-rw-r--r-- 1 cassandra cassandra 158702471414 Jul  9 20:59 la-140547-big-Data.db
-rw-r--r-- 1 cassandra cassandra   5317896815 Aug  3 13:22 la-149184-big-Data.db
-rw-r--r-- 1 cassandra cassandra  97006636137 Aug  6 02:12 la-150011-big-Data.db
-rw-r--r-- 1 cassandra cassandra   2860713574 Aug  6 06:30 la-150142-big-Data.db
-rw-r--r-- 1 cassandra cassandra   5702801567 Aug  6 06:50 la-150144-big-Data.db
-rw-r--r-- 1 cassandra cassandra   2872411873 Aug  6 23:28 la-150370-big-Data.db
-rw-r--r-- 1 cassandra cassandra   4696327712 Aug  7 14:20 la-150555-big-Data.db
-rw-r--r-- 1 cassandra cassandra   2614099299 Aug  7 16:30 la-150595-big-Data.db
-rw-r--r-- 1 cassandra cassandra    582735702 Aug  8 09:33 la-150799-big-Data.db
-rw-r--r-- 1 cassandra cassandra   2428077408 Aug  8 09:39 la-150800-big-Data.db
-rw-r--r-- 1 cassandra cassandra     18957826 Aug  8 11:49 la-150826-big-Data.db
-rw-r--r-- 1 cassandra cassandra    175203951 Aug  8 13:45 la-150851-big-Data.db
-rw-r--r-- 1 cassandra cassandra    531322835 Aug  8 13:46 la-150852-big-Data.db
-rw-r--r-- 1 cassandra cassandra    219471781 Aug  8 14:50 la-150866-big-Data.db
-rw-r--r-- 1 cassandra cassandra    216316109 Aug  8 15:52 la-150877-big-Data.db
-rw-r--r-- 1 cassandra cassandra    646960935 Aug  8 16:19 la-150887-big-Data.db
-rw-r--r-- 1 cassandra cassandra     53102472 Aug  8 16:32 la-150890-big-Data.db
-rw-r--r-- 1 cassandra cassandra     27256921 Aug  8 16:38 la-150891-big-Data.db
-rw-r--r-- 1 cassandra cassandra     26245634 Aug  8 16:47 la-150892-big-Data.db

Which comes out to ~354G in total.

Droppable tombstone ratios before:

la-140530-big-Data.db 0.88
la-140532-big-Data.db 0.76
la-140536-big-Data.db 0.88
la-140545-big-Data.db 0.79
la-140547-big-Data.db 0.86
la-149184-big-Data.db 0.80
la-150011-big-Data.db 0.78
la-150142-big-Data.db 0.87
la-150144-big-Data.db 0.81
la-150370-big-Data.db 0.79
la-150555-big-Data.db 0.94
la-150595-big-Data.db 0.79
la-150799-big-Data.db 0.60
la-150800-big-Data.db 0.70
la-150826-big-Data.db 0.55
la-150851-big-Data.db 0.57
la-150852-big-Data.db 0.57
la-150866-big-Data.db 0.56
la-150877-big-Data.db 0.56
la-150887-big-Data.db 0.78
la-150890-big-Data.db 0.42
la-150891-big-Data.db 0.56
la-150892-big-Data.db 0.55

An average of ~0.72

After:

-rw-r--r-- 1 cassandra cassandra 214G Aug  9 06:31 la-150897-big-Data.db

A space savings of ~40% (with a final droppable tombstone ratio of ~0.11).
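
As a cross-check of the totals for the major compaction, the byte counts from the "before" listing can be summed directly. This sketch assumes that "G" means GiB (2^30 bytes), as `ls -h` reports it, and uses the rounded 214G figure for the output file, so the savings value is approximate.

```python
# Data.db sizes in bytes before the major compaction (from the listing above)
sizes_before = [
    20583501477, 3313474579, 62330495337, 9725838494, 158702471414,
    5317896815, 97006636137, 2860713574, 5702801567, 2872411873,
    4696327712, 2614099299, 582735702, 2428077408, 18957826,
    175203951, 531322835, 219471781, 216316109, 646960935,
    53102472, 27256921, 26245634,
]

GIB = 1024 ** 3  # assumption: sizes are reported in binary units

total_before_gib = sum(sizes_before) / GIB
total_after_gib = 214.0  # la-150897-big-Data.db as printed by ls -h (rounded)
savings = 1 - total_after_gib / total_before_gib

print(f"before: {total_before_gib:.1f}G, after: {total_after_gib:.0f}G, savings: {savings:.1%}")
```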

Mentioned in SAL [2016-08-15T19:27:21Z] <urandom> T140008: Starting user-defined compaction (10 tables, highest droppable tombstones), restbase2001-b.codfw.wmnet

Mentioned in SAL [2016-08-15T19:39:57Z] <urandom> T140008: Starting major compaction (WP parsoid html, split output) on restbase1007-a.eqiad.wmnet

All in-flight culling jobs have completed.

Job | Script | Started
local_group_wikipedia_T_parsoid_html | thin_out_key_rev_value_data.js | 2016-07-12T20:13:05+0000
local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ | thin_out_key_rev_value_data.js | 2016-07-22T23:21:26+0000
local_group_wikipedia_T_parsoid_section_offsets | thin_out_key_rev_value_data.js | 2016-08-01T16:18:55+0000
local_group_wikimedia_T_parsoid_{html, dataW4ULtxs1oMqJ, section_offsets} | thin_out_parsoid.js | 2016-07-25T18:08:07+0000

Eevans lowered the priority of this task from High to Medium. Sep 12 2016, 6:03 PM
Eevans closed this task as Resolved. Nov 29 2016, 9:29 PM

This was completed some time ago; closing.