
Audit session storage to determine max age of un-GC'd sessions
Closed, Resolved; Public

Description

Session data contains PII and is thus bound by Wikimedia's data retention guidelines. While sessions expire well before the max retention period (currently 90 days), expired data is not immediately removed from storage (Cassandra). It is retained for a minimum period (default of 10 days), and then GC'd as compaction dictates. While it seems highly unlikely that actual retention will be anywhere near 90 days, the exact duration is difficult to reason about because it is a function of so many factors (throughput, cardinality, compaction concurrency, etc.). The easiest way to answer how long sessions are retained is to conduct an audit after session storage has been in use for a few weeks.

Event Timeline

@Eevans is this a task for you or were you looking for input from @EvanProdromou ?

@kchapman it's a task for me (to follow-up on some time after moving to production)

Anything left to do here?

Yes. Now that we've got the entire workload on the cluster, we should wait for a period of at least 10 days (though 30 days would be my recommendation), and then audit the dataset. How exactly we go about this audit is something that needs to be sussed out; the goal would be to establish confidence that we do not have tombstones hanging around that would violate our data retention guidelines.

I'd be happy to discuss methodologies for such a test.

Eevans removed Eevans as the assignee of this task. Jun 2 2021, 7:28 PM
Eevans claimed this task.

Turns out that we can do this very easily...

The oldest data file anywhere in the cluster is less than 3 weeks old (May 20).

sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2426118294 May 20 21:21 md-7286-big-Data.db
sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2607709083 May 28 14:49 md-7423-big-Data.db
sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2345777677 Jun  5 18:52 md-7570-big-Data.db
sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra  925995282 Jun  6 19:34 md-7589-big-Data.db
sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra  252239356 Jun  7 08:55 md-7598-big-Data.db
sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra  266557091 Jun  7 19:47 md-7607-big-Data.db
sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2529813589 May 23 00:35 md-7292-big-Data.db
sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2510615372 May 30 13:14 md-7425-big-Data.db
sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2626199983 Jun  6 19:45 md-7558-big-Data.db
sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra  252315436 Jun  7 08:59 md-7567-big-Data.db
sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra  266446615 Jun  7 19:51 md-7576-big-Data.db
sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2461788972 May 21 05:47 md-7243-big-Data.db
sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2718648334 May 29 10:16 md-7388-big-Data.db
sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2608444026 Jun  5 17:33 md-7521-big-Data.db
sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra  721458175 Jun  7 14:06 md-7554-big-Data.db
sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra  183984959 Jun  7 20:03 md-7559-big-Data.db
sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2704190985 May 28 08:54 md-7512-big-Data.db
sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2584931013 Jun  4 18:57 md-7649-big-Data.db
sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  371948555 Jun  5 12:50 md-7662-big-Data.db
sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  610642343 Jun  7 02:38 md-7691-big-Data.db
sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  263506142 Jun  7 14:48 md-7700-big-Data.db
sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  178313138 Jun  7 20:44 md-7705-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2808721571 May 24 17:56 md-7400-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2592080367 Jun  1 13:22 md-7541-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  681963679 Jun  3 09:35 md-7574-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  714526617 Jun  4 19:36 md-7603-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  751957644 Jun  6 15:48 md-7636-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  323764391 Jun  7 09:26 md-7649-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  261636385 Jun  7 20:01 md-7658-big-Data.db
sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2602471466 May 30 12:31 md-7547-big-Data.db
sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2579909321 Jun  6 16:08 md-7680-big-Data.db
sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  323592802 Jun  7 09:42 md-7693-big-Data.db
sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra  261309428 Jun  7 20:15 md-7702-big-Data.db
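
For anyone repeating this check, the listing above can be scanned mechanically rather than by eye. What follows is only a minimal sketch, not the method used here: it assumes the `host: ls -l` line layout shown above, takes the year (2021) from the task timeline because `ls -l` omits it for recent files, and relies on English month abbreviations. The names `parse_line` and `oldest_file` are hypothetical.

```python
from datetime import datetime

# Oldest line per host, copied from the listing above.
LISTING = """\
sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2426118294 May 20 21:21 md-7286-big-Data.db
sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2529813589 May 23 00:35 md-7292-big-Data.db
sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2461788972 May 21 05:47 md-7243-big-Data.db
sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2704190985 May 28 08:54 md-7512-big-Data.db
sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2808721571 May 24 17:56 md-7400-big-Data.db
sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2602471466 May 30 12:31 md-7547-big-Data.db
"""


def parse_line(line, year):
    """Return (host, mtime, filename) for one 'host: ls -l' line."""
    host, rest = line.split(": ", 1)
    # Fields: mode, links, owner, group, size, month, day, HH:MM, name
    fields = rest.split()
    month, day, hhmm, name = fields[5], fields[6], fields[7], fields[8]
    # The year is an assumption; ls -l does not print it for recent files.
    mtime = datetime.strptime(f"{year} {month} {day} {hhmm}", "%Y %b %d %H:%M")
    return host, mtime, name


def oldest_file(listing, year):
    """Return the (host, mtime, filename) entry with the earliest mtime."""
    entries = [parse_line(l, year) for l in listing.splitlines() if l.strip()]
    return min(entries, key=lambda entry: entry[1])


host, mtime, name = oldest_file(LISTING, 2021)
print(host, name, mtime.isoformat())
```

The modification time of an SSTable only bounds when the file was written; the contents still need to be inspected, since a file can hold data older than itself.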

And the oldest tombstones in that oldest file (md-7286 on sessionstore1001) are all comfortably within gc_grace_seconds of the file age, with the oldest going back only to May 13th.

root@sessionstore1001:/srv/cassandra-a/data/sessions/values-d93398104b1e11e9b54815441d81640b# sstablemetadata md-7286-big-Data.db
WARN  20:57:00,893 Only 40.532GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
SSTable: /srv/cassandra-a/data/sessions/values-d93398104b1e11e9b54815441d81640b/md-7286-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1620925061842965
Maximum timestamp: 1621545220357121
SSTable min local deletion time: 1620925061
SSTable max local deletion time: 1621548820
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.6815832577317515
TTL min: 0
TTL max: 3600
First token: -9223371826726254888 (key=enwiki:MWSession:l1c4s2hgvl2njj7if7ov1854r8ijs6cs)
Last token: 9223371556873319393 (key=enwiki:MWSession:peikm1mc1se3r4bcdvga2fha73plmpij)
minClustringValues: []
maxClustringValues: []
Estimated droppable tombstones: 1.6053814164518172
SSTable Level: 0
Repaired at: 0
Replay positions covered: {CommitLogPosition(segmentId=1619531657072, position=32250439)=CommitLogPosition(segmentId=1619531660544, position=572568)}
totalColumnsSet: 26330626
totalRows: 26330626
Estimated tombstone drop times:
1620929291:    904155
1620936070:    909365
1620942632:    758505
1620948871:    549490
1620954573:    432984
1620959823:    396278
1620964841:    386872
1620969786:    411047
1620974784:    484954
1620980254:    563213
1620986134:    648321
1620992347:    702498
1620998460:    718215
1621004174:    707948
1621009759:    714103
1621015492:    721563
1621021346:    688849
1621027083:    631726
1621032848:    564995
1621038943:    501582
1621045437:    470515
1621052153:    488740
1621058895:    536796
1621065610:    630684
1621072078:    635172
1621078000:    576237
1621083234:    561100
1621088241:    592402
1621093459:    663770
1621099039:    692286
1621104999:    702768
1621111080:    701176
1621117266:    665579
1621123698:    532952
1621130358:    467978
1621136824:    482888
1621143296:    505159
1621149785:    596563
1621156222:    683204
1621162769:    734054
1621169446:    815423
1621176222:    871911
1621183095:    898948
1621190140:    912248
1621197140:    900624
1621204243:    787960
1621211979:    672250
1621219988:    628283
1621227352:    625878
1621233623:    567889
1621238836:    541150
1621243533:    529295
1621248071:    520081
1621252632:    576252
1621257134:    637244
1621261756:    656719
1621266490:    685527
1621271387:    710028
1621276393:    714394
1621281449:    703684
1621286467:    640982
1621291584:    553620
1621296950:    487708
1621302437:    445668
1621308041:    446942
1621313793:    489114
1621319585:    581874
1621325331:    654963
1621331102:    702606
1621337216:    722363
1621343305:    794749
1621349395:    843293
1621355469:    850024
1621361511:    881960
1621367255:    873607
1621373450:    741853
1621379764:    607569
1621386486:    543095
1621393161:    567846
1621399885:    599749
1621406867:    693697
1621413648:    766170
1621420529:    806240
1621427559:    886080
1621434449:    962104
1621441401:    962815
1621448376:    931643
1621455299:    891335
1621462304:    805412
1621469687:    676876
1621477423:    695665
1621485206:    735166
1621493284:    871547
1621501318:    961014
1621509473:   1193683
1621517320:   1243719
1621525295:   1287895
1621532829:   1223501
1621539553:   1067747
1621546454:    861366
Count               Row Size        Cell Count
1                          0          43494794
2                          0                 0
3                          0                 0
4                          0                 0
5                          0                 0
6                          0                 0
7                          0                 0
8                          0                 0
10                         0                 0
12                         0                 0
14                         0                 0
17                         0                 0
20                         0                 0
24                         0                 0
29                         0                 0
35                         0                 0
42                         0                 0
50                         0                 0
60                         0                 0
72                  17162077                 0
86                  18882565                 0
103                  7129824                 0
124                        0                 0
149                        0                 0
179                        0                 0
215                        0                 0
258                        0                 0
310                        0                 0
372                        0                 0
446                       14                 0
535                    90247                 0
642                    53948                 0
770                    13085                 0
924                    97806                 0
1109                   59931                 0
1331                    3907                 0
1597                     264                 0
1916                     318                 0
2299                     396                 0
2759                     266                 0
3311                      60                 0
3973                      72                 0
4768                       5                 0
5722                       0                 0
6866                       0                 0
8239                       0                 0
9887                       0                 0
11864                      1                 0
14237                      2                 0
17084                      0                 0
20501                      1                 0
24601                      0                 0
29521                      0                 0
35425                      0                 0
42510                      0                 0
51012                      0                 0
61214                      0                 0
73457                      0                 0
88148                      0                 0
105778                     0                 0
126934                     0                 0
152321                     3                 0
182785                     2                 0
219342                     0                 0
263210                     0                 0
315852                     0                 0
379022                     0                 0
454826                     0                 0
545791                     0                 0
654949                     0                 0
785939                     0                 0
943127                     0                 0
1131752                    0                 0
1358102                    0                 0
1629722                    0                 0
1955666                    0                 0
2346799                    0                 0
2816159                    0                 0
3379391                    0                 0
4055269                    0                 0
4866323                    0                 0
5839588                    0                 0
7007506                    0                 0
8409007                    0                 0
10090808                   0                 0
12108970                   0                 0
14530764                   0                 0
17436917                   0                 0
20924300                   0                 0
25109160                   0                 0
30130992                   0                 0
36157190                   0                 0
43388628                   0                 0
52066354                   0                 0
62479625                   0                 0
74975550                   0                 0
89970660                   0                 0
107964792                  0                 0
129557750                  0                 0
155469300                  0                 0
186563160                  0                 0
223875792                  0                 0
268650950                  0                 0
322381140                  0                 0
386857368                  0                 0
464228842                  0                 0
557074610                  0                 0
668489532                  0                 0
802187438                  0                 0
962624926                  0                 0
1155149911                 0                 0
1386179893                 0                 0
1663415872                 0                 0
1996099046                 0                 0
2395318855                 0                 0
2874382626                 0                  
3449259151                 0                  
4139110981                 0                  
4966933177                 0                  
5960319812                 0                  
7152383774                 0                  
8582860529                 0                  
10299432635                 0                  
12359319162                 0                  
14831182994                 0                  
17797419593                 0                  
21356903512                 0                  
25628284214                 0                  
30753941057                 0                  
36904729268                 0                  
44285675122                 0                  
53142810146                 0                  
63771372175                 0                  
76525646610                 0                  
91830775932                 0                  
110196931118                 0                  
132236317342                 0                  
158683580810                 0                  
190420296972                 0                  
228504356366                 0                  
274205227639                 0                  
329046273167                 0                  
394855527800                 0                  
473826633360                 0                  
568591960032                 0                  
682310352038                 0                  
818772422446                 0                  
982526906935                 0                  
1179032288322                 0                  
1414838745986                 0                  
Estimated cardinality: 43517593
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1620925061
EncodingStats minTimestamp: 1620925061842965
KeyType: org.apache.cassandra.db.marshal.UTF8Type
ClusteringTypes: []
StaticColumns: {}
RegularColumns: {value:org.apache.cassandra.db.marshal.BytesType}
root@sessionstore1001:/srv/cassandra-a/data/sessions/values-d93398104b1e11e9b54815441d81640b#

So we are in fact below 30 days of retention (and well below the 90 days we are "allowed").
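
The arithmetic behind that conclusion can be sketched as follows. This is an illustration under stated assumptions, not part of the original audit: the two epoch values are copied from the sstablemetadata output above, the audit time is assumed to be Jun 7 2021 at roughly 20:57 UTC (taken from the WARN line and the newest files in the listing), and the 90-day limit comes from the task description.

```python
from datetime import datetime, timedelta, timezone

# Copied from the sstablemetadata output above (seconds since the epoch).
MIN_LOCAL_DELETION_TIME = 1620925061
MAX_LOCAL_DELETION_TIME = 1621548820

# Assumptions for illustration: the audit time is inferred from the
# output above, and hosts are assumed to log in UTC.
AUDIT_TIME = datetime(2021, 6, 7, 20, 57, tzinfo=timezone.utc)
MAX_RETENTION = timedelta(days=90)  # limit stated in the task description


def to_utc(epoch_seconds):
    """Convert seconds since the epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


oldest = to_utc(MIN_LOCAL_DELETION_TIME)
newest = to_utc(MAX_LOCAL_DELETION_TIME)
age = AUDIT_TIME - oldest

print("oldest deletion time:", oldest.isoformat())
print("newest deletion time:", newest.isoformat())
print("age of oldest entry in days:", age.days)
print("within retention limit:", age < MAX_RETENTION)
```

The same conversion applies to each bucket in the "Estimated tombstone drop times" histogram, whose keys are also epoch seconds.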