Session data contains PII and is therefore bound by Wikimedia's data retention guidelines. While sessions expire well before the maximum retention period (currently 90 days), expired sessions are not immediately removed from storage (Cassandra). Expired data is retained for a minimum period (10 days by default) and is then GC'd as compaction dictates. While it seems highly unlikely that actual retention will be anywhere near 90 days, the exact duration is difficult to reason about because it depends on many factors (throughput, cardinality, compaction concurrency, etc.). The easiest way to determine how long sessions are retained is to conduct an audit after session storage has been in use for a few weeks.
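As a rough illustration of that bound, here is a minimal Python sketch. It assumes a hypothetical one-hour session TTL and Cassandra's default `gc_grace_seconds` of 864000 (10 days); the helper names are made up for illustration. It only computes the latest time at which an expired session becomes *eligible* for removal. When a compaction actually drops it is the hard-to-predict part, which is why an audit is still needed.

```python
from datetime import datetime, timedelta, timezone

# Assumptions for illustration only (not taken from production config):
SESSION_TTL = timedelta(hours=1)        # hypothetical session TTL
GC_GRACE = timedelta(seconds=864000)    # Cassandra's default gc_grace_seconds (10 days)
MAX_RETENTION = timedelta(days=90)      # retention limit from the guidelines


def purge_eligible_by(write_time: datetime,
                      ttl: timedelta = SESSION_TTL,
                      gc_grace: timedelta = GC_GRACE) -> datetime:
    """Latest time at which an expired session becomes eligible for removal.

    The cell expires at write_time + ttl and may be purged once a further
    gc_grace has elapsed; it is only physically removed when a compaction
    rewrites the SSTable holding it, which can happen later still.
    """
    return write_time + ttl + gc_grace


def compaction_slack(write_time: datetime,
                     max_retention: timedelta = MAX_RETENTION) -> timedelta:
    """How long compaction can lag before the retention limit is exceeded."""
    return (write_time + max_retention) - purge_eligible_by(write_time)


if __name__ == "__main__":
    written = datetime(2021, 5, 13, tzinfo=timezone.utc)
    print(purge_eligible_by(written))  # 2021-05-23 01:00:00+00:00
    print(compaction_slack(written))   # 79 days, 23:00:00
```

Under these assumptions, compaction would have to lag by roughly 80 days before the 90-day limit came into play.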
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | aaron | T88445 MediaWiki active/active datacenter investigation and work (tracking)
Resolved | | Eevans | T206016 Create a service for session storage
Resolved | | Krinkle | T270223 FY2021-2022: Enable basic Multi-DC operations for read traffic (tracking)
Resolved | | Krinkle | T270225 Finish session storage to actually meet multi-DC requirements
Resolved | | Eevans | T222990 Audit session storage to determine max age of un-GC'd sessions
Event Timeline
Yes. Now that we've got the entire workload on the cluster, we should wait at least 10 days (though I'd recommend 30) and then audit the dataset. How exactly we go about this audit still needs to be sussed out; the goal is to establish confidence that we don't have tombstones hanging around that would violate our data retention guidelines.
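One way to frame the audit criterion, as a minimal sketch: collect each SSTable's `SSTable min local deletion time` (as reported by `sstablemetadata`), take the oldest, and compare its age at audit time against the 90-day limit. The `SSTableInfo` type and `audit` function are hypothetical, and treating the min local deletion time as a proxy for "oldest retained session data" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple

RETENTION_LIMIT = timedelta(days=90)
# Cassandra's "no deletion time" marker (Integer.MAX_VALUE); SSTables
# reporting it carry no expiring or deleted data and are skipped.
NO_DELETION_TIME = 2**31 - 1


@dataclass
class SSTableInfo:
    host: str
    name: str
    min_local_deletion_time: int  # epoch seconds, as printed by sstablemetadata


def audit(sstables: Iterable[SSTableInfo], now: datetime,
          limit: timedelta = RETENTION_LIMIT) -> Tuple[SSTableInfo, timedelta, bool]:
    """Return the SSTable with the oldest deletion time, its age, and pass/fail."""
    candidates = [s for s in sstables
                  if s.min_local_deletion_time != NO_DELETION_TIME]
    if not candidates:
        raise ValueError("no SSTables with deletion information")
    oldest = min(candidates, key=lambda s: s.min_local_deletion_time)
    deleted_at = datetime.fromtimestamp(oldest.min_local_deletion_time,
                                        tz=timezone.utc)
    age = now - deleted_at
    return oldest, age, age <= limit
```

Taking the minimum across every node in both datacenters gives a single worst-case number to compare against the guideline.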
I'd be happy to discuss methodologies for such a test.
Turns out that we can do this very easily...
The oldest data file anywhere in the cluster is less than 3 weeks old (May 20).
    sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2426118294 May 20 21:21 md-7286-big-Data.db
    sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2607709083 May 28 14:49 md-7423-big-Data.db
    sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2345777677 Jun 5 18:52 md-7570-big-Data.db
    sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 925995282 Jun 6 19:34 md-7589-big-Data.db
    sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 252239356 Jun 7 08:55 md-7598-big-Data.db
    sessionstore1001.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 266557091 Jun 7 19:47 md-7607-big-Data.db
    sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2529813589 May 23 00:35 md-7292-big-Data.db
    sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2510615372 May 30 13:14 md-7425-big-Data.db
    sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2626199983 Jun 6 19:45 md-7558-big-Data.db
    sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 252315436 Jun 7 08:59 md-7567-big-Data.db
    sessionstore1002.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 266446615 Jun 7 19:51 md-7576-big-Data.db
    sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2461788972 May 21 05:47 md-7243-big-Data.db
    sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2718648334 May 29 10:16 md-7388-big-Data.db
    sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 2608444026 Jun 5 17:33 md-7521-big-Data.db
    sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 721458175 Jun 7 14:06 md-7554-big-Data.db
    sessionstore1003.eqiad.wmnet: -rw-r--r-- 1 cassandra cassandra 183984959 Jun 7 20:03 md-7559-big-Data.db
    sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2704190985 May 28 08:54 md-7512-big-Data.db
    sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2584931013 Jun 4 18:57 md-7649-big-Data.db
    sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 371948555 Jun 5 12:50 md-7662-big-Data.db
    sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 610642343 Jun 7 02:38 md-7691-big-Data.db
    sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 263506142 Jun 7 14:48 md-7700-big-Data.db
    sessionstore2001.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 178313138 Jun 7 20:44 md-7705-big-Data.db
    sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2808721571 May 24 17:56 md-7400-big-Data.db
    sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2592080367 Jun 1 13:22 md-7541-big-Data.db
    sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 681963679 Jun 3 09:35 md-7574-big-Data.db
    sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 714526617 Jun 4 19:36 md-7603-big-Data.db
    sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 751957644 Jun 6 15:48 md-7636-big-Data.db
    sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 323764391 Jun 7 09:26 md-7649-big-Data.db
    sessionstore2002.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 261636385 Jun 7 20:01 md-7658-big-Data.db
    sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2602471466 May 30 12:31 md-7547-big-Data.db
    sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 2579909321 Jun 6 16:08 md-7680-big-Data.db
    sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 323592802 Jun 7 09:42 md-7693-big-Data.db
    sessionstore2003.codfw.wmnet: -rw-r--r-- 1 cassandra cassandra 261309428 Jun 7 20:15 md-7702-big-Data.db
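Finding the oldest file in a listing like this is easy to script. A minimal sketch, assuming host-prefixed `ls -l` lines in exactly the shape shown (the command used to collect them isn't shown here). Since `ls -l` omits the year for recently modified files, the year is passed in explicitly; files old enough to be listed with a year instead of a time aren't handled. `DataFile`, `parse_ls_line` and `oldest_file` are hypothetical names.

```python
from datetime import datetime
from typing import Iterable, NamedTuple


class DataFile(NamedTuple):
    host: str
    size: int          # bytes
    mtime: datetime    # naive; ls prints local time without a zone
    name: str


def parse_ls_line(line: str, year: int) -> DataFile:
    """Parse 'host: -rw-r--r-- 1 owner group SIZE Mon DD HH:MM NAME'."""
    host, rest = line.split(":", 1)
    fields = rest.split()
    if len(fields) != 9:
        raise ValueError(f"unexpected ls line: {line!r}")
    size, month, day, hhmm, name = fields[4:9]
    # ls shows "Mon DD HH:MM" for recent files, so the year is supplied.
    mtime = datetime.strptime(f"{year} {month} {day} {hhmm}", "%Y %b %d %H:%M")
    return DataFile(host.strip(), int(size), mtime, name)


def oldest_file(lines: Iterable[str], year: int) -> DataFile:
    """Return the data file with the earliest modification time."""
    files = [parse_ls_line(line, year) for line in lines if line.strip()]
    return min(files, key=lambda f: f.mtime)
```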
And the oldest tombstones in that file (md-7286-big-Data.db on sessionstore1001) are all comfortably within gc_grace_seconds of the file's age; the oldest only go back to May 13th.
    root@sessionstore1001:/srv/cassandra-a/data/sessions/values-d93398104b1e11e9b54815441d81640b# sstablemetadata md-7286-big-Data.db
    WARN 20:57:00,893 Only 40.532GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
    SSTable: /srv/cassandra-a/data/sessions/values-d93398104b1e11e9b54815441d81640b/md-7286-big
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Bloom Filter FP chance: 0.010000
    Minimum timestamp: 1620925061842965
    Maximum timestamp: 1621545220357121
    SSTable min local deletion time: 1620925061
    SSTable max local deletion time: 1621548820
    Compressor: org.apache.cassandra.io.compress.LZ4Compressor
    Compression ratio: 0.6815832577317515
    TTL min: 0
    TTL max: 3600
    First token: -9223371826726254888 (key=enwiki:MWSession:l1c4s2hgvl2njj7if7ov1854r8ijs6cs)
    Last token: 9223371556873319393 (key=enwiki:MWSession:peikm1mc1se3r4bcdvga2fha73plmpij)
    minClustringValues: []
    maxClustringValues: []
    Estimated droppable tombstones: 1.6053814164518172
    SSTable Level: 0
    Repaired at: 0
    Replay positions covered: {CommitLogPosition(segmentId=1619531657072, position=32250439)=CommitLogPosition(segmentId=1619531660544, position=572568)}
    totalColumnsSet: 26330626
    totalRows: 26330626
    Estimated tombstone drop times:
      1620929291: 904155
      1620936070: 909365
      1620942632: 758505
      1620948871: 549490
      1620954573: 432984
      1620959823: 396278
      1620964841: 386872
      1620969786: 411047
      1620974784: 484954
      1620980254: 563213
      1620986134: 648321
      1620992347: 702498
      1620998460: 718215
      1621004174: 707948
      1621009759: 714103
      1621015492: 721563
      1621021346: 688849
      1621027083: 631726
      1621032848: 564995
      1621038943: 501582
      1621045437: 470515
      1621052153: 488740
      1621058895: 536796
      1621065610: 630684
      1621072078: 635172
      1621078000: 576237
      1621083234: 561100
      1621088241: 592402
      1621093459: 663770
      1621099039: 692286
      1621104999: 702768
      1621111080: 701176
      1621117266: 665579
      1621123698: 532952
      1621130358: 467978
      1621136824: 482888
      1621143296: 505159
      1621149785: 596563
      1621156222: 683204
      1621162769: 734054
      1621169446: 815423
      1621176222: 871911
      1621183095: 898948
      1621190140: 912248
      1621197140: 900624
      1621204243: 787960
      1621211979: 672250
      1621219988: 628283
      1621227352: 625878
      1621233623: 567889
      1621238836: 541150
      1621243533: 529295
      1621248071: 520081
      1621252632: 576252
      1621257134: 637244
      1621261756: 656719
      1621266490: 685527
      1621271387: 710028
      1621276393: 714394
      1621281449: 703684
      1621286467: 640982
      1621291584: 553620
      1621296950: 487708
      1621302437: 445668
      1621308041: 446942
      1621313793: 489114
      1621319585: 581874
      1621325331: 654963
      1621331102: 702606
      1621337216: 722363
      1621343305: 794749
      1621349395: 843293
      1621355469: 850024
      1621361511: 881960
      1621367255: 873607
      1621373450: 741853
      1621379764: 607569
      1621386486: 543095
      1621393161: 567846
      1621399885: 599749
      1621406867: 693697
      1621413648: 766170
      1621420529: 806240
      1621427559: 886080
      1621434449: 962104
      1621441401: 962815
      1621448376: 931643
      1621455299: 891335
      1621462304: 805412
      1621469687: 676876
      1621477423: 695665
      1621485206: 735166
      1621493284: 871547
      1621501318: 961014
      1621509473: 1193683
      1621517320: 1243719
      1621525295: 1287895
      1621532829: 1223501
      1621539553: 1067747
      1621546454: 861366
    Count Row Size Cell Count
    1 0 43494794
    2 0 0
    3 0 0
    4 0 0
    5 0 0
    6 0 0
    7 0 0
    8 0 0
    10 0 0
    12 0 0
    14 0 0
    17 0 0
    20 0 0
    24 0 0
    29 0 0
    35 0 0
    42 0 0
    50 0 0
    60 0 0
    72 17162077 0
    86 18882565 0
    103 7129824 0
    124 0 0
    149 0 0
    179 0 0
    215 0 0
    258 0 0
    310 0 0
    372 0 0
    446 14 0
    535 90247 0
    642 53948 0
    770 13085 0
    924 97806 0
    1109 59931 0
    1331 3907 0
    1597 264 0
    1916 318 0
    2299 396 0
    2759 266 0
    3311 60 0
    3973 72 0
    4768 5 0
    5722 0 0
    6866 0 0
    8239 0 0
    9887 0 0
    11864 1 0
    14237 2 0
    17084 0 0
    20501 1 0
    24601 0 0
    29521 0 0
    35425 0 0
    42510 0 0
    51012 0 0
    61214 0 0
    73457 0 0
    88148 0 0
    105778 0 0
    126934 0 0
    152321 3 0
    182785 2 0
    219342 0 0
    263210 0 0
    315852 0 0
    379022 0 0
    454826 0 0
    545791 0 0
    654949 0 0
    785939 0 0
    943127 0 0
    1131752 0 0
    1358102 0 0
    1629722 0 0
    1955666 0 0
    2346799 0 0
    2816159 0 0
    3379391 0 0
    4055269 0 0
    4866323 0 0
    5839588 0 0
    7007506 0 0
    8409007 0 0
    10090808 0 0
    12108970 0 0
    14530764 0 0
    17436917 0 0
    20924300 0 0
    25109160 0 0
    30130992 0 0
    36157190 0 0
    43388628 0 0
    52066354 0 0
    62479625 0 0
    74975550 0 0
    89970660 0 0
    107964792 0 0
    129557750 0 0
    155469300 0 0
    186563160 0 0
    223875792 0 0
    268650950 0 0
    322381140 0 0
    386857368 0 0
    464228842 0 0
    557074610 0 0
    668489532 0 0
    802187438 0 0
    962624926 0 0
    1155149911 0 0
    1386179893 0 0
    1663415872 0 0
    1996099046 0 0
    2395318855 0 0
    2874382626 0
    3449259151 0
    4139110981 0
    4966933177 0
    5960319812 0
    7152383774 0
    8582860529 0
    10299432635 0
    12359319162 0
    14831182994 0
    17797419593 0
    21356903512 0
    25628284214 0
    30753941057 0
    36904729268 0
    44285675122 0
    53142810146 0
    63771372175 0
    76525646610 0
    91830775932 0
    110196931118 0
    132236317342 0
    158683580810 0
    190420296972 0
    228504356366 0
    274205227639 0
    329046273167 0
    394855527800 0
    473826633360 0
    568591960032 0
    682310352038 0
    818772422446 0
    982526906935 0
    1179032288322 0
    1414838745986 0
    Estimated cardinality: 43517593
    EncodingStats minTTL: 0
    EncodingStats minLocalDeletionTime: 1620925061
    EncodingStats minTimestamp: 1620925061842965
    KeyType: org.apache.cassandra.db.marshal.UTF8Type
    ClusteringTypes: []
    StaticColumns: {}
    RegularColumns: {value:org.apache.cassandra.db.marshal.BytesType}
    root@sessionstore1001:/srv/cassandra-a/data/sessions/values-d93398104b1e11e9b54815441d81640b#
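The comparison above (oldest tombstone vs. the file's age) can be sketched in a few lines of Python. This assumes the table uses the default `gc_grace_seconds` of 864000 (10 days) and that the `ls` timestamps are UTC; `min_local_deletion_time` and `within_gc_grace` are hypothetical helpers that read only the `SSTable min local deletion time` field from the `sstablemetadata` text.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed: Cassandra's default gc_grace_seconds (10 days).
GC_GRACE = timedelta(seconds=864000)

_MIN_LDT = re.compile(r"SSTable min local deletion time:\s*(\d+)")


def min_local_deletion_time(metadata_text: str) -> datetime:
    """Extract the SSTable's min local deletion time as a UTC datetime."""
    match = _MIN_LDT.search(metadata_text)
    if match is None:
        raise ValueError("'SSTable min local deletion time' not found")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


def within_gc_grace(metadata_text: str, file_mtime: datetime,
                    gc_grace: timedelta = GC_GRACE) -> Tuple[timedelta, bool]:
    """Gap between the oldest deletion and the file's mtime, and whether it fits in gc_grace."""
    gap = file_mtime - min_local_deletion_time(metadata_text)
    return gap, gap <= gc_grace


if __name__ == "__main__":
    text = "SSTable min local deletion time: 1620925061"
    mtime = datetime(2021, 5, 20, 21, 21, tzinfo=timezone.utc)  # from ls, assumed UTC
    print(min_local_deletion_time(text))  # 2021-05-13 16:57:41+00:00
    gap, ok = within_gc_grace(text, mtime)
    print(gap, ok)  # 7 days, 4:23:19 True
```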
So we are in fact below 30 days of retention (and well below the 90 days we are "allowed").