Page MenuHomePhabricator

Establish data retention guidelines
Closed, ResolvedPublic

Description

In order to protect user privacy, we shouldn't keep data in the sopel database for longer than needed.

This means:

  • Deleting data when channels leave after 45 days.
  • Deleting data of users not seen in 18 months.

I have already removed data from old channels and will do the old people over this week.

We also should establish a data retention page.

This will also help us reduce the size of the database

Event Timeline

RhinosF1 triaged this task as Medium priority.Jun 1 2020, 1:04 PM

19
214
315
44
513
617
719
825
928
103
1111
1238
1342
1445
1548
1646
1754
1830
192
2055
2156
2212
2360
2461
2564
2665
2767
2868
2980
3047
3185
3286
337
3487
3588
3689
3798
3895
39100
40102
41101
4299
43106
44108
45109
4673
4777
48111
49113
50117
51118
52119
53122
54136
5572
56141
5783
58142
59145
60150
61152
62153
63155
64161
65164
66166
67149
68167
69168
70169
71156
72176
73177
74179
75180
76181
77183
78184
79186
80178
81188
82187
83189
84191
85193
8675
87196
88195
89199
90200
91204
92198
9352
94206
95208
96207
97212
98213
99182
100201
101215
102216
103217
10469
105219
106214
107221
108223
109205
110224
11131
112218
113220
114228
115229
116203
117230
118146
119194
120232
121233
122235
123225
124237
125238
126121
12737
128240
129241
130243
131226
132244
133245
134248
135250
136236
137252
138254
139255
140257
141258
142259
143260
144261
145262
146263
147266
148264
149269
150270
151271
152272
153273
154275
155274
15663
157277
158276
159116
160279
161280
162283
163282
164284
165285
166286
167288
168289
169202
170227
171290
172291
173287
174292
175298
176256
177297
178300
179234
180302
181303
182306
183307
184308
185309
186310
187311
188251
189312
190313
191314
192315
193317
194320
195321
196322
197323
198324
199326
200328
201329
202330
203331
204332
205327
206333
207334
208335
209337
210338
211339
212340
213342
214343
215344
216345
217346
218347
219348
220349
221350
222351
223352
224353
225354
226355
227356
228357
229358
230359
231360
232361
233362
234363
235364
236366
237367
238368
239369
240370
241371
242372
243373
244374
245375
246376
247377
248378
249379
250380
251381
252382
253383
254384
255385
256386
257387
258388
259389
260390
261391
262392
263393
264394
265336
266395
267397
268396
269398
270399
271400
272401
273402
274403
275404
276405
277406
278407
279409
280410
281411
282412
283413
284408
285414
286415
287416
288417
289418
290419
291420
292421
293422
294423
295424
296425
297426
298428
299429
300430
301431
302432
303433
304434
305435
306436
307437
308438
309439
310440
311441
312442
313443
314444
315445
316446
317447
318448
319449
320450
321451
322452
323453
324454
325455
326456
327457
328458
329222
330459
331460
33227
333462
334341
335463
336170
337157
33810
339105
340464
341427
342325
343465
344466
345467
346468
347469
348475
349476
350477
351471
352474
353103
354481
355483
356484
357482
358485
359239
360486
361487
36271
363488
364490
365489
366479
367494
368299
369495
370496
371497
372265
373498
374499
37590
376500
377501
378502
379160
380504
381491
382506
38379
38458
385508
386509
387510
388511
389513
390512
391492
392516
393517
394518
395520
396514
397519
398522
399523
400253
401521
402278
403528
404529
405530
406505
407515
408531
409524
410532
411533
412534
413535
414536
415540
416541
417542

there is no "easy" way to do this

  1. Generate the list
  2. Drop the nick_id from nick_values
  3. Drop the nick_id from nicknames
  4. Delete the key from nick_ids

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T13:30:13Z] <RhinosF1> ran select nick_id from nick_values where value < 1546300800 and key = "seen_timestamp" on default.db to generate P11346 for T254154

We can make an python script to run the SQL and remove the data from the tables

In SQL:

DELETE from nick_values where nick_id is 34;
DELETE from nicknames where nick_id is 34;
DELETE from nick_ids where nick_id is 34;

would drop user 34 from everywhere in default.db

In SQL:

DELETE from nick_values where nick_id is 34;
DELETE from nicknames where nick_id is 34;
DELETE from nick_ids where nick_id is 34;

would drop user 34 from everywhere in default.db

Looks alright to me. We can begin a python script to automate this tomorrow. For a more long-term solution we can make a python script to review the data in the tables to see what needs to be removed.

For a more long-term solution we can make a python script to review the data in the tables to see what needs to be removed.

That's 1 select query

For a more long-term solution we can make a python script to review the data in the tables to see what needs to be removed.

That's 1 select query

By that I meant to add on to the script to make it one automated process.

EXAMPLE:

select nick_id from nick_values where value < 1546300800 and key = "seen_timestamp"

for criteria 2

select nick_id from nick_values where value < "##RhinosF1" and key = "seen_channel"

for criteria 1

For a more long-term solution we can make a python script to review the data in the tables to see what needs to be removed.

That's 1 select query

By that I meant to add on to the script to make it one automated process.

ah

Mentioned in SAL (#wikimedia-cloud) [2020-06-02T10:06:12Z] <RhinosF1> run dbclean.py using

19
214
315
44
513
617
719
825
928
103
1111
1238
1342
1445
1548
1646
1754
1830
192
2055
2156
2212
2360
2461
2564
2665
2767
2868
2980
3047
3185
3286
337
3487
3588
3689
3798
3895
39100
40102
41101
4299
43106
44108
45109
4673
4777
48111
49113
50117
51118
52119
53122
54136
5572
56141
5783
58142
59145
60150
61152
62153
63155
64161
65164
66166
67149
68167
69168
70169
71156
72176
73177
74179
75180
76181
77183
78184
79186
80178
81188
82187
83189
84191
85193
8675
87196
88195
89199
90200
91204
92198
9352
94206
95208
96207
97212
98213
99182
100201
101215
102216
103217
10469
105219
106214
107221
108223
109205
110224
11131
112218
113220
114228
115229
116203
117230
118146
119194
120232
121233
122235
123225
124237
125238
126121
12737
128240
129241
130243
131226
132244
133245
134248
135250
136236
137252
138254
139255
140257
141258
142259
143260
144261
145262
146263
147266
148264
149269
150270
151271
152272
153273
154275
155274
15663
157277
158276
159116
160279
161280
162283
163282
164284
165285
166286
167288
168289
169202
170227
171290
172291
173287
174292
175298
176256
177297
178300
179234
180302
181303
182306
183307
184308
185309
186310
187311
188251
189312
190313
191314
192315
193317
194320
195321
196322
197323
198324
199326
200328
201329
202330
203331
204332
205327
206333
207334
208335
209337
210338
211339
212340
213342
214343
215344
216345
217346
218347
219348
220349
221350
222351
223352
224353
225354
226355
227356
228357
229358
230359
231360
232361
233362
234363
235364
236366
237367
238368
239369
240370
241371
242372
243373
244374
245375
246376
247377
248378
249379
250380
251381
252382
253383
254384
255385
256386
257387
258388
259389
260390
261391
262392
263393
264394
265336
266395
267397
268396
269398
270399
271400
272401
273402
274403
275404
276405
277406
278407
279409
280410
281411
282412
283413
284408
285414
286415
287416
288417
289418
290419
291420
292421
293422
294423
295424
296425
297426
298428
299429
300430
301431
302432
303433
304434
305435
306436
307437
308438
309439
310440
311441
312442
313443
314444
315445
316446
317447
318448
319449
320450
321451
322452
323453
324454
325455
326456
327457
328458
329222
330459
331460
33227
333462
334341
335463
336170
337157
33810
339105
340464
341427
342325
343465
344466
345467
346468
347469
348475
349476
350477
351471
352474
353103
354481
355483
356484
357482
358485
359239
360486
361487
36271
363488
364490
365489
366479
367494
368299
369495
370496
371497
372265
373498
374499
37590
376500
377501
378502
379160
380504
381491
382506
38379
38458
385508
386509
387510
388511
389513
390512
391492
392516
393517
394518
395520
396514
397519
398522
399523
400253
401521
402278
403528
404529
405530
406505
407515
408531
409524
410532
411533
412534
413535
414536
415540
416541
417542
for T254154 START

Mentioned in SAL (#wikimedia-cloud) [2020-06-02T10:08:05Z] <RhinosF1> run dbclean.py using

19
214
315
44
513
617
719
825
928
103
1111
1238
1342
1445
1548
1646
1754
1830
192
2055
2156
2212
2360
2461
2564
2665
2767
2868
2980
3047
3185
3286
337
3487
3588
3689
3798
3895
39100
40102
41101
4299
43106
44108
45109
4673
4777
48111
49113
50117
51118
52119
53122
54136
5572
56141
5783
58142
59145
60150
61152
62153
63155
64161
65164
66166
67149
68167
69168
70169
71156
72176
73177
74179
75180
76181
77183
78184
79186
80178
81188
82187
83189
84191
85193
8675
87196
88195
89199
90200
91204
92198
9352
94206
95208
96207
97212
98213
99182
100201
101215
102216
103217
10469
105219
106214
107221
108223
109205
110224
11131
112218
113220
114228
115229
116203
117230
118146
119194
120232
121233
122235
123225
124237
125238
126121
12737
128240
129241
130243
131226
132244
133245
134248
135250
136236
137252
138254
139255
140257
141258
142259
143260
144261
145262
146263
147266
148264
149269
150270
151271
152272
153273
154275
155274
15663
157277
158276
159116
160279
161280
162283
163282
164284
165285
166286
167288
168289
169202
170227
171290
172291
173287
174292
175298
176256
177297
178300
179234
180302
181303
182306
183307
184308
185309
186310
187311
188251
189312
190313
191314
192315
193317
194320
195321
196322
197323
198324
199326
200328
201329
202330
203331
204332
205327
206333
207334
208335
209337
210338
211339
212340
213342
214343
215344
216345
217346
218347
219348
220349
221350
222351
223352
224353
225354
226355
227356
228357
229358
230359
231360
232361
233362
234363
235364
236366
237367
238368
239369
240370
241371
242372
243373
244374
245375
246376
247377
248378
249379
250380
251381
252382
253383
254384
255385
256386
257387
258388
259389
260390
261391
262392
263393
264394
265336
266395
267397
268396
269398
270399
271400
272401
273402
274403
275404
276405
277406
278407
279409
280410
281411
282412
283413
284408
285414
286415
287416
288417
289418
290419
291420
292421
293422
294423
295424
296425
297426
298428
299429
300430
301431
302432
303433
304434
305435
306436
307437
308438
309439
310440
311441
312442
313443
314444
315445
316446
317447
318448
319449
320450
321451
322452
323453
324454
325455
326456
327457
328458
329222
330459
331460
33227
333462
334341
335463
336170
337157
33810
339105
340464
341427
342325
343465
344466
345467
346468
347469
348475
349476
350477
351471
352474
353103
354481
355483
356484
357482
358485
359239
360486
361487
36271
363488
364490
365489
366479
367494
368299
369495
370496
371497
372265
373498
374499
37590
376500
377501
378502
379160
380504
381491
382506
38379
38458
385508
386509
387510
388511
389513
390512
391492
392516
393517
394518
395520
396514
397519
398522
399523
400253
401521
402278
403528
404529
405530
406505
407515
408531
409524
410532
411533
412534
413535
414536
415540
416541
417542
for T254154 END

Checking a few bits of orphan data

417 accounts deleted, script created (will probably advance it time). Closing.