
Data for events from wdqs needs to be deleted after 90 days and/or sanitized
Closed, Resolved, Public

Description

Data for events from wdqs needs to be deleted after 90 days and/or sanitized, similarly to how it is done for cirrus. There is also the option to sanitize the data and keep part of it long term.

See change: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/572041/7/modules/profile/manifests/analytics/refinery/job/data_purge.pp

There are two different events persisted to two different directories, so data needs to be sanitized for both:

/wmf/data/event/wdqs_external_sparql_query

/wmf/data/event/wdqs_internal_sparql_query/
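
As a rough illustration of the 90-day retention rule applied to both directories, here is a minimal Python sketch. It is not the purge job from the linked puppet change. The helper names and the year=/month=/day= partition layout are assumptions made for the example only.

```python
from datetime import date, timedelta

# Directories named in the task description.
EVENT_DIRS = [
    "/wmf/data/event/wdqs_external_sparql_query",
    "/wmf/data/event/wdqs_internal_sparql_query",
]

RETENTION_DAYS = 90  # retention window stated in the task


def purge_cutoff(today: date, retention_days: int = RETENTION_DAYS) -> date:
    """Return the oldest date that is still retained."""
    return today - timedelta(days=retention_days)


def partitions_to_purge(partition_dates, today, retention_days=RETENTION_DAYS):
    """Return the partition dates strictly older than the cutoff, sorted."""
    cutoff = purge_cutoff(today, retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


def partition_path(base_dir: str, day: date) -> str:
    """Build a path using a hypothetical year=/month=/day= layout (assumed)."""
    return f"{base_dir}/year={day.year}/month={day.month}/day={day.day}"


if __name__ == "__main__":
    today = date(2020, 3, 5)
    known = [date(2019, 12, 5), date(2019, 12, 6), date(2020, 1, 1)]
    for base in EVENT_DIRS:
        for day in partitions_to_purge(known, today):
            # A real job would delete the partition; this sketch only prints it.
            print(partition_path(base, day))
```

The sketch keeps the partition that falls exactly on the cutoff date. Whether the boundary day should be kept or dropped is a design choice that the task does not specify.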

Event Timeline

Nuria mentioned this in Unknown Object (Task). Mar 5 2020, 9:27 PM

Change 577469 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/puppet@production] [wdqs] purge wdqs query logs

https://gerrit.wikimedia.org/r/577469

@dcausse, similar to cirrus data, some data can be kept long term if it is on the sanitization list. Let us know if you want to do that.

dcausse added a subscriber: JAllemandou.

@Nuria thanks. Unless @JAllemandou thinks we need more, 3 months is OK for me.

Nuria reopened this task as Open.

Change 577469 merged by Elukey:
[operations/puppet@production] profile::analytics::refinery::job::data_purge: purge wdqs query logs

https://gerrit.wikimedia.org/r/577469

TJones claimed this task.