Retain nonsensitive mediawiki_api_request logging data
Closed, Declined · Public

Description

The table mediawiki_api_request (db: presto_analytics_hive, schema: event) contains data on API requests to our projects that would be useful for Partnerships and Okapi analysis, but the data is set to expire soon. Requesting that the non-sensitive fields be dumped into a temporary table for further analysis!

cc @RBrounley_WMF

Event Timeline

We can keep data that has no identifying fields for longer than 90 days. You just need to submit a changeset that lists those fields. Please take a look at the docs: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_retention

It will help to work with a data analyst who is familiar with the systems.

fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Just following up, the data is being collected on an ongoing basis, and we always have the last 90 days of data.

(I initially made a mistake by looking only at eqiad but right now the data's coming from codfw)

Right now, we're waiting on @Maryana and others to let us know which fields they would like to keep on an ongoing basis, and I can help them implement that in the sanitization whitelist YAML. The file is linked below and is very self-explanatory; direct patches are welcome!

https://github.com/wikimedia/analytics-refinery/blob/master/static_data/eventlogging/whitelist.yaml
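
To make the idea concrete, here is a minimal Python sketch of what a field whitelist does to a single event record: fields that are listed are kept, and everything else is dropped. This is only an illustration and not the actual refinery code or the real whitelist.yaml contents. The field names (api_action, client_ip, http.method, and so on), the "keep" marker, and the sanitize helper are all assumptions made up for the example.

```python
from __future__ import annotations

from typing import Any

# Marker meaning "retain this field" (an assumption for this sketch).
KEEP = "keep"

# Hypothetical whitelist: these field names are illustrative only and are
# not taken from the real mediawiki_api_request schema.
WHITELIST: dict[str, Any] = {
    "api_action": KEEP,
    "http": {"method": KEEP, "status_code": KEEP},
}


def sanitize(record: dict[str, Any], whitelist: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of record that contains only whitelisted fields."""
    kept: dict[str, Any] = {}
    for field, rule in whitelist.items():
        if field not in record:
            continue
        value = record[field]
        if rule == KEEP:
            # Field is explicitly listed, so retain it as-is.
            kept[field] = value
        elif isinstance(rule, dict) and isinstance(value, dict):
            # Nested struct: recurse and keep only its listed subfields.
            nested = sanitize(value, rule)
            if nested:
                kept[field] = nested
    return kept


if __name__ == "__main__":
    event = {
        "api_action": "query",
        "client_ip": "203.0.113.7",  # not whitelisted, so it is dropped
        "http": {"method": "GET", "status_code": 200, "user_agent": "example"},
    }
    print(sanitize(event, WHITELIST))
```

Anything not named in the whitelist (client_ip and http.user_agent in this made-up record) never reaches the retained copy, which is why only the fields to keep need to be listed.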

razzi subscribed.

Closing since there has been no reply; feel free to reopen.