fix up log retention on log collection/storage hosts
Closed, Resolved (Public)

Description

The following directories on log-collecting hosts need cron jobs to purge old files:

stat1002: /a/log/webrequest/archive (it currently holds files from Feb 1 onward, but I see no purge mechanism in place)
erbium: /a/log/webrequest/archive
oxygen: /a/log/webrequest/archive
fluorine: /a/mw-log/archive
gadolinium: /a/log/webrequest/archive
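
For illustration only, here is a minimal sketch of what such a purge job could run. The function name `purge_old_files`, the 90-day default, and the dry-run default are assumptions made for this example, not an agreed policy or existing tooling on these hosts.

```python
import time
from pathlib import Path


def purge_old_files(directory, max_age_days=90, now=None, dry_run=True):
    """Select (and optionally delete) regular files older than max_age_days.

    Hypothetical helper: age is judged by file mtime, and only the top
    level of `directory` is examined. Returns the list of selected paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    selected = []
    for entry in sorted(Path(directory).iterdir()):
        # Skip subdirectories and symlinks; only plain files are purged.
        if entry.is_symlink() or not entry.is_file():
            continue
        if entry.stat().st_mtime < cutoff:
            selected.append(entry)
            if not dry_run:
                entry.unlink()
    return selected


if __name__ == "__main__":
    # Dry run by default: print what would be removed, delete nothing.
    for path in purge_old_files("/a/log/webrequest/archive"):
        print(path)
```

Keeping dry-run as the default means a cron entry has to opt in to deletion explicitly, which seems safer on hosts where the archive contents have not been reviewed yet.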

On stat1002 the following need to be cleaned up:
/a/squid/stats/dev_engels/csv/*/*/private/*
/a/squid/stats/csv/*/*/private/*
/a/squid/stats_editors/csv/*/*/private/*

On stat1003 the following need to be cleaned up:
/a/zerosms/logs

On both stat1002 and stat1003 we need to ask ezachte about these:
/a/wikistats_git/squids/csv_edits
/a/wikistats_git/squids/csv
/a/wikistats_git/squids/csv_editors

e.g. 2014-05/2014-05-13/private/DebugSquidDataOutDoNotPublish.txt

and also /a/wikistats_git/backup/

Event Timeline

ArielGlenn claimed this task.
ArielGlenn raised the priority of this task to High.
ArielGlenn updated the task description.
ArielGlenn added a project: acl*sre-team.
ArielGlenn added subscribers: ArielGlenn, Ottomata.
ArielGlenn added a subscriber: ezachte.

Erik, I added you so you can comment on the /a/wikistats_git/ files.

Ottomata, what do you think about reducing the retention in these from 180 days to 90?
puppet/templates/udp2log/logrotate_udp2log_analytics.erb
puppet/templates/udp2log/logrotate_udp2log.erb

+1, but I'm pretty sure the *_analytics.erb one is not used at all.

On oxygen, all logs in /a/log/webrequest/archive/ of the form zero-*.tsv.log-*gz (e.g. zero-digi-malaysia.tsv.log-20130711.gz) will have to be removed manually; logrotate doesn't see those.

On gadolinium, in /a/log/webrequest/archive/, all logs of the form

edits.tsv.log-*
mobile-sampled-100.tsv.log-*
sampled-1000.tsv.log-*
5xx.tsv.log-*

will need to be deleted manually; logrotate won't see those.
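
A rough sketch of how the manual cleanup candidates could be listed before anything is deleted. It assumes these files carry a `-YYYYMMDD` suffix like the zero-digi-malaysia.tsv.log-20130711.gz example on oxygen, and that 90 days is the cutoff; both are assumptions, and `stale_names` is a hypothetical helper. Names without a parseable date are left alone.

```python
import re
from datetime import date, timedelta
from fnmatch import fnmatch

# Patterns from the list above for files that logrotate does not manage.
PATTERNS = [
    "edits.tsv.log-*",
    "mobile-sampled-100.tsv.log-*",
    "sampled-1000.tsv.log-*",
    "5xx.tsv.log-*",
]

# Assumed naming scheme: "<name>.log-YYYYMMDD" with an optional ".gz".
DATE_SUFFIX = re.compile(r"\.log-(\d{4})(\d{2})(\d{2})(?:\.gz)?$")


def stale_names(names, today, max_age_days=90, patterns=PATTERNS):
    """Return the names that match a pattern and are dated before the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for name in names:
        if not any(fnmatch(name, pattern) for pattern in patterns):
            continue
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # no date in the name: leave it for manual review
        try:
            stamp = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # digits that do not form a real calendar date
        if stamp < cutoff:
            stale.append(name)
    return stale
```

For example, `stale_names(os.listdir("/a/log/webrequest/archive"), date.today())` would give a list to review by hand; the sketch deliberately deletes nothing itself.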

Asked Yurik (adding him) about /a/zerosms/logs. He will be able to clean that up next week; it needs careful review by him.

I removed files older than 90 days in /a/log/webrequest/archive on gadolinium. Why won't logrotate see these though?

Dzahn added a parent task: Restricted Task. (Jun 21 2016, 5:35 AM)

I believe we are in a better place nowadays with respect to log retention in the fleet. OK to resolve or lower the priority?

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Resolving after a conversation with @MoritzMuehlenhoff and @ArielGlenn. This task is old enough that it is no longer actionable in its current state. If you feel this work still needs to happen, please reach out to me.