
Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old {hawk}
Closed, Resolved, Public

Description

To respect our data retention policy the files in this directory need to be pruned appropriately. They are all of the format:

NameOfThing.log-YYYYMMDD.gz
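For anyone scripting against these names, here is a minimal sketch of splitting a filename in that format into its log name and rotation date. The regex and the helper name `parse_log_name` are illustrative assumptions, not part of any existing tooling on stat1002.

```python
import re
from datetime import date, datetime

# Assumed pattern for the NameOfThing.log-YYYYMMDD.gz format described above.
LOG_NAME_RE = re.compile(r"^(?P<name>.+)\.log-(?P<stamp>\d{8})\.gz$")


def parse_log_name(filename):
    """Return (log name, rotation date), or None if the name does not match."""
    match = LOG_NAME_RE.match(filename)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("stamp"), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return match.group("name"), stamp


print(parse_log_name("CirrusSearchRequests.log-20150726.gz"))
```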

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper.
Dzahn added a project: audits-data-retention.
Dzahn added a subscriber: ArielGlenn.

Adding @Ottomata and a link to T84618, which is still pending with a number of open subtasks. IIRC some of these were being kept longer due to requests from legal, but that was quite some time back. Where are we on that now?

Change 253601 had a related patch set uploaded (by Addshore):
Prune stat1002 /a/mw-log/archive after 30 days

https://gerrit.wikimedia.org/r/253601

Change 253601 abandoned by Addshore:
Prune stat1002 /a/mw-log/archive after 30 days

https://gerrit.wikimedia.org/r/253601

I don't know much about the mw-logs. Maybe @bd808 knows more, or who to ask?

I believe the MediaWiki logs were rsync'd over at @Ironholds' request. They are not all the logs, just two Cirrus-specific logs. There should be no blockers to clearing out old ones, afaik.

The oldest file I see is CirrusSearchRequests.log-20150726.gz. A 90 day retention should have deleted it on Oct 24th, I believe, so a retention seems to be specified but is not working.
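As a quick check of that date arithmetic, a small sketch (the helper name `expiry_date` is made up for illustration, and it assumes the retention clock starts on the date in the filename):

```python
from datetime import date, timedelta


def expiry_date(rotated_on, retention_days=90):
    """Date on which a file rotated on `rotated_on` reaches the retention limit."""
    return rotated_on + timedelta(days=retention_days)


# CirrusSearchRequests.log-20150726.gz with a 90 day retention
print(expiry_date(date(2015, 7, 26)))  # 2015-10-24
```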

Change 253968 had a related patch set uploaded (by Ottomata):
Use mtime instead of ctime when considering file retention, fix retention for mw-logs on stat1002

https://gerrit.wikimedia.org/r/253968

Change 253968 merged by Ottomata:
Use mtime instead of ctime when considering file retention, fix retention for mw-logs on stat1002

https://gerrit.wikimedia.org/r/253968

Ok, yup, bug! The job that removed old files was using ctime instead of mtime, and apparently these files had something changed about them more recently than 90 days ago.
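To illustrate the difference: on Unix, ctime is the inode change time, so it is reset by metadata changes such as a chmod, chown, or rename, while mtime only moves when the file contents are written. The sketch below is not the actual retention job from the patch; it is a hypothetical helper, `find_expired`, that selects files by mtime the way the fix describes.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_expired(directory, retention_days, now=None):
    """Return sorted paths of regular files whose mtime is older than the retention.

    Uses st_mtime (last content modification) rather than st_ctime
    (last inode change), so a later chmod/chown does not reset the clock.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    expired = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            expired.append(entry.path)
    return sorted(expired)
```

The function only lists candidates; a caller would still have to decide whether to `os.remove` them, which makes a dry run easy.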

Along the way, I noticed that the retention jobs remove files in a given destination dir. Since all of these different logs were being rsynced into the same directory, the one with the shortest retention (api.log: 30 days) will win out. I've modified the rsyncs so that they copy into directories named after the log to fix this problem, e.g. /a/mw-log/CirrusSearchRequests/ etc.
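A toy model of why the shared directory was a problem, assuming each retention job simply deletes everything older than its limit in its destination directory. The function name and the `/a/mw-log/api` path are hypothetical; only `/a/mw-log/archive`, `/a/mw-log/CirrusSearchRequests`, and the 30 and 90 day figures come from this task.

```python
def effective_retention(jobs):
    """Map each directory to the retention that actually applies to it.

    `jobs` is an iterable of (directory, retention_days). When several jobs
    prune the same directory, the shortest retention wins.
    """
    effective = {}
    for directory, days in jobs:
        effective[directory] = min(days, effective.get(directory, days))
    return effective


# Before: both logs rsynced into one directory.
shared = effective_retention([
    ("/a/mw-log/archive", 30),  # api.log
    ("/a/mw-log/archive", 90),  # Cirrus logs
])

# After: one directory per log (the api path here is an assumed name).
split = effective_retention([
    ("/a/mw-log/api", 30),
    ("/a/mw-log/CirrusSearchRequests", 90),
])

print(shared)
print(split)
```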

Let's wait a few days and make sure this is working, then I think we can close.

Oldest file in these directories is now Aug 25 (today is Nov 23), so looking good.

It would be really good if it could have been explicitly called out (with a ping) that the directories were changing. Several data collection scripts silently broke on the 17th.

Aye ok. I should have been more explicit about this. (You were CCed on this ticket though! :) )

Milimetric renamed this task from "Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old." to "Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old {hawk}". Nov 30 2015, 5:49 PM
Milimetric assigned this task to Ottomata.
Milimetric triaged this task as High priority.
Milimetric moved this task from Next Up to Ready to Deploy on the Analytics-Kanban board.
Milimetric moved this task from Ready to Deploy to Done on the Analytics-Kanban board.