Gerrit seemingly violates data retention guidelines
Closed, ResolvedPublic

Description

An example:

access.log.38.gz:96.x.x.x - - [09/Jan/2015:02:24:52 +0000] "GET /r/config/server/top-menus HTTP/1.1" 200 626 T=0s "https://gerrit.wikimedia.org/r/#/q/owner:%22Chasemp+%253Cchasemp%254<foo>.com%253E%22,n,z" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

We have this kind of data going back to 2014. In some cases it could be construed as revealing 'who I am / what I care about / where I am from'.

Event Timeline

chasemp raised the priority of this task from to Medium.
chasemp updated the task description.
chasemp added a subscriber: chasemp.
Dzahn raised the priority of this task from Medium to High.
Dzahn set Security to None.

Gerrit runs on ytterbium and Apache2 has a logrotate rule:

$ cat /etc/logrotate.d/apache2 
/var/log/apache2/*.log {
        daily
        missingok
        rotate 30
        compress
        delaycompress
        notifempty
        create 640 root adm
        sharedscripts
        postrotate
                /etc/init.d/apache2 reload > /dev/null
        endscript
        prerotate
                if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                        run-parts /etc/logrotate.d/httpd-prerotate; \
                fi; \
        endscript
}

Gerrit also has some logs under /var/lib/gerrit2/review_site/logs which I believe are logrotated by Gerrit itself over a week.

I guess whatever old access.log.*.gz files we have are leftovers from before the logrotate rule was adjusted to 30 days, and they could be safely deleted on ytterbium.

I cannot access /var/log/apache2 to confirm; we are not in the adm system group.

hashar lowered the priority of this task from High to Medium. (Jun 21 2016, 9:54 AM)
hashar moved this task from INBOX to Backlog (ARCHIVED) on the Release-Engineering-Team board.

Has this been adjusted so that it deletes the logs after 30 days?

Based on what Antoine said above, yes, all new logs are deleted after 30 days.

@Dzahn / @chasemp (Daniel is assigned, but Chase found it...): can you verify that the timestamps suggest that all new logs are rotated after 30 days and the files you (Chase) found must have just been left over when we switched to the 30 day rotation? And if so, please just delete them! Thanks!
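The timestamp check requested above could be sketched roughly as follows. This is only an illustration, not what was actually run on ytterbium: the function name list_stale_logs is hypothetical, and it assumes a find that supports -maxdepth (GNU and BSD find do). It only lists files; it deletes nothing.

```shell
#!/bin/sh
# Sketch: list rotated, compressed Apache access logs older than a
# retention window. list_stale_logs is a hypothetical helper name.

# Usage: list_stale_logs DIR DAYS
# Prints access.log.*.gz files directly inside DIR whose modification
# time is more than DAYS whole 24-hour periods in the past.
list_stale_logs() {
    dir=$1
    days=$2
    # -mtime +N matches files modified more than N days ago;
    # sort only makes the output order stable.
    find "$dir" -maxdepth 1 -type f -name 'access.log.*.gz' -mtime "+$days" | sort
}

# Example invocation (read-only), assuming the 30-day rule from this task:
#   list_stale_logs /var/log/apache2 30
```

If the 30-day rotation is working, running it with 30 against /var/log/apache2 should print only leftovers from before the rule change; an empty result would mean nothing exceeds the window.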

There were still some files older than 90 days there:

-rw-r----- 1 root adm 39441687 Feb 22 2015 access.log.51.gz

fyi I nuked them so anything going forward is a $new_issue.

Mentioned in SAL [2016-06-23T16:27:54Z] <chasemp> remove old log files on ytterbium for T114395