Logrotate fails for: "$FILE No such file or directory"
Closed, Resolved (Public)

Description

Cron Daemon <root@aluminium.wikimedia.org>
7:25 AM (1 hour ago)

to root 
/etc/cron.daily/logrotate:
error: error opening /var/log/squid3/access.log.1.gz: No such file or directory
run-parts: /etc/cron.daily/logrotate exited with return code 1

In this case, access.log.1.gz was present on disk and its size was greater than zero bytes. I had to gzip it manually.

Event Timeline

Change 342586 had a related patch set uploaded (by Elukey):
[operations/puppet] Increase the squid's logrotate log retention to 2

https://gerrit.wikimedia.org/r/342586

Change 342586 merged by Elukey:
[operations/puppet] Increase the squid's logrotate log retention to 2

https://gerrit.wikimedia.org/r/342586

I had a variant of this during clinic:

/etc/cron.daily/logrotate:
error: error creating output file /var/log/squid3/access.log.1.gz: File exists
run-parts: /etc/cron.daily/logrotate exited with return code 1

This happens when logrotate hits a problem between the compress and move stages. The logrotate action would be something like:

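rm /var/log/squid3/access.log.2.gz  # step 1: drop the oldest rotation
gzip /var/log/squid3/access.log.1  # step 2: compress the previous rotation
mv /var/log/squid3/access.log.1.gz /var/log/squid3/access.log.2.gz  # step 3: shift it to slot 2
mv /var/log/squid3/access.log /var/log/squid3/access.log.1  # step 4: rotate the live log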

If something happens between steps 2 and 3, then /var/log/squid3/access.log.1.gz gets left behind, and the next time logrotate tries to do step 2 it fails.
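
To spot this leftover state before the next cron run, a check along the following lines might help. It is only a minimal sketch, assuming the failure condition is that both an uncompressed rotation (e.g. access.log.1) and its compressed counterpart (access.log.1.gz) exist in the same directory; the function name find_rotation_collisions is hypothetical and not part of logrotate.

```shell
#!/bin/sh
# Hypothetical helper (not part of logrotate): print every "name.gz" in a
# directory for which an uncompressed "name" also exists. Under the
# assumption described above, compressing "name" would then collide with
# the leftover "name.gz".
find_rotation_collisions() {
    dir=$1
    for gz in "$dir"/*.gz; do
        # If no .gz files exist the glob stays unexpanded; skip it.
        [ -e "$gz" ] || continue
        plain=${gz%.gz}
        if [ -e "$plain" ]; then
            printf '%s\n' "$gz"
        fi
    done
}
```

For example, `find_rotation_collisions /var/log/squid3` would print /var/log/squid3/access.log.1.gz if both access.log.1 and access.log.1.gz are present. Which of the two files to keep still needs a manual look at their contents.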

We received a similar alert today: (SystemdUnitFailed) firing: logrotate.service on logstash2003:9100

Systemd service status: error opening /var/log/opensearch/production-elk7-codfw.log: No such file or directory

However, that file exists in the system: -rw-r--r-- 1 opensearch opensearch 34548 Mar 26 01:01 /var/log/opensearch/production-elk7-codfw.log.

Mentioned in SAL (#wikimedia-operations) [2024-03-26T01:05:32Z] <denisse> Starting logrotate.service on logstash2003 - T153940

The production-elk7-codfw.log file was present on the system. After verifying that the contents of the file looked correct, I manually started the service and the unit is healthy again.