iridium:/var/log/phd/daemons.log is growing too much (took 20% of filesystem space)
Closed, Resolved · Public

Description

I've compressed it to daemons.log.xz. We probably do not need a rotation policy; we only need to stop the single error that is being logged:

(I am unsure how public this is, so I am truncating certain strings.)

STDERR
[2016-01-25 12:21:54] EXCEPTION: (Exception) git fetch failed with error #128:
stdout:

stderr:fatal: remote error: Git repository not found

 at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:337]
arcanist(head=XXXXXXXXXXXXXXXXXXX, ref.master=XXXXXXXXXXXXXXXX), phabricator(head=production, ref.master=XXXXXXXXX, ref.production=XXXXXXXXXXXX, custom=7), phutil(head=XXXXXXXXXXXXXXXXXXXXXXX, ref.master=XXXXXXXXXXX), security(head=XXXXXXXXXXXXXXXXXXXXXX, ref.master=XXXXXXXX), sprint(head=XXXXXXXXXXXXXXXXXXXXXXXXXX, ref.master=XXXXXXXX)
  #0 PhabricatorRepositoryPullEngine::executeGitUpdate() called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:93]
  #1 PhabricatorRepositoryPullEngine::pullRepository() called at [<phabricator>/src/applications/repository/management/PhabricatorRepositoryManagementUpdateWorkflow.php:80]
  #2 Phabricator... (380 more bytes) ... at [<phutil>/src/future/exec/ExecFuture.php:416]
[25-Jan-2016 12:21:55 UTC] arcanist(head=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX, ref.master=XXXXXXXXX), phabricator(head=production, ref.master=XXXXXXXXXXX, ref.production=XXXXXXXXX, custom=7), phutil(head=d6c7d72bb2142b63197126e553a885c8072f9101, ref.master=XXXXXXXX), security(head=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX, ref.master=XXXXXXXXXX), sprint(head=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX, ref.master=XXXXXXXX)
[25-Jan-2016 12:21:55 UTC]   #0 <#3> ExecFuture::resolvex() called at [<phabricator>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:361]
[25-Jan-2016 12:21:55 UTC]   #1 phlog(PhutilProxyException) called at [<phabricator>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:368]
[25-Jan-2016 12:21:55 UTC]   #2 PhabricatorRepositoryPullLocalDaemon::resolveUpdateFuture(PhabricatorRepository, ExecFuture, integer) called at [<phabricator>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:201]
[25-Jan-2016 12:21:55 UTC]   #3 PhabricatorRepositoryPullLocalDaemon::run() called at [<phutil>/src/daemon/PhutilDaemon.php:183]
[25-Jan-2016 12:21:55 UTC]   #4 PhutilDaemon::execute() called at [<phutil>/scripts/daemon/exec/exec_daemon.php:125]

Event Timeline

jcrespo created this task. Jan 25 2016, 12:27 PM
jcrespo raised the priority of this task to Needs Triage.
jcrespo updated the task description.
jcrespo added a project: Phabricator.
jcrespo added a subscriber: jcrespo.
Restricted Application added subscribers: StudiesWorld, Luke081515, scfc, Aklapper. Jan 25 2016, 12:27 PM
Luke081515 moved this task from To Triage to Need discussion on the Phabricator board.

To clarify, this task was created because the log nearly consumed all of the filesystem space, and that is usually considered a bad idea. Hope that was clear.

I really thought there was a logrotate config for it, but maybe not.

dduvall set Security to None.
demon claimed this task. Jan 27 2016, 7:09 PM
demon added a subscriber: demon.

Gotcha. So what happens here is that when repos are deleted on Gerrit, they're not necessarily marked as inactive here. I've already gone and marked 4 as inactive (rEWQL, rEVVI, rESHP, and rEQEV, which seem to be the loudest).

demon triaged this task as Unbreak Now! priority. Jan 27 2016, 7:09 PM
demon added a comment. Jan 27 2016, 7:12 PM

Going through the compressed log now to find any other offenders.
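
A scan like this could be sketched in Python roughly as follows. This is only an illustration, not the method actually used on iridium: the function name `count_exceptions` and the regular expression are assumptions based on the `EXCEPTION:` line format in the excerpt above. The excerpt does not show which repository failed, so the sketch only groups by exception class and message; tying a message back to a repository would need whatever extra context the full log carries.

```python
import lzma
import re
from collections import Counter

# Matches header lines shaped like the one quoted in the task description:
# [2016-01-25 12:21:54] EXCEPTION: (Exception) git fetch failed with error #128:
# The pattern is an assumption derived from that single example.
EXCEPTION_RE = re.compile(r"^\[[^\]]+\] EXCEPTION: \((\w+)\) (.*)$")


def count_exceptions(path):
    """Count EXCEPTION header lines in an xz-compressed log.

    Returns a Counter keyed by (exception class, message text).
    The file is streamed line by line, so it is never fully
    decompressed into memory.
    """
    counts = Counter()
    with lzma.open(path, "rt", encoding="utf-8", errors="replace") as log:
        for line in log:
            match = EXCEPTION_RE.match(line.rstrip("\n"))
            if match:
                counts[(match.group(1), match.group(2))] += 1
    return counts
```

Calling `count_exceptions("daemons.log.xz").most_common(10)` would then list the ten most frequent messages, which is usually enough to see whether one error dominates the log.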

I do not think this requires "unbreak now" priority. I personally made sure to archive the log before the filesystem got full. The concern is that it will happen again if not corrected. But your decision, of course.

demon lowered the priority of this task from Unbreak Now! to Medium. Jan 27 2016, 7:17 PM

UBN was mainly until we got the log to stop growing uncontrollably. I think those repos were it, so I'm lowering it again.

I think we should have logrotate on the phd log. There are a lot of conditions that come up which will cause phd to log somewhat rapidly and old phd logs are not very useful for anything that I can think of. Anything over a week or two is probably just a waste of space.

demon added a comment. Jan 29 2016, 3:56 PM

We should have it in place already. From /etc/logrotate.d/phd:

/var/log/phd {
    daily
    compress
    missingok
    notifempty
    rotate 7
}

/var/log/phd/daemons.log {
    daily
    compress
    missingok
    notifempty
    rotate 7
}

mmodell added a comment. (Edited) Jan 29 2016, 9:54 PM

So it was growing fast enough to fill the disk even on a short rotation schedule? That's no good.

The root partition on iridium is tiny.

Can we move /var onto the /srv volume?

Yes, I got it on the 6% alert. It doesn't have the flexibility of LVM, which was the first thing I checked to solve this.

@jcrespo: Couldn't we just make /var into a symlink to /srv/var or would that cause problems that I'm not aware of?

Actually, I'd guess that /var may need to be mounted before /srv?

I _could_ just move phd.log off /var, if that isn't some kind of violation of Operations policies / preferences.

demon removed demon as the assignee of this task. Jun 1 2016, 3:56 PM
Restricted Application added a subscriber: TerraCodes. Jun 1 2016, 3:56 PM

My daemons.log is also occupying too much space. Can I safely delete some of its content?

Yes, and best practice is to use logrotate to cycle your logs periodically in order to avoid running out of space.
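
For a one-off cleanup when logrotate is not yet set up, a sketch along these lines keeps only the tail of the file. The helper name `trim_log_in_place` is hypothetical. It assumes the writing process opens the log in append mode, and any lines written while the trim is running may be lost, so it is not a substitute for a real rotation policy.

```python
import os


def trim_log_in_place(path, keep_bytes):
    """Shrink a log to roughly its last keep_bytes bytes.

    The file is rewritten in place (same inode), so a writer that
    holds it open in append mode keeps writing to the same file.
    The first line of the kept tail is dropped because it may be
    partial. Returns the number of bytes kept.
    """
    size = os.path.getsize(path)
    if size <= keep_bytes:
        return size  # already small enough, nothing to do

    with open(path, "r+b") as log:
        log.seek(size - keep_bytes)
        tail = log.read()

        # Drop everything up to and including the first newline so
        # the trimmed file starts at a line boundary.
        newline = tail.find(b"\n")
        if newline != -1:
            tail = tail[newline + 1:]

        log.seek(0)
        log.write(tail)
        log.truncate()
    return len(tail)
```

For example, `trim_log_in_place("/var/log/phd/daemons.log", 50 * 1024 * 1024)` would keep about the last 50 MB. The tail is read into memory, so `keep_bytes` should stay well below available RAM.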

mmodell closed this task as Resolved. May 22 2018, 8:06 AM
mmodell claimed this task.

AFAIK this is no longer an issue in WMF production (and iridium doesn't exist anymore).