
gitlab/devtools: send logs to a new disk
Closed, Resolved (Public)

Description

We keep running out of disk space when upgrading gitlab on our test instance.

Manual fixes involved deleting a bunch of logfiles to free space.

To fix that, I suggest we:

  • create a new cinder volume in devtools dedicated to gitlab logs
  • create a new mount via puppet and mount it on gitlab-prod-1002
  • configure gitlab to log to the new location when in cloud (not with a realm check but just by configuring the path in Hiera)

Event Timeline

I deleted a lot of very old GitLab logs (mostly migration and upgrade logs from 2023), freeing up some disk space. Now the disk usage fluctuates between 60% and 80%. I have opened T371093 to request an increased quota for more volume space since we only have 5GB left. Once approved, we can add another disk and just mount it at /var/logs.
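For the cleanup step, a minimal sketch of how old log files could be listed before deleting anything. The function name, the 365-day threshold and the example path are assumptions for illustration, not what was actually run here; it relies on GNU find.

```shell
#!/bin/bash
# List regular files under a log directory that were last modified more than
# the given number of days ago, with their disk usage in KiB, largest first.
# Nothing is deleted; the output is meant to be reviewed by hand.
list_old_logs() {
    local dir="$1" days="$2"
    # -xdev keeps find on the same filesystem as the starting directory.
    find "$dir" -xdev -type f -mtime +"$days" -printf '%k\t%p\n' | sort -rn
}

# Example (illustrative path and threshold):
#   list_old_logs /var/log/gitlab 365
```

Listing first and deleting in a second step keeps the destructive part explicit.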

Create a new mount via Puppet and mount it on gitlab-prod-1002.

For the backup volume, we haven't done this afaik. The volume was mounted manually.

We could reduce the rotate setting in logrotate to keep logs for just one or two days. This should be sufficient for the test instance. However, the setting I tried to introduce in this Gerrit change does not seem to work properly. This might be related to a bigger issue with the GitLab test instance, where backups are causing significant load and often fail: see Grafana. This could also be interfering with logrotate.
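To illustrate what a shorter retention could look like, here is a generic logrotate sketch. It is not the configuration from the Gerrit change: the function name, the glob and the chosen directives are assumptions, and on an Omnibus GitLab install the effective logrotate settings may be managed elsewhere.

```shell
#!/bin/bash
# Print a logrotate stanza that keeps only a small number of daily rotations.
# Arguments: the log file glob and the number of rotations to keep.
print_logrotate_stanza() {
    local glob="$1" keep="$2"
    cat <<EOF
$glob {
    daily
    rotate $keep
    missingok
    notifempty
    compress
    copytruncate
}
EOF
}

# Example: write the stanza to a scratch file and let logrotate report what it
# would do. The -d flag is logrotate's debug mode, which changes no files.
#   print_logrotate_stanza '/var/log/gitlab/*/*.log' 2 > /tmp/gitlab-test.conf
#   logrotate -d /tmp/gitlab-test.conf
```

A debug run like this might help tell whether the rotate value is being picked up at all, separately from the backup load issue.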

If we reduce the rotate setting in logrotate, we should apply the same change to other services like rsyslog.

LSobanski triaged this task as Medium priority.
LSobanski moved this task from Incoming to Backlog on the collaboration-services board.

I made T371573 and if that gets resolved we could also puppetize the mounting of the cinder volume and avoid running the script / mount commands manually.

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-01T19:07:08Z] <mutante> creating new 20GB volume 'gitlab-prod-logs' T371066

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-01T19:11:32Z] <mutante> attaching volume 'gitlab-prod-logs' to instance 'gitlab-prod-1002' T371066 | running 'sudo wmcs-prepare-cinder-volume' manually and answering interactive questions to mount/format it (once T371573 is resolved will do this with puppet)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-01T19:18:41Z] <mutante> gitlab-prod-1002 - moved contents of /var/log/gitlab out of the way temp, emptied it, letting wmcs-prepare-cinder-volume mount new volume into /var/log/gitlab and format it with ext4, moved old existing logs back into it T371066

Alright:

  • created new 20 GB cinder volume (Horizon)
  • attached new volume to instance (Horizon)
  • moved existing contents of /var/log/gitlab temp. to /root/
  • ran wmcs-prepare-cinder-volume and let it detect sdc, mount it directly into /var/log/gitlab and format it with ext4
  • moved the logs back from /root/ into /var/log/gitlab
  • restarted gitlab with gitlab-ctl restart
  • seeing the size of /var/log/gitlab increase
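The steps above, minus the Horizon part, can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact commands that were typed: the staging directory name is hypothetical, the initial gitlab-ctl stop is not in the list above (it is borrowed from the later retry), and wmcs-prepare-cinder-volume is interactive, so it is called without arguments. The sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Dry-run wrapper: print each command by default, run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

move_gitlab_logs_to_volume() {
    local log_dir="/var/log/gitlab"
    local stash="/root/gitlab-logs-stash"   # hypothetical staging directory

    run gitlab-ctl stop                     # assumption: stop writers first
    run mkdir -p "$stash"
    # Move everything (including dotfiles) out, leaving an empty mount point.
    run find "$log_dir" -mindepth 1 -maxdepth 1 -exec mv -t "$stash" {} +
    # Interactive: choose the device, the mount point and confirm formatting.
    run wmcs-prepare-cinder-volume
    # Move the old logs back onto the freshly mounted volume.
    run find "$stash" -mindepth 1 -maxdepth 1 -exec mv -t "$log_dir" {} +
    run rmdir "$stash"
    run gitlab-ctl restart
    run df -h "$log_dir"
}

# Example: print the plan without touching anything.
#   move_gitlab_logs_to_volume
```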

Depending on the outcome of T371573, I will puppetize the manual part in the future.

This way we don't even need a config change about where logs are going. All services are running per gitlab-ctl status.

Except I am still waiting for https://gitlab.devtools.wmcloud.org/ to not be in "Waiting for GitLab to boot" anymore.

Dzahn changed the task status from Open to In Progress. Aug 1 2024, 7:30 PM

Had quite the trouble getting gitlab to start up again and debugged for a while.

Turned out to be permission issues after all (though I had only used "mv" to move things!).

So.. stopped everything, removed old logs, mounted the cinder volume again, ran these:

#!/bin/bash

mkdir /var/log/gitlab/gitaly
chown git:root /var/log/gitlab/gitaly

mkdir /var/log/gitlab/gitlab-kas
chown git:root /var/log/gitlab/gitlab-kas

mkdir /var/log/gitlab/gitlab-rails
chown git:root /var/log/gitlab/gitlab-rails

mkdir /var/log/gitlab/gitlab-shell
chown git:root /var/log/gitlab/gitlab-shell

mkdir /var/log/gitlab/gitlab-workhorse
chown git:root /var/log/gitlab/gitlab-workhorse

mkdir /var/log/gitlab/nginx
chown root:gitlab-www /var/log/gitlab/nginx

mkdir /var/log/gitlab/postgresql
chown gitlab-psql:root /var/log/gitlab/postgresql

mkdir /var/log/gitlab/puma
chown git:root /var/log/gitlab/puma

mkdir /var/log/gitlab/redis
chown gitlab-redis:root /var/log/gitlab/redis

mkdir /var/log/gitlab/sidekiq
chown git:root /var/log/gitlab/sidekiq
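
One way to check the result of commands like these, or to compare ownership before and after moving logs around, is a small snapshot helper. This is only a sketch; the function name and the example file paths are made up, and it relies on GNU find.

```shell
#!/bin/bash
# Print "owner:group mode name" for each entry directly under a directory,
# sorted by name, so that two snapshots can be compared with diff.
ownership_snapshot() {
    local dir="$1"
    find "$dir" -mindepth 1 -maxdepth 1 -printf '%u:%g %m %f\n' | sort -k3
}

# Example (illustrative file names):
#   ownership_snapshot /var/log/gitlab > /root/owners-before.txt
#   ... move or recreate the log directories ...
#   ownership_snapshot /var/log/gitlab > /root/owners-after.txt
#   diff /root/owners-before.txt /root/owners-after.txt
```

An empty diff would suggest owners and modes survived the move; any differing line points at a directory to fix with chown or chmod.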

Started everything up again..

https://gitlab.devtools.wmcloud.org/explore/projects/starred is running and /var/log/gitlab is a separate mount:

..
/dev/sda1        20G   14G  5.2G  74% /
..
/dev/sdb         20G  2.5M   19G   1% /var/log/gitlab
..
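
For repeating this check later, a small sketch that reports whether a path is its own mount and shows its usage. The function names are made up for illustration; mountpoint comes from util-linux and df from coreutils.

```shell
#!/bin/bash
# Succeed if the given path is itself a mount point.
is_separate_mount() {
    mountpoint -q "$1"
}

# Say whether the path is a separate mount, then print its filesystem usage.
report_mount() {
    local path="$1"
    if is_separate_mount "$path"; then
        echo "$path is a separate mount"
    else
        echo "$path is NOT a separate mount"
    fi
    df -h "$path"
}

# Example:
#   report_mount /var/log/gitlab
```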