puppet servers run out of inodes in puppet code volume
Closed, Resolved (Public)

Description

metricsinfra-puppetserver-1.metricsinfra.eqiad1.wikimedia.cloud is failing to update its puppet repo because /srv is out of inodes. The immediate problem is pretty clear:

root@metricsinfra-puppetserver-1:~# ls /srv/puppet_code/environments_staging/
oot_branch_202405020945  oot_branch_202405021037  oot_branch_202405021119  oot_branch_202405021252  oot_branch_202405021426  oot_branch_202405021548  oot_branch_202405021701
oot_branch_202405020955  oot_branch_202405021047  oot_branch_202405021130  oot_branch_202405021303  oot_branch_202405021446  oot_branch_202405021630  oot_branch_202405021712
oot_branch_202405021016  oot_branch_202405021058  oot_branch_202405021150  oot_branch_202405021314  oot_branch_202405021507  oot_branch_202405021640  oot_branch_202405021743
oot_branch_202405021026  oot_branch_202405021108  oot_branch_202405021221  oot_branch_202405021355  oot_branch_202405021528  oot_branch_202405021651  production
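To see how much each of those directories contributes, something like the following sketch could be run on the affected host. It is only an illustration: the function names are made up for this example, and counting directory entries is a rough proxy for inodes consumed (hard links are counted once per name).

```python
import os


def inode_usage(path):
    """Return (used, total) inode counts for the filesystem containing path."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files


def count_entries(root):
    """Count files and directories under root, including root itself."""
    total = 1  # the root directory
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def entries_per_environment(staging_dir):
    """Map each subdirectory of staging_dir to its entry count."""
    return {
        entry.name: count_entries(entry.path)
        for entry in os.scandir(staging_dir)
        if entry.is_dir(follow_symlinks=False)
    }


if __name__ == "__main__":
    # Path taken from the listing above; adjust for other hosts.
    staging = "/srv/puppet_code/environments_staging"
    if os.path.isdir(staging):
        used, total = inode_usage(staging)
        print(f"inodes used: {used} of {total}")
        for name, n in sorted(entries_per_environment(staging).items()):
            print(f"{n:>10}  {name}")
```

`df -i /srv` gives the same filesystem-level numbers; the per-directory counts show which environments account for them.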

This is probably an interaction between git-sync-upstream.py (which creates those oot branches for rebasing purposes) and puppetserver-deploy-code.sh (which invokes g10k, which copies everything into /srv/puppet_code). I suspect it's one or more of the following:

  1. g10k copies everything over blindly, meaning that if there are multiple code branches active in any deployed directory, it piles them up in /srv/puppet_code and never cleans up.
  2. git-sync-upstream creates temporary branches as part of the merge process and, while managing those branches, likely triggers git hooks that in turn invoke puppetserver-deploy-code.
  3. puppetserver-deploy-code uses absolute paths and is invoked by git hooks, so git actions anywhere on a system result in a deployment from /srv/git/operations/puppet.

Event Timeline

Change #1026682 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] puppetserver-deploy-code: bail out if current branch is not 'production'

https://gerrit.wikimedia.org/r/1026682

Change #1025818 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] puppetserver-deploy-code: add -force to g10k call to invoke purging

https://gerrit.wikimedia.org/r/1025818

OK, I think I understand what is happening now. My apologies for my ignorance of how g10k or r10k uses branches.

  • g10k by default deploys a copy of every branch to the basedir specified in its configuration
  • the source repo in prod, /srv/git/operations/puppet, only has a single branch, production, so that is the one g10k deploys
  • WMCS' source repo has many branches due to their use of modules/puppetmaster/files/git-sync-upstream.py
  • g10k filled up the WMCS disk by keeping all those ephemeral branches around.
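On a host that has already accumulated these directories, the leftovers can be listed with a small read-only helper such as the sketch below. The function name and the `keep` parameter are hypothetical, and nothing is deleted; it only reports which environment directories do not correspond to the production branch.

```python
import os


def leftover_environments(staging_dir, keep=("production",)):
    """Return sorted names of subdirectories of staging_dir not listed in keep.

    Read-only: this only inspects the directory and removes nothing.
    """
    return sorted(
        entry.name
        for entry in os.scandir(staging_dir)
        if entry.is_dir(follow_symlinks=False) and entry.name not in keep
    )


if __name__ == "__main__":
    # Path taken from the task description.
    staging = "/srv/puppet_code/environments_staging"
    if os.path.isdir(staging):
        for name in leftover_environments(staging):
            print(name)
```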

I think the easiest solution is to specify the branch explicitly with g10k, as you did in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1026682, since that will deploy a single branch. I will comment there.
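The guard described in that change ("bail out if current branch is not 'production'") amounts to a check like the one below. This is a Python sketch of the logic only; the real change is to the puppetserver-deploy-code shell script and its exact form is not shown in this task. The function names are hypothetical.

```python
import subprocess


def current_branch(repo_dir):
    """Return the checked-out branch of repo_dir ('HEAD' if detached)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--abbrev-ref", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def should_deploy(branch, allowed="production"):
    """Deploy only when the hook fired while the allowed branch is checked out."""
    return branch == allowed


if __name__ == "__main__":
    import sys

    repo = "/srv/git/operations/puppet"  # source repo named in the task
    branch = current_branch(repo)
    if not should_deploy(branch):
        # Skip quietly so hooks fired on temporary branches do not deploy.
        print(f"current branch is {branch!r}, not deploying")
        sys.exit(0)
    print("on production, deployment would run here")
```

With a check like this, hooks triggered while git-sync-upstream is working on a temporary oot branch exit before g10k runs.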

jhathaway triaged this task as Medium priority. May 6 2024, 2:17 PM

Change #1026682 merged by Andrew Bogott:

[operations/puppet@production] puppetserver-deploy-code: bail out if current branch is not 'production'

https://gerrit.wikimedia.org/r/1026682

Change #1029198 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] puppet-git-sync-upstream: run as 'gitpuppet' user

https://gerrit.wikimedia.org/r/1029198

Change #1029198 abandoned by Andrew Bogott:

[operations/puppet@production] puppet-git-sync-upstream: run as 'gitpuppet' user

Reason:

A better (but still not fantastic) option here is https://gerrit.wikimedia.org/r/c/operations/puppet/+/1030962

https://gerrit.wikimedia.org/r/1029198

Change #1025818 abandoned by Andrew Bogott:

[operations/puppet@production] puppetserver-deploy-code: add -force to g10k call to invoke purging

Reason:

We're trying to avoid copying the extra dirs rather than purging them after.

https://gerrit.wikimedia.org/r/1025818