Page MenuHomePhabricator

Ownership confusion on cloud-local puppet servers
Open, HighPublic

Description

On cloud-vps puppetservers the puppet checkouts are owned by the 'gitpuppet' user and kept in /srv/git.

Puppet 7 has some new ownership constraints which means that we can no longer investigate these repos as root, for example:

root@metricsinfra-puppetserver-1:/srv/git/operations/puppet# git branch
fatal: detected dubious ownership in repository at '/srv/git/operations/puppet'
To add an exception for this directory, call:

	git config --global --add safe.directory /srv/git/operations/puppet

This produces several problems:

  1. puppetserver-deploy-code doesn't work. I recently modified it to check the branch before deploying, but since it runs as root it can't check the branch and errors out.
  1. Users (including me) are likely to run the suggested '--add safe' command and get on with things. That appears to work but breaks future runs of git-sync-upstream (something the user may not notice until weeks or months later).

Arturo points out that this has been a chronic issue even before puppet 7

T325280: cloud puppetmasters fail to run git-sync-upstream - https://phabricator.wikimedia.org/T325280
T184259: labspuppetmaster1001: have consistency in owner of git repos - https://phabricator.wikimedia.org/T184259
T152060: Inconsistent groups for Git repositories with role::puppetmaster::standalone - https://phabricator.wikimedia.org/T152060

I think the right solution here is to be like prod and just have /srv/git things owned by root. The gitpuppet user was initially created to avoid running git-sync-upstream but that script has since been rewritten to run as root and setuid to gitpuppet for select actions which is arguably just as risky as leaving things root-owned.

The alternative is refactoring puppetserver-deploy-code to run as an arbitrary user... I went down that road a bit and it started to get messy. OR, we could rip out the branch check in puppetserver-deploy-code and let it run blindly, but that was what produced our parent task T364047

Event Timeline

I'm now learned that new prod puppservers also use the 'gitpuppet' user. So eliminating that user will increase the diff with prod rather than shrink it. So that's not the right path forward. Probably I should just figure out a fix for case 1.

Change #1030962 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] puppetserver-deploy-code.sh: use 'gitpuppet' user to check current branch

https://gerrit.wikimedia.org/r/1030962

Puppet 7 has some new ownership constraints which means that we can no longer investigate these repos as root, for example:

FYI this is an artifact of new version of git not puppet. you will need something like the following on cloud standalon puppet masters

git::systemconfig { 'safe.directory-labs-private':
  settings => {
    'safe' => {
      'directory' => '/var/lib/git/labs/private',           
    }
  }
}

you should not do this at the global level as we don't want people manually making adits to the checked out repo in production

Change #1030962 merged by Andrew Bogott:

[operations/puppet@production] puppetserver-deploy-code.sh: use 'gitpuppet' user to check current branch

https://gerrit.wikimedia.org/r/1030962

This issue caused tools-puppetserver-01 (the Puppet server for all instances in the tools project) to remain stuck a few days behind operations/puppet. I fixed it by manually applying the patch above (https://gerrit.wikimedia.org/r/1030962), then running manually /usr/local/bin/puppetserver-deploy-code.

We should check if the same fix is needed on other Puppet servers.

There is something that I'm failing to understand, which is why Puppet was not just stuck on an old version, but until my fix it was cycling randomly between a set of different commits: 52d9b91b86 (which was merged May 17), b365b43629 (merged May 7), 83a41abb14 (merged May 7) and eb5da72b5a (merged May 14). This is shown in the puppet.log below, and the same happened in all the servers using the same puppetmaster, e.g. tools-redis-5.

root@tools-puppetserver-01:~# grep Applying /var/log/puppet.log
2024-05-27T00:09:09.477706+00:00 tools-puppetserver-01 puppet-agent[1876836]: Applying configuration version '(52d9b91b86) Dzahn - gerrit: fix team name for https monitor alerting'
2024-05-27T00:39:16.575353+00:00 tools-puppetserver-01 puppet-agent[1882141]: Applying configuration version '(52d9b91b86) Dzahn - gerrit: fix team name for https monitor alerting'
2024-05-27T01:09:17.796338+00:00 tools-puppetserver-01 puppet-agent[1887349]: Applying configuration version '(b365b43629) Andrea Denisse - wmcs: Remove unnecesary kibana and kibana-discovery certificates'
2024-05-27T01:38:30.577163+00:00 tools-puppetserver-01 puppet-agent[1892436]: Applying configuration version '(b365b43629) Andrea Denisse - wmcs: Remove unnecesary kibana and kibana-discovery certificates'
2024-05-27T02:09:13.271527+00:00 tools-puppetserver-01 puppet-agent[1897980]: Applying configuration version '(52d9b91b86) Dzahn - gerrit: fix team name for https monitor alerting'
2024-05-27T02:39:10.448068+00:00 tools-puppetserver-01 puppet-agent[1903131]: Applying configuration version '(83a41abb14) Keith Herron - pyrra: logstash: add generic rules workaround'
2024-05-27T03:09:14.788307+00:00 tools-puppetserver-01 puppet-agent[1908208]: Applying configuration version '(b365b43629) Andrea Denisse - wmcs: Remove unnecesary kibana and kibana-discovery certificates'
2024-05-27T03:38:40.472265+00:00 tools-puppetserver-01 puppet-agent[1913199]: Applying configuration version '(83a41abb14) Keith Herron - pyrra: logstash: add generic rules workaround'
2024-05-27T04:08:44.129801+00:00 tools-puppetserver-01 puppet-agent[1918322]: Applying configuration version '(83a41abb14) Keith Herron - pyrra: logstash: add generic rules workaround'
2024-05-27T04:38:58.104331+00:00 tools-puppetserver-01 puppet-agent[1923662]: Applying configuration version '(b365b43629) Andrea Denisse - wmcs: Remove unnecesary kibana and kibana-discovery certificates'
2024-05-27T05:08:56.097149+00:00 tools-puppetserver-01 puppet-agent[1928909]: Applying configuration version '(b365b43629) Andrea Denisse - wmcs: Remove unnecesary kibana and kibana-discovery certificates'
2024-05-27T05:38:42.201303+00:00 tools-puppetserver-01 puppet-agent[1934139]: Applying configuration version '(52d9b91b86) Dzahn - gerrit: fix team name for https monitor alerting'
2024-05-27T06:08:46.906825+00:00 tools-puppetserver-01 puppet-agent[1939246]: Applying configuration version '(eb5da72b5a) Jcrespo - dbbackups: Update the list of valid sections to check for WMFbackups'
2024-05-27T06:39:24.124063+00:00 tools-puppetserver-01 puppet-agent[1944702]: Applying configuration version '(83a41abb14) Keith Herron - pyrra: logstash: add generic rules workaround'
fnegri changed the task status from Open to In Progress.Mon, May 27, 6:05 PM
fnegri triaged this task as High priority.

The gitpuppet patch was missing on three other puppet servers:

taavi-puppet7-server.testlabs.eqiad1.wikimedia.cloud,taavi-testpuppet-1.testlabs.eqiad1.wikimedia.cloud,toolsbeta-puppetserver-1.toolsbeta.eqiad1.wikimedia.cloud

I fixed the toolsbeta one, and am ignoring the taavi test hosts because they're probably on the verge of being deleted.

Thanks @Andrew. I think we can probably resolve this as your patch should fix most of the issues. We can reopen in case permissions keep causing problems.

pmiazga subscribed.

I'm going to re-open this ticket as this is still an issue on deployment-prep.deployment-puppetserver-1.

pmiazga@deployment-puppetserver-1:/srv/git/operations/puppet$ git log
fatal: detected dubious ownership in repository at '/srv/git/operations/puppet'
To add an exception for this directory, call:

	git config --global --add safe.directory /srv/git/operations/puppet

Some time ago, when with I checked the puppet-server-1 for beta cluster, the srv/git/operations/puppet was behind production branch by over 200 patches. The problem was most likely file permissions, after fixing those looks like it can pull the changes again. But the file owners are still incorrect. When I do ls -la in /srv/git/operations/puppet/.git folder, I get

gitpuppet@deployment-puppetserver-1:/srv/git/operations/puppet/.git$ ls -la
total 2968
drwxr-xr-x   8 gitpuppet gitpuppet    4096 Jun  7 10:16 .
drwxr-xr-x  15 gitpuppet gitpuppet    4096 May  9 20:51 ..
-rw-r--r--   1 gitpuppet gitpuppet     379 Jun  4 17:21 COMMIT_EDITMSG
-rw-r--r--   1 gitpuppet gitpuppet    1289 Jun  7 10:12 FETCH_HEAD
-rw-r--r--   1 gitpuppet gitpuppet      27 Mar 14 20:06 HEAD
-rw-r--r--   1 gitpuppet root           41 Jun  7 10:01 ORIG_HEAD
drwxr-xr-x   2 gitpuppet gitpuppet    4096 Feb  5  2020 branches
-rw-r--r--   1 gitpuppet root          305 Jun  7 10:01 config
-rw-r--r--   1 gitpuppet gitpuppet      73 Feb  5  2020 description
drwxr-xr-x   2 gitpuppet gitpuppet    4096 May 20 15:13 hooks
-rw-r--r--   1 gitpuppet gitpuppet 1360646 Jun  7 10:15 index
drwxr-xr-x   2 gitpuppet gitpuppet    4096 Jun  4 10:11 info
drwxr-xr-x   3 gitpuppet gitpuppet    4096 Jun  7 10:12 logs
drwxr-xr-x 204 gitpuppet gitpuppet    4096 Jun  7 10:12 objects
-rw-r--r--   1 gitpuppet root      1616477 Jun  7 10:12 packed-refs
drwxr-xr-x   5 gitpuppet gitpuppet    4096 Jun  7 10:12 refs

As you see, some of files are owned by gitpuppet:root. This causes problems with cherry-picking changes. When I try to get new changes as gitpuppet user, it ends up with:

gitpuppet@deployment-puppetserver-1:/srv/git/operations/puppet$ git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/22/1039822/1 && git cherry-pick FETCH_HEAD
remote: Counting objects: 3007, done
remote: Finding sources: 100% (7/7)
remote: Getting sizes: 100% (1/1)
remote: Total 7 (delta 5), reused 7 (delta 5)
error: insufficient permission for adding an object to repository database .git/objects
fatal: failed to write object
fatal: unpack-objects failed

To make this work, I need to call chown before.

pmiazga@deployment-puppetserver-1:/srv/git/operations/puppet$ sudo su -
root@deployment-puppetserver-1:/srv/git/operations/puppet# chown gitpuppet:gitpuppet cd /srv/git/operations/puppet/.git -R

I noticed that even when I fix rights, those get messed up every time it pulls new changes.

The permissions look similar on tools-puppetserver-01:

fnegri@tools-puppetserver-01:~$ ls -la /srv/git/operations/puppet/.git
total 3468
drwxr-xr-x   8 gitpuppet gitpuppet    4096 Jun 10 12:45 .
drwxr-xr-x  15 gitpuppet gitpuppet    4096 May  9 20:51 ..
-rw-r--r--   1 gitpuppet gitpuppet     622 Jun  3 16:21 COMMIT_EDITMSG
-rw-r--r--   1 gitpuppet gitpuppet    1289 Jun 10 13:09 FETCH_HEAD
-rw-r--r--   1 gitpuppet gitpuppet      27 Dec  6  2022 HEAD
-rw-r--r--   1 gitpuppet root           41 Jun 10 12:45 ORIG_HEAD
drwxr-xr-x   2 gitpuppet gitpuppet    4096 Feb 16  2020 branches
-rw-r--r--   1 gitpuppet root          305 Jun 10 12:45 config
-rw-r--r--   1 gitpuppet gitpuppet      73 Feb 16  2020 description
drwxr-xr-x   2 gitpuppet gitpuppet    4096 Jun  3 16:21 hooks
-rw-r--r--   1 gitpuppet root      1361189 Jun 10 12:45 index
drwxr-xr-x   2 gitpuppet gitpuppet    4096 Jun  3 16:22 info
drwxr-xr-x   3 gitpuppet gitpuppet    4096 Jun  3 16:21 logs
drwxr-xr-x 255 gitpuppet gitpuppet    4096 Jun 10 13:09 objects
-rw-r--r--   1 gitpuppet root      2125954 Jun 10 12:45 packed-refs
drwxr-xr-x   5 gitpuppet gitpuppet    4096 Feb 16  2020 refs

But Git is not complaining if I use the gitpuppet user:

fnegri@tools-puppetserver-01:/srv/git/operations/puppet$ git log
fatal: detected dubious ownership in repository at '/srv/git/operations/puppet'
To add an exception for this directory, call:

        git config --global --add safe.directory /srv/git/operations/puppet
fnegri@tools-puppetserver-01:/srv/git/operations/puppet$ sudo -u gitpuppet git log
commit 8ac11c0ac16f5ef5f1f6cf15427c878f91e8b045 (HEAD -> production, tag: snapshot-202406101315, origin/production, origin/HEAD)

[...]

Cherry-picking is also working:

fnegri@tools-puppetserver-01:~$ sudo -i -u gitpuppet
$ git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/30/1036230/2 && git cherry-pick FETCH_HEAD
remote: Counting objects: 14100, done
remote: Finding sources: 100% (7/7)
remote: Getting sizes: 100% (6/6)
remote: Compressing objects: 100% (27487/27487)
remote: Total 7 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (7/7), 9.49 KiB | 1.36 MiB/s, done.
From https://gerrit.wikimedia.org/r/operations/puppet
 * branch                  refs/changes/30/1036230/2 -> FETCH_HEAD
Auto-merging hieradata/cloud/eqiad1/deployment-prep/common.yaml
INFO: Puppet code deployed
INFO: Puppet code cache evicted
[production b255aaac16] [POC][beta] Add rewrite rule for sso.wikimedia.beta.wmflabs.org
 Author: Gergő Tisza <tgr.huwiki@gmail.com>
 Date: Mon May 27 12:19:02 2024 +0200
 1 file changed, 8 insertions(+)

Seems like someone has manually been using the root user (instead of gitpuppet) to do things there

root@deployment-puppetserver-1:~# cat .gitconfig 
[safe]
	directory = /srv/git/labs/private
	directory = /srv/git/operations/puppet

That would explain what you're seeing.

I just deleted root's gitconfig file to restore the "wrong user" warning. I don't see a git flag to disable the wrong "run this command" advice it's giving, unfortunately.

Maybe we could prevent the root user from running git with something like alias git="echo run git with the gitpuppet user"?

Do aliases/bashrc apply when running just sudo git foo (without -i)?

Do aliases/bashrc apply when running just sudo git foo (without -i)?

No, good point :/