Automate Gerrit deployment steps
Open, Needs Triage, Public

Description

Deploying Gerrit involves a few manual steps: https://wikitech.wikimedia.org/wiki/Gerrit/Upgrade#Deploying

They should be automated, and it seems like scap checks can fulfill that role.

I have finished the scap setup for Gerrit on the devtools WMCS project (T317404). The instances are:

Role        Instance FQDN
Deployment  deploy-1004.devtools.eqiad1.wikimedia.cloud
Gerrit      gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud

Fixes:

  • we should disable/enable Puppet, which requires a sudo rule
  • Gerrit is deployed using the gerrit2 service user; the deployment should probably be done by a new user which could receive additional sudo privileges
  • changes made to gerrit.config by Puppet should result in an error and prompt a rollback
  • the plugins should be extracted after the fetch phase under $SCAP_REV_PATH. The Gerrit config points to /var/lib/gerrit2/review_site, which is the $SCAP_FINAL_PATH dir.
  • promote should restart the service
  • figure out a check which, once the service has started, verifies the list of plugins and potentially their versions
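
The last item (verifying the plugin list once the service is up) could be sketched roughly as below. This is only an illustration: missing_plugins is a hypothetical helper name, the expected plugin names are placeholders, and how the installed list is obtained (for instance from `gerrit plugin ls` over SSH or the /plugins/ REST endpoint) is left out and would need its own parsing.

```shell
# Hypothetical helper: print every expected plugin that is not installed.
# Expected names are given as arguments; the installed names are read
# from stdin, one per line. Exits non-zero if anything is missing, so it
# can fail a scap check.
missing_plugins() (
    installed=$(cat)
    status=0
    for plugin in "$@"; do
        # -x: match the whole line, -F: literal string, -q: no output
        if ! printf '%s\n' "$installed" | grep -qxF -- "$plugin"; then
            printf '%s\n' "$plugin"
            status=1
        fi
    done
    exit "$status"
)
```

A check script would pipe the installed plugin names into it, e.g. `printf 'gitiles\nhooks\n' | missing_plugins gitiles hooks`, and treat a non-zero exit as a failed deployment. Verifying versions would need a second column and is not covered here.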

We will probably need something similar to what has been done for Phabricator in https://gerrit.wikimedia.org/r/c/operations/puppet/+/370622 and currently in Puppet at:

modules/phabricator/files/phab_deploy_config_deploy.sh
modules/phabricator/files/phab_deploy_finalize.sh
modules/phabricator/files/phab_deploy_promote.sh
modules/phabricator/files/phab_deploy_rollback.sh
modules/phabricator/templates/script-vars.erb

The scripts have sudo rules:

'ALL=(root) NOPASSWD: /usr/local/sbin/phab_deploy_config_deploy',
'ALL=(root) NOPASSWD: /usr/local/sbin/phab_deploy_promote',
'ALL=(root) NOPASSWD: /usr/local/sbin/phab_deploy_rollback',
'ALL=(root) NOPASSWD: /usr/local/sbin/phab_deploy_finalize',

The user is phab-deploy.

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 9 2022, 1:08 PM

Change 831093 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@deploy/wmf/stable-3.4] scap: automatize plugins handling

https://gerrit.wikimedia.org/r/831093

hashar edited projects, added Gerrit, Release-Engineering-Team, Scap; removed Patch-For-Review.
hashar moved this task from Needs triage to Services improvements on the Scap board.

Change 831913 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: disable automatic plugin handling

https://gerrit.wikimedia.org/r/831913

Change 831916 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: scap checks script to automatize deployment

https://gerrit.wikimedia.org/r/831916

Change 832518 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@deploy/wmf/stable-3.4] Use gerrit-deploy for deployment on devtools

https://gerrit.wikimedia.org/r/832518

I have changed the deployment user on the devtools project from gerrit2 to gerrit-deploy, and we will need a number of adjustments on the target.

scap deploy
:* gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud
16:18:18 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'gerrit/gerrit', '-g', 'default', 'fetch', '--refresh-config'] (ran as gerrit-deploy@gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud) returned [70]: WARNING:deploy-local:Unhandled error:
Traceback (most recent call last):
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/cli.py", line 511, in run
    app._load_config()
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/deploy.py", line 117, in _load_config
    overrides = self._get_config_overrides()
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/deploy.py", line 555, in _get_config_overrides
    with open(self.context.local_config, "w") as cfg:
PermissionError: [Errno 13] Permission denied: '/srv/deployment/gerrit/gerrit-cache/.config'
ERROR:deploy-local:deploy-local failed: <PermissionError> [Errno 13] Permission denied: '/srv/deployment/gerrit/gerrit-cache/.config'

I think that is because scap::target does not enforce file ownership on existing deployment directories. On the target I had to:

chown -R gerrit-deploy /srv/deployment/{gerrit,gervert}

After that scap deploy from deploy-1004.devtools.eqiad1.wikimedia.cloud worked appropriately. At least code reached gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud.

That has to be figured out for production when applying profile::gerrit::scap_user: 'gerrit-deploy'

Change 832518 merged by jenkins-bot:

[operations/software/gerrit@deploy/wmf/stable-3.4] Use gerrit-deploy for deployment on devtools

https://gerrit.wikimedia.org/r/832518

Change 832345 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: decouple scap and daemon users

https://gerrit.wikimedia.org/r/832345

Change 832345 merged by Dzahn:

[operations/puppet@production] gerrit: decouple scap and daemon users

https://gerrit.wikimedia.org/r/832345

Mentioned in SAL (#wikimedia-operations) [2022-10-04T17:04:49Z] <mutante> gerrit - deployed 832345 - scap and daemon users became decoupled (T317412)

Deployed all of these, in this order:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/832345 - gerrit: decouple scap and daemon users

https://gerrit.wikimedia.org/r/c/operations/puppet/+/832507 - gerrit: change deployment user on devtools

https://gerrit.wikimedia.org/r/c/operations/puppet/+/833379 - gerrit: make homedir variable

https://gerrit.wikimedia.org/r/c/operations/puppet/+/833385 - gerrit: use daemon_user variable everywhere

Change 831913 abandoned by Hashar:

[operations/puppet@production] gerrit: disable automatic plugin handling

Reason:

I will keep the automatic reloading of plugins for now. I might rely on it to easily deploy JavaScript plugins (T319378).

https://gerrit.wikimedia.org/r/831913

Fun thing: $GERRIT_SITE/plugins is a symbolic link to the deployment directory and would thus be owned by gerrit-deploy.

When we run gerrit init --install-all-plugins as the gerrit2 user, it therefore cannot write the built-in plugins to that directory.

Instead, $GERRIT_SITE/plugins should be a regular directory owned by gerrit2. When running install-all-plugins we would then need to sync in a fleet of symlinks pointing back to the deployment plugins. That is inconvenient :-\
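
For reference, syncing such symlinks could look roughly like the sketch below. link_plugins is a hypothetical helper and both directory arguments are placeholders; it assumes the deployed plugins are plain *.jar files in a single directory and that the destination is a regular directory writable by the calling user.

```shell
# Hypothetical helper: link every deployed plugin jar into the site's
# plugins directory.
#   $1: directory holding the deployed plugin jars
#   $2: $GERRIT_SITE/plugins, as a regular directory
link_plugins() (
    src=$1
    dest=$2
    for jar in "$src"/*.jar; do
        # When the glob matches nothing the pattern is left as-is; skip it.
        [ -e "$jar" ] || continue
        # -f replaces an existing link, -n avoids following an existing one.
        ln -sfn "$jar" "$dest/$(basename "$jar")"
    done
)
```

This does not remove stale links for plugins dropped from the deployment, which is part of why the approach is inconvenient.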

Change 844523 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: sudo rules for scap deployment

https://gerrit.wikimedia.org/r/844523

NOTE: this comment is obsolete

Instead of messing with symbolic links, I went with:

  • During *fetch*, uncompress the plugins from gerrit.war and put them in $SCAP_REV_PATH
  • After the code has been promoted, instruct Gerrit init not to process any plugins with: java -jar gerrit.war init --batch --skip-plugins
hashar@deploy-1004:/srv/deployment/gerrit/gerrit$ scap deploy
21:06:27 Started deploy [gerrit/gerrit@967b0d7]
21:06:27 Deploying Rev: HEAD = 7432858c1eb2efb59974f5badffccdf56ac26d96
21:06:28 Started deploy [gerrit/gerrit@967b0d7]: (no justification provided)
21:06:28 
== DEFAULT ==
:* gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud
21:06:32 gerrit/gerrit: fetch stage(s): 100% (in-flight: 0; ok: 1; fail: 0; left: 0) |
21:06:34 gerrit/gerrit: config_deploy stage(s): 100% (in-flight: 0; ok: 1; fail: 0; left: 0) |
21:07:00 gerrit/gerrit: promote stage(s): 100% (in-flight: 0; ok: 1; fail: 0; left: 0) |
21:07:00 
== DEFAULT ==
:* gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud
21:07:02 gerrit/gerrit: finalize stage(s): 100% (in-flight: 0; ok: 1; fail: 0; left: 0) |
21:07:02 Finished deploy [gerrit/gerrit@967b0d7]: (no justification provided) (duration: 00m 34s)
21:07:02 Finished deploy [gerrit/gerrit@967b0d7] (duration: 00m 34s)
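
The fetch-stage extraction described above could be sketched as follows. This is a minimal illustration, not the script that was deployed: extract_bundled_plugins is a hypothetical name, and it assumes the bundled plugins live under WEB-INF/plugins/ inside gerrit.war and that unzip is installed on the target.

```shell
# Hypothetical helper: copy the plugins bundled in gerrit.war into a
# directory of the revision being fetched.
#   $1: path to gerrit.war
#   $2: destination directory, e.g. "$SCAP_REV_PATH/plugins"
extract_bundled_plugins() (
    war=$1
    dest=$2
    mkdir -p "$dest" &&
    # -j drops the WEB-INF/plugins/ prefix, -o overwrites without asking,
    # -q keeps the scap log quiet.
    unzip -q -o -j "$war" 'WEB-INF/plugins/*.jar' -d "$dest"
)
```

Because the jars are written by the deploy user into the revision directory, gerrit init can then be run with --skip-plugins and never needs write access to the plugins directory.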

$GERRIT_SITE/plugins is still a symbolic link to /srv/deployment/gerrit/..., so the plugins are all owned by gerrit-deploy.

I have added a few sudo rules to orchestrate running Puppet, stopping Gerrit, and running gerrit.war init.
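
As a rough illustration of the sequence those sudo rules allow, here is a dry-run sketch. It is not the actual script: run_step and promote_gerrit are hypothetical names, the systemd unit name gerrit and the use of plain puppet agent --disable/--enable are assumptions, and each command would need a matching sudo rule for the deploy user.

```shell
# Hypothetical helper: with DRY_RUN=1 (the default in this sketch) a
# step is only printed; set DRY_RUN=0 to actually execute it.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical promote sequence.
#   $1: Gerrit site directory, e.g. /var/lib/gerrit2/review_site
#   $2: path to the deployed gerrit.war
# The chain stops at the first failing step, which leaves Puppet
# disabled so that a rollback can be done without interference.
promote_gerrit() (
    site=$1
    war=$2
    run_step sudo puppet agent --disable "scap deploy of Gerrit" &&
    run_step sudo systemctl stop gerrit &&
    run_step sudo -u gerrit2 java -jar "$war" init --batch --skip-plugins -d "$site" &&
    run_step sudo systemctl start gerrit &&
    run_step sudo puppet agent --enable
)
```

Running `promote_gerrit /var/lib/gerrit2/review_site /path/to/gerrit.war` with the default DRY_RUN prints the five steps in order without touching the host.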

Progress!

Change 844998 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: change scap user to gerrit-deploy

https://gerrit.wikimedia.org/r/844998

Change 831093 abandoned by Hashar:

[operations/software/gerrit@deploy/wmf/stable-3.4] scap: automatize plugins handling

Reason:

We have since upgraded to Gerrit 3.5 and the bundled plugins are now deployed by scap (done as part of migrating from git-fat to git-lfs) so we don't need the `--install-all-plugins` any more.

I will revisit this change later on, maybe after Gerrit 3.6 upgrade.

https://gerrit.wikimedia.org/r/831093

Change 831916 abandoned by Hashar:

[operations/puppet@production] gerrit: scap checks script to automatize deployment

Reason:

will revisit later

https://gerrit.wikimedia.org/r/831916

Change 844523 abandoned by Hashar:

[operations/puppet@production] gerrit: sudo rules for scap deployment

Reason:

will revisit later

https://gerrit.wikimedia.org/r/844523

Change 844998 merged by Jbond:

[operations/puppet@production] gerrit: change scap user to gerrit-deploy

https://gerrit.wikimedia.org/r/844998

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:02:09Z] <hashar> Disabled Puppet agent on gerrit1003 and gerrit2002 to roll https://gerrit.wikimedia.org/r/844998 which requires some manual steps | T317412

The aftermath of running https://gerrit.wikimedia.org/r/c/operations/puppet/+/844998/ on gerrit2002:

Notice: /Stage[main]/Ssh::Server/File[/etc/ssh/userkeys/gerrit2.d]: Not removing directory; use 'force' to override
Notice: /Stage[main]/Ssh::Server/File[/etc/ssh/userkeys/gerrit2.d]/ensure: removed
Notice: /Stage[main]/Ssh::Server/File[/etc/ssh/userkeys/gerrit2.d/gerrit-scap]/ensure: removed
Notice: /Stage[main]/Gerrit/Ssh::Userkey[gerrit2-scap]/File[/etc/ssh/userkeys/gerrit-deploy.d/]/ensure: created
Notice: /Stage[main]/Gerrit/Ssh::Userkey[gerrit2-scap]/File[/etc/ssh/userkeys/gerrit-deploy.d/gerrit-scap]/ensure: defined content as '{sha256}640784c718cf3aa28a79e4ea71cbeface894f62587e667b8f78c0b883274590b'
Notice: /Stage[main]/Gerrit/Scap::Target[gerrit/gerrit]/Group[gerrit-deploy]/ensure: created
Notice: /Stage[main]/Gerrit/Scap::Target[gerrit/gerrit]/User[gerrit-deploy]/ensure: created
Notice: /Stage[main]/Gerrit/Scap::Target[gerrit/gerrit]/File[/var/lib/gerrit-deploy]/ensure: created
Notice: /Stage[main]/Gerrit/Scap::Target[gerrit/gerrit]/Package[gerrit/gerrit]/ensure: created (corrective)
Notice: /Stage[main]/Gerrit/Scap::Target[gervert/deploy]/Package[gervert/deploy]/ensure: created (corrective)
Notice: /Stage[main]/Gerrit/Scap::Target[gerrit/gerrit]/Ssh::Userkey[gerrit-deploy]/File[/etc/ssh/userkeys/gerrit-deploy]/ensure: defined content as '{sha256}640784c718cf3aa28a79e4ea71cbeface894f62587e667b8f78c0b883274590b'
Notice: /Stage[main]/Gerrit/Scap::Target[gerrit/gerrit]/Sudo::User[scap_gerrit-deploy]/File[/etc/sudoers.d/scap_gerrit-deploy]/ensure: defined content as '{sha256}76bd5fc3eec791e9fb77b17d953a84e997dd26b754d66718ef00e29238d5b208'

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:22:27Z] <hashar@deploy2002> Started deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:22:42Z] <hashar@deploy2002> Finished deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 15s)

Change 978505 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@deploy/wmf/stable-3.5] scap: change deploy user from gerrit2 to gerrit-deploy

https://gerrit.wikimedia.org/r/978505

Change 978505 merged by jenkins-bot:

[operations/software/gerrit@deploy/wmf/stable-3.5] scap: change deploy user from gerrit2 to gerrit-deploy

https://gerrit.wikimedia.org/r/978505

Change 978527 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit/tools/gervert/deploy@master] Change deploy user from gerrit2 to gerrit-deploy

https://gerrit.wikimedia.org/r/978527

Change 978527 merged by Hashar:

[operations/software/gerrit/tools/gervert/deploy@master] Change deploy user from gerrit2 to gerrit-deploy

https://gerrit.wikimedia.org/r/978527

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:35:02Z] <hashar@deploy2002> Started deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:35:14Z] <hashar@deploy2002> Finished deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 12s)

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:35:55Z] <hashar@deploy2002> Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:36:04Z] <hashar@deploy2002> Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 06s)

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:43:54Z] <hashar@deploy2002> Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:44:01Z] <hashar@deploy2002> Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 07s)

Manual steps I took after rolling https://gerrit.wikimedia.org/r/c/operations/puppet/+/844998/ and mentioned in the commit message:

Remove empty directory /etc/ssh/userkeys/gerrit2.d

Change the Unix group for /srv/deployment/gerrit/* and /srv/deployment/gervert/* from gerrit2 to gerrit-deploy. This is needed because scap::target only changes the owning user.
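
To spot leftovers before a deploy fails on them, something like the following could list every path whose user or group is not the expected one. wrong_owner is a hypothetical helper; it relies only on the standard find tests -user and -group.

```shell
# Hypothetical helper: print paths not owned by the given user and group.
#   $1: expected user (name or numeric id)
#   $2: expected group (name or numeric id)
#   remaining arguments: directories to scan
wrong_owner() (
    owner=$1
    group=$2
    shift 2
    find "$@" \( ! -user "$owner" -o ! -group "$group" \) -print
)
```

For example, `wrong_owner gerrit-deploy gerrit-deploy /srv/deployment/gerrit /srv/deployment/gervert` should print nothing once the user and group have both been changed.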

I have restarted Gerrit on gerrit2002 and it seems to be working.

I then had to change the scap SSH user name in each repository's scap/scap.cfg file.

Mentioned in SAL (#wikimedia-operations) [2024-10-08T21:34:38Z] <mutante> gerrit2003 - sudo -u gerrit-deploy /usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False (for some reason this fails in puppet but works manually) T372804 T257317 T317412