setup gerrit2003 with gerrit service (gerrit on bookworm)
Closed, ResolvedPublic

Description

gerrit2003 is new hardware and on bookworm

  • prepare hiera data and puppet code to allow applying the production gerrit role without starting any services / no influence on production
  • apply the gerrit production role and check for puppet issues / missing packages etc
  • determine if this is resolved once it's a warm standby host or if we switch production to this because it's newer hardware

Details

Repo               Branch      Lines +/-
operations/puppet  production  +1 -0
operations/puppet  production  +0 -41
operations/puppet  production  +0 -27
operations/puppet  production  +1 -1
operations/puppet  production  +1 -0
operations/puppet  production  +7 -0
operations/puppet  production  +12 -0
operations/puppet  production  +4 -0
operations/puppet  production  +8 -3
operations/puppet  production  +1 -3
operations/puppet  production  +1 -1
operations/puppet  production  +1 -3
operations/puppet  production  +17 -3
operations/puppet  production  +12 -1
operations/puppet  production  +1 -0
operations/puppet  production  +5 -0
operations/puppet  production  +1 -1
operations/puppet  production  +1 -0
operations/puppet  production  +3 -0
operations/puppet  production  +17 -23
operations/puppet  production  +33 -0
operations/puppet  production  +7 -1
operations/puppet  production  +11 -0
operations/puppet  production  +4 -0

Event Timeline

Change #1063893 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: (WIP) try applying gerrit role on gerrit2003

https://gerrit.wikimedia.org/r/1063893

Change #1063896 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: temp set a gerrit IP for testing

https://gerrit.wikimedia.org/r/1063896

Change #1063896 merged by Dzahn:

[operations/puppet@production] gerrit: temp set a gerrit IP for testing, gerrit2003 only

https://gerrit.wikimedia.org/r/1063896

Change #1063898 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: on gerrit2003, set firewall provider, admin groups, team owner

https://gerrit.wikimedia.org/r/1063898

Change #1063898 merged by Dzahn:

[operations/puppet@production] gerrit: on gerrit2003, set firewall provider, admin groups, team owner

https://gerrit.wikimedia.org/r/1063898

Change #1063904 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: create a temp insetup role to test java install in bookworm

https://gerrit.wikimedia.org/r/1063904

Change #1063904 merged by Dzahn:

[operations/puppet@production] gerrit: create a temp insetup role to test java install in bookworm

https://gerrit.wikimedia.org/r/1063904

Change #1064412 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: add firewall, java, scap, mail settings to Hiera for gerrit1004

https://gerrit.wikimedia.org/r/1064412

Change #1064412 merged by Dzahn:

[operations/puppet@production] gerrit: add firewall, java, scap, mail settings to Hiera for gerrit1004

https://gerrit.wikimedia.org/r/1064412

LSobanski triaged this task as Medium priority.Sep 4 2024, 9:04 AM
LSobanski moved this task from Incoming to Backlog on the collaboration-services board.

Change #1070680 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: add insetup-gerrit role to gerrit2003, remove gerrit1004 hiera

https://gerrit.wikimedia.org/r/1070680

Change #1070680 merged by Dzahn:

[operations/puppet@production] site: add insetup-gerrit role to gerrit2003, remove gerrit1004 hiera

https://gerrit.wikimedia.org/r/1070680

Dzahn changed the task status from Open to In Progress.Sep 5 2024, 12:03 AM

Change #1070683 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: add backup::host, gerrit::migration etc to insetup role

https://gerrit.wikimedia.org/r/1070683

Change #1070683 merged by Dzahn:

[operations/puppet@production] gerrit: add backup::host, gerrit::migration etc to insetup role

https://gerrit.wikimedia.org/r/1070683

Change #1072323 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: add gerrit::proxy profile to insetup::gerrit role

https://gerrit.wikimedia.org/r/1072323

Change #1072323 merged by Dzahn:

[operations/puppet@production] gerrit: add gerrit::proxy profile to insetup::gerrit role

https://gerrit.wikimedia.org/r/1072323

Change #1073305 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit::proxy: files managed under /var/www/ require httpd

https://gerrit.wikimedia.org/r/1073305

Change #1073308 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit::proxy: fix link target for gerrit logo

https://gerrit.wikimedia.org/r/1073308

Change #1073308 merged by Dzahn:

[operations/puppet@production] gerrit::proxy: fix link target for gerrit logo

https://gerrit.wikimedia.org/r/1073308

Change #1073305 merged by Dzahn:

[operations/puppet@production] gerrit::proxy: ensure /var/www/ exists before files under it

https://gerrit.wikimedia.org/r/1073305

Change #1074275 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] acme_chief: authorize new machine gerrit2003 to fetch gerrit certs

https://gerrit.wikimedia.org/r/1074275

Change #1074275 merged by Dzahn:

[operations/puppet@production] acme_chief: authorize new machine gerrit2003 to fetch gerrit certs

https://gerrit.wikimedia.org/r/1074275

Change #1074477 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: include gerrit profile in insetup::gerrit for testing

https://gerrit.wikimedia.org/r/1074477

Change #1074498 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: add acme_chief snippet to gerrit-setup role

https://gerrit.wikimedia.org/r/1074498

Change #1074498 merged by Dzahn:

[operations/puppet@production] gerrit: add acme_chief to gerrit-setup role

https://gerrit.wikimedia.org/r/1074498

gerrit2003 now has a working Apache-based gerrit::proxy with certs and no Puppet errors.

The only things missing are the actual Gerrit application and a service IP, which we deliberately avoided adding.

Change #1077781 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: make it possible to not bind the service IP on a gerrit server

https://gerrit.wikimedia.org/r/1077781

Change #1077781 merged by Dzahn:

[operations/puppet@production] gerrit: make it possible to not bind the service IP on a gerrit server

https://gerrit.wikimedia.org/r/1077781

Change #1074477 merged by Dzahn:

[operations/puppet@production] gerrit: include gerrit profile in insetup::gerrit for testing

https://gerrit.wikimedia.org/r/1074477

Change #1078748 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: move passwords include from role to profile

https://gerrit.wikimedia.org/r/1078748

Change #1078748 merged by Dzahn:

[operations/puppet@production] gerrit: move passwords include from role to profile

https://gerrit.wikimedia.org/r/1078748

Change #1078752 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: sync lfs data also to new machine

https://gerrit.wikimedia.org/r/1078752

Mentioned in SAL (#wikimedia-operations) [2024-10-08T21:34:38Z] <mutante> gerrit2003 - sudo -u gerrit-deploy /usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False (for some reason this fails in puppet but works manually) T372804 T257317 T317412

Change #1078759 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: avoid duplicate declaration error on first setup

https://gerrit.wikimedia.org/r/1078759

Change #1078759 merged by Dzahn:

[operations/puppet@production] gerrit: avoid duplicate declaration error on first setup

https://gerrit.wikimedia.org/r/1078759

Change #1079026 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: comment out creation of site dir in migration profile

https://gerrit.wikimedia.org/r/1079026

Change #1079026 merged by Dzahn:

[operations/puppet@production] gerrit: comment out creation of site dir in migration profile

https://gerrit.wikimedia.org/r/1079026

For the first time, Puppet now runs cleanly on the new hardware before it is in production.

Gerrit is also already deployed there. Everything is in place except that no service IP is bound to the NIC, we don't sync lfs-data yet and ?.

Notably, this also means Gerrit on bookworm seems to work: there are no more Puppet issues, the app is deployed, and the Java version is the same.

Mentioned in SAL (#wikimedia-operations) [2024-10-10T07:32:23Z] <hashar> Stopped gerrit service on gerrit2003.codfw.wmnet since it is not starting up properly | T372804

The gerrit process on gerrit2003 does not start properly and is flapping:

Oct 10 07:29:21 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19328.
Oct 10 07:29:30 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19329.
Oct 10 07:29:38 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19330.
Oct 10 07:29:47 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19331.
Oct 10 07:29:56 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19332.
Oct 10 07:30:04 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19333.
Oct 10 07:30:13 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19334.

I tried to stop it manually, but of course Puppet brings it back up. That is causing alerts as the service goes up and down continuously. The service and its monitoring should be disabled until it is ready.
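To put a number on the flapping, here is a minimal Python sketch that parses the journal excerpt above and derives the average time between scheduled restarts. The regular expression and the function names `parse_restarts` and `mean_restart_interval` are illustrative assumptions, not part of any Wikimedia tooling, and the year is assumed to be 2024 because journal lines in this format do not carry one.

```python
import re
from datetime import datetime

# Journal excerpt copied verbatim from the comment above.
JOURNAL = """\
Oct 10 07:29:21 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19328.
Oct 10 07:29:30 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19329.
Oct 10 07:29:38 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19330.
Oct 10 07:29:47 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19331.
Oct 10 07:29:56 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19332.
Oct 10 07:30:04 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19333.
Oct 10 07:30:13 gerrit2003 systemd[1]: gerrit.service: Scheduled restart job, restart counter is at 19334.
"""

# Hypothetical pattern for the "Scheduled restart job" lines shown above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ systemd\[\d+\]: "
    r"\S+: Scheduled restart job, restart counter is at (?P<count>\d+)\.$"
)


def parse_restarts(text, year=2024):
    """Return a list of (timestamp, restart_counter) tuples."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a restart line
        ts = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, int(match["count"])))
    return events


def mean_restart_interval(events):
    """Average seconds between restarts across the parsed window."""
    if len(events) < 2:
        raise ValueError("need at least two restart events")
    span = (events[-1][0] - events[0][0]).total_seconds()
    restarts = events[-1][1] - events[0][1]
    return span / restarts
```

For the seven lines above, `mean_restart_interval(parse_restarts(JOURNAL))` comes out at roughly 8.7 seconds (52 seconds across 6 restarts), which matches a service that fails shortly after each start.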

As a side note, I don't know what gerrit2003 is for. Is it a hardware refresh for gerrit2002?

Change #1079206 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] Disable gerrit monitoring on gerrit2003

https://gerrit.wikimedia.org/r/1079206

Change #1079206 abandoned by Hashar:

[operations/puppet@production] Disable gerrit monitoring on gerrit2003

Reason:

Thank you for having set the downtime!

https://gerrit.wikimedia.org/r/1079206

Change #1079206 restored by Dzahn:

[operations/puppet@production] Disable gerrit monitoring on gerrit2003

https://gerrit.wikimedia.org/r/1079206

Change #1079206 merged by Dzahn:

[operations/puppet@production] Disable gerrit monitoring on gerrit2003

https://gerrit.wikimedia.org/r/1079206

Change #1079358 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: set Hiera keys for nist_keys, nftables

https://gerrit.wikimedia.org/r/1079358

Change #1079358 merged by Dzahn:

[operations/puppet@production] gerrit: set Hiera keys for nist_keys, nftables

https://gerrit.wikimedia.org/r/1079358

Change #1079363 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit2003: move bind_serviceIP Hiera key host name level

https://gerrit.wikimedia.org/r/1079363

Change #1079363 merged by Dzahn:

[operations/puppet@production] gerrit2003: move bind_service_ip Hiera key host name level

https://gerrit.wikimedia.org/r/1079363

Change #1078752 merged by Dzahn:

[operations/puppet@production] gerrit: sync lfs data also to new machine

https://gerrit.wikimedia.org/r/1078752

Change #1063893 merged by Dzahn:

[operations/puppet@production] site: apply gerrit role on gerrit2003

https://gerrit.wikimedia.org/r/1063893

Change #1080379 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: remove Hiera keys on host gerrit2003 that are applied by role

https://gerrit.wikimedia.org/r/1080379

Change #1080379 merged by Dzahn:

[operations/puppet@production] gerrit: remove Hiera keys on host gerrit2003 that are applied by role

https://gerrit.wikimedia.org/r/1080379

Dzahn changed the task status from In Progress to Stalled.Oct 17 2024, 5:42 PM

Basically done.

All that is missing is a service IP, which we haven't assigned to this machine yet.

In Puppet it is just set to not bind a service IP, and the service is masked.

But it could be enabled now whenever we like. Also, lfs data is already synced from the prod server.

To be determined how exactly we use it next.

One way would be to assign a name like "gerrit-replica-b" to it.

Another would be to fail-over to this machine as the new production gerrit (since it's newer hardware and also newer OS version!) and then make the previous prod server a replica.
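The masked state mentioned above could be verified on the host with `systemctl is-enabled`, which prints `masked` for a masked unit. The following is a rough Python sketch under that assumption. The helper names are hypothetical, and the unit name `gerrit.service` is taken from the journal output earlier in this task.

```python
import subprocess


def unit_enablement_state(unit: str) -> str:
    """Return the state string printed by `systemctl is-enabled UNIT`."""
    # is-enabled exits non-zero for masked or disabled units,
    # so the return code is deliberately not checked here.
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def is_masked(state: str) -> bool:
    """True if the state string describes a masked unit."""
    return state in ("masked", "masked-runtime")
```

On the host, `is_masked(unit_enablement_state("gerrit.service"))` would be expected to return `True` for as long as the service stays masked.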

Change #1074488 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: delete temp gerrit setup role

https://gerrit.wikimedia.org/r/1074488

Change #1074488 merged by Dzahn:

[operations/puppet@production] gerrit: delete temp gerrit setup role

https://gerrit.wikimedia.org/r/1074488

Dzahn changed the task status from Stalled to In Progress.Oct 21 2024, 5:26 PM
Dzahn added a subscriber: Jelto.

Discussed in today's team meeting. This server is up and running with the production role on it now, so the task to set up the service on this machine is considered done.

There is just no service name attached to it and the gerrit service is masked. Monitoring is disabled.

How to use it will be determined in the upcoming work regarding Gerrit failover strategies.

But as has been pointed out by @Jelto, the current Gerrit prod server is not that old either. The current replica is older.

It might make the most sense to turn this into a new replica, leave the current prod host as is and also keep using the current old replica, to have 3 Gerrit machines at the same time.

LFS data syncing from the prod server is also already set up and has run via the puppetized timer:

root@gerrit2003:/srv/gerrit/data/lfs# du -hs .
31G

Dzahn renamed this task from setup gerrit2003 with gerrit service to setup gerrit2003 with gerrit service (gerrit on bookworm).Oct 24 2024, 12:42 AM

Change #1087967 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: add chown parameter to lfs data rsync, ensure daemon_user is used

https://gerrit.wikimedia.org/r/1087967

Change #1087967 merged by Dzahn:

[operations/puppet@production] gerrit: add chown parameter to lfs data rsync, ensure daemon_user is used

https://gerrit.wikimedia.org/r/1087967
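
The last change adds a chown parameter to the LFS data rsync. The Puppet code itself is not shown in this task, so the following Python sketch only illustrates what such an rsync invocation might look like. The source host `gerrit-prod.example`, the user `gerrit`, the ssh-style `host:path` source and the function name are all hypothetical. Only the `/srv/gerrit/data/lfs` path appears in the task.

```python
def build_lfs_rsync_command(src_host, lfs_path, daemon_user):
    """Build an rsync argv that copies LFS data and fixes ownership.

    All argument values are supplied by the caller; nothing here is
    taken from the real Puppet manifest.
    """
    return [
        "rsync",
        # --archive implies --owner and --group, which --chown needs
        # in order to have any effect.
        "--archive",
        # Force ownership on the receiving side to the daemon user.
        f"--chown={daemon_user}:{daemon_user}",
        # Trailing slashes: copy the directory contents, not the directory.
        f"{src_host}:{lfs_path}/",
        f"{lfs_path}/",
    ]


# Hypothetical values for illustration only.
command = build_lfs_rsync_command(
    src_host="gerrit-prod.example",
    lfs_path="/srv/gerrit/data/lfs",
    daemon_user="gerrit",
)
```

Setting ownership during the transfer keeps the synced files readable by the Gerrit daemon without a separate recursive chown afterwards. Note that rsync needs super-user privileges on the receiving side for `--chown` to take effect.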