
bring new gitlab hardware servers into production
Open, In Progress, High, Public

Description

So far all gitlab* machines have been virtual.

We have now received the first dedicated physical servers for GitLab.

There is one ticket for codfw and one for eqiad; both include gitlab* and gitlab-runner* machines.

codfw - T301183
eqiad - T301177

Both are now ready for us to take over.

GitLab Runner migration

gitlab-runner hosts can be integrated independently of the GitLab migration. The following machines need role(gitlab_runner):

  • gitlab-runner1002.eqiad.wmnet (paused)
  • gitlab-runner1003.eqiad.wmnet (paused)
  • gitlab-runner1004.eqiad.wmnet (paused)
  • gitlab-runner2002.codfw.wmnet (paused)
  • gitlab-runner2003.codfw.wmnet (paused)
  • gitlab-runner2004.codfw.wmnet (paused)

Once the Runners above are configured and ready, the Ganeti VMs gitlab-runner1001.eqiad.wmnet and gitlab-runner2001.codfw.wmnet can be unregistered and destroyed.

  • decommission gitlab-runner1001.eqiad.wmnet
  • decommission gitlab-runner2001.codfw.wmnet

GitLab migration

The GitLab migration needs some additional preparation:

  • register second service IPs for gitlab1003
  • validate puppet code and GitLab configuration with a physical replica on gitlab1003 (also bullseye)
  • evaluate additional configuration changes for potential HA setups
  • create custom partman config for GitLab 793534
    • bigger root (/) volume
    • dedicated /srv volume (and move backups back to this folder instead of /mnt)
    • dedicated Docker volume not needed
    • dedicated Registry volume (see gitlab_rails['registry_path'])

See checklist for replica migration: T307142#7969993
See checklist for production migration: T307142#7971192

Tasks after downtime:

  • switch bacula fileset for gitlab from /mnt to /srv 800357
  • increase TTL for DNS records
  • check bacula backups for new host next day
  • migrate additional hosts
    • gitlab2002.wikimedia.org as replica
    • gitlab2003.wikimedia.org as replica
  • decommission old hosts
    • gitlab2001.wikimedia.org
    • gitlab1001.wikimedia.org
    • remove dns entries
    • remove hosts from hiera

Details

Project           | Branch     | Lines +/-
operations/dns    | master     | +0 -4
operations/puppet | production | +0 -8
operations/puppet | production | +4 -9
operations/puppet | production | +2 -7
operations/puppet | production | +0 -5
operations/puppet | production | +1 -2
operations/puppet | production | +0 -6
operations/dns    | master     | +0 -4
operations/puppet | production | +1 -16
operations/puppet | production | +1 -1
operations/puppet | production | +1 -2
operations/puppet | production | +0 -5
operations/puppet | production | +0 -10
operations/puppet | production | +0 -6
operations/puppet | production | +0 -6
operations/puppet | production | +20 -20
operations/puppet | production | +3 -3
operations/dns    | master     | +8 -8
operations/dns    | master     | +4 -4
operations/puppet | production | +5 -4
operations/dns    | master     | +8 -8
operations/puppet | production | +15 -19
operations/puppet | production | +22 -20
operations/dns    | master     | +2 -0
operations/puppet | production | +12 -2
operations/puppet | production | +6 -1
operations/puppet | production | +11 -0
operations/puppet | production | +9 -2
operations/puppet | production | +1 -1
operations/dns    | master     | +2 -0
operations/puppet | production | +86 -1
operations/puppet | production | +7 -1
operations/dns    | master     | +4 -0
operations/dns    | master     | +4 -0
operations/dns    | master     | +4 -0
operations/puppet | production | +1 -1
operations/puppet | production | +8 -3
operations/puppet | production | +10 -16
operations/puppet | production | +29 -1
operations/puppet | production | +3 -3
operations/puppet | production | +3 -7
operations/puppet | production | +5 -1

Event Timeline

Jelto updated the task description.

Change 800709 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia.org: move gitlab-replica from netbox to dns repo

https://gerrit.wikimedia.org/r/800709

Change 800709 merged by Jelto:

[operations/dns@master] wikimedia.org: move gitlab-replica from netbox to dns repo

https://gerrit.wikimedia.org/r/800709

Change 800719 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia.org: move gitlab from netbox to dns repo

https://gerrit.wikimedia.org/r/800719

Change 800719 merged by Jelto:

[operations/dns@master] wikimedia.org: move gitlab from netbox to dns repo

https://gerrit.wikimedia.org/r/800719

Change 800728 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: use gitlab1004 as replia/passive host

https://gerrit.wikimedia.org/r/800728

Change 800666 merged by Jelto:

[operations/puppet@production] idp: add gitlab-new to idp

https://gerrit.wikimedia.org/r/800666

Change 800728 merged by Jelto:

[operations/puppet@production] gitlab: use gitlab1004 as replia/passive host

https://gerrit.wikimedia.org/r/800728

Mentioned in SAL (#wikimedia-releng) [2022-05-30T11:46:13Z] <jelto> apply gitlab-settings to gitlab1003 - T307142

Mentioned in SAL (#wikimedia-releng) [2022-05-30T11:47:24Z] <jelto> apply gitlab-settings to gitlab1004 - T307142

gitlab1003 and gitlab1004 are now configured as GitLab replicas and serve https://gitlab-replica-new.wikimedia.org/ and https://gitlab-new.wikimedia.org/.

As mentioned in T274463#7966543, pressure on the backup disk is less urgent than last week, so I would like to validate the migration on the replica first. This means https://gitlab-replica.wikimedia.org/ will be migrated from gitlab2001 (the old Ganeti VM) to gitlab1003 (the new physical machine).

If that works as expected, I'll continue with the migration of https://gitlab.wikimedia.org/ from gitlab1001 (the old Ganeti VM) to gitlab1004 (the new physical machine). Additional maintenance announcements will be needed for that step.

Change 801651 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: make gitlab1003 new replica

https://gerrit.wikimedia.org/r/801651

Change 801652 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia.org: make gitlab1003 the new gitlab-replica

https://gerrit.wikimedia.org/r/801652

Checklist for today's gitlab-replica migration from gitlab2001 to gitlab1003:

Preparations before downtime:

  • register second service IPs for gitlab1003
  • validate puppet code and GitLab configuration with a physical replica on gitlab1003 (also bullseye)
  • move gitlab-replica.wikimedia.org entry from netbox to dns repo 800709
  • apply role(gitlab) to gitlab1003 and verify installation 791599
  • copy ssh host keys for ssh-gitlab daemon from gitlab2001 to gitlab1003
  • configure gitlab1003 with profile::gitlab::service_name: 'gitlab-replica.wikimedia.org' 801651
  • configure gitlab1004 as profile::gitlab::active_host (not needed on replica)
  • apply gitlab-settings to gitlab1003
  • announce downtime a few days ahead on the ops/releng list (not needed on replica)

Scheduled downtime:

  • announce downtime in #wikimedia-gitlab (not needed on replica)
  • pause all GitLab Runners (not needed on replica)
  • stop puppet on gitlab2001 with sudo disable-puppet "Full backup - T307142"
  • stop GitLab on gitlab2001 with gitlab-ctl stop
  • create full backup on gitlab1001/gitlab2001 with /usr/bin/gitlab-backup create CRON=1 STRATEGY=copy GZIP_RSYNCABLE="true" GITLAB_BACKUP_MAX_CONCURRENCY="4" GITLAB_BACKUP_MAX_STORAGE_CONCURRENCY="1" && ls -t "/mnt/gitlab-backup"/*gitlab_backup.tar | head -n1 | xargs -i cp {} "/mnt/gitlab-backup"/latest/latest-data.tar
  • sync backup: on gitlab1001, run /usr/bin/rsync -avp /mnt/gitlab-backup/latest/ rsync://gitlab1003.wikimedia.org/data-backup
  • trigger restore: on gitlab1003, run /srv/gitlab-backup/gitlab-restore.sh
  • overwrite home_page_url: on gitlab1003, run echo "ApplicationSetting.last.update(home_page_url: 'https://gitlab-replica.wikimedia.org/explore')" | /usr/bin/gitlab-rails console
  • Point DNS entry for gitlab-replica.wikimedia.org to gitlab1003 801652 and run authdns-update
  • verify installation
  • enable puppet on gitlab2001 with sudo enable-puppet "Full backup - T307142"
  • unpause all GitLab Runners (not needed on replica)
  • announce end of downtime (not needed on replica)
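The backup step in the checklist above ends with a shell pipeline that copies the most recently modified *gitlab_backup.tar into latest/latest-data.tar, which is the file rsync then ships to the new host. A minimal Python sketch of the same selection logic follows, assuming only the directory layout visible in that command; copy_latest_backup is a hypothetical helper, not something that exists in the puppet repo.

```python
import shutil
from pathlib import Path


def copy_latest_backup(backup_dir: Path) -> Path:
    """Copy the newest *gitlab_backup.tar in backup_dir to latest/latest-data.tar.

    Hypothetical helper mirroring:
      ls -t "$dir"/*gitlab_backup.tar | head -n1 | xargs -i cp {} "$dir"/latest/latest-data.tar
    Returns the path of the backup that was copied.
    """
    # Non-recursive glob, so latest/latest-data.tar itself is never a candidate.
    candidates = list(backup_dir.glob("*gitlab_backup.tar"))
    if not candidates:
        raise FileNotFoundError(f"no *gitlab_backup.tar found in {backup_dir}")
    # ls -t sorts by modification time, newest first; max() picks the same file.
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    target = backup_dir / "latest" / "latest-data.tar"
    # Unlike the plain cp in the pipeline, create latest/ if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(newest, target)
    return newest
```

On the old host this would be called as copy_latest_backup(Path("/mnt/gitlab-backup")). Selecting by st_mtime gives the same result as ls -t | head -n1 without parsing ls output.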

Change 801651 merged by Jelto:

[operations/puppet@production] gitlab: make gitlab1003 new replica

https://gerrit.wikimedia.org/r/801651

Change 801652 merged by Jelto:

[operations/dns@master] wikimedia.org: make gitlab1003 the new gitlab-replica

https://gerrit.wikimedia.org/r/801652

Checklist for the gitlab migration from gitlab1001 to gitlab1004:

Preparations before downtime:

  • register second service IPs for gitlab1004
  • validate puppet code and GitLab configuration with a physical replica on gitlab1004 (also bullseye)
  • move gitlab.wikimedia.org entry from netbox to dns repo 800719
  • apply role(gitlab) to gitlab1004 and verify installation 800728
  • copy ssh host keys for ssh-gitlab daemon from gitlab1001 to gitlab1004
  • apply gitlab-settings to gitlab1004
  • lower TTL for gitlab.wikimedia.org? 802090
  • announce downtime a few days ahead on the ops/releng list (@brennen is doing that)

Scheduled downtime:

  • Announce downtime in #wikimedia-gitlab
  • pause all GitLab Runners
  • stop puppet on gitlab1001 with sudo disable-puppet "Full backup - T307142"
  • stop write access on gitlab1001 by stopping nginx and ssh-gitlab with gitlab-ctl stop nginx and systemctl stop ssh-gitlab
  • create full backup on gitlab1001 with /usr/bin/gitlab-backup create CRON=1 STRATEGY=copy GZIP_RSYNCABLE="true" GITLAB_BACKUP_MAX_CONCURRENCY="4" GITLAB_BACKUP_MAX_STORAGE_CONCURRENCY="1" && ls -t "/mnt/gitlab-backup"/*gitlab_backup.tar | head -n1 | xargs -i cp {} "/mnt/gitlab-backup"/latest/latest-data.tar
  • sync backup: on gitlab1001, run /usr/bin/rsync -avp /mnt/gitlab-backup/latest/ rsync://gitlab1004.wikimedia.org/data-backup
  • configure gitlab1004 with profile::gitlab::service_name: 'gitlab.wikimedia.org' 802150
  • configure gitlab1004 as profile::gitlab::active_host 802150
  • trigger restore: on gitlab1004, run /srv/gitlab-backup/gitlab-restore.sh
  • overwrite home_page_url: on gitlab1004, run echo "ApplicationSetting.last.update(home_page_url: 'https://gitlab.wikimedia.org/explore')" | /usr/bin/gitlab-rails console
  • Point DNS entry for gitlab.wikimedia.org to gitlab1004 802473 and run authdns-update
  • verify installation
  • run puppet on gitlab1004
  • enable puppet on gitlab1001 with sudo enable-puppet "Full backup - T307142"
  • unpause all GitLab Runners
  • announce end of downtime
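For the DNS switch and the "verify installation" step, a quick sanity check is that the service name already resolves to the new host. A minimal Python sketch using only the standard library resolver follows; the expected addresses are deliberately not filled in (take them from the DNS repo change), and resolve_all and points_to are hypothetical helper names.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_all(name: str) -> Set[str]:
    """Return every IPv4/IPv6 address the local resolver returns for name."""
    infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def points_to(name: str, expected: Iterable[str],
              resolver: Callable[[str], Set[str]] = resolve_all) -> bool:
    """True if name resolves to exactly the expected set of addresses.

    The resolver is injectable so the check can be exercised without network access.
    """
    return resolver(name) == set(expected)
```

This only reflects what the local resolver returns, so until the previous record's TTL expires it may still report gitlab1001; a mismatch immediately after authdns-update is not necessarily an error.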

Change 802090 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia.org: reduce TTL for gitlab A and AAAA to 5m

https://gerrit.wikimedia.org/r/802090

Change 802150 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: make gitlab1004 new production instance

https://gerrit.wikimedia.org/r/802150

Change 802090 merged by Jelto:

[operations/dns@master] wikimedia.org: reduce TTL for gitlab A and AAAA to 5m

https://gerrit.wikimedia.org/r/802090
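Lowering the TTL ahead of the migration only helps if it happens early enough. Resolvers that cached the record before 802090 was deployed may keep it for up to the old TTL, and after the cutover they may serve the old address for up to the new 5-minute TTL. A small Python sketch of that timing arithmetic follows; the old TTL is not recorded in this task, so it is a parameter, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl: timedelta) -> datetime:
    """Earliest time at which every cache should be using the lowered TTL.

    A record cached just before the TTL change can survive for up to old_ttl.
    """
    return ttl_lowered_at + old_ttl


def max_stale_until(cutover_at: datetime, new_ttl: timedelta) -> datetime:
    """Latest time a TTL-respecting resolver may still serve the old address."""
    return cutover_at + new_ttl
```

With the 5-minute TTL from 802090, max_stale_until(cutover, timedelta(minutes=5)) gives the point after which well-behaved resolvers should all return gitlab1004. Resolvers that ignore or clamp TTLs can still lag behind.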

Change 802473 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia.org: make gitlab1004 the new gitlab production host

https://gerrit.wikimedia.org/r/802473

Mentioned in SAL (#wikimedia-operations) [2022-06-02T15:06:14Z] <jelto> start migration to gitlab1004 - T307142

Change 802150 merged by Jelto:

[operations/puppet@production] gitlab: make gitlab1004 new production instance

https://gerrit.wikimedia.org/r/802150

Change 802473 merged by Jelto:

[operations/dns@master] wikimedia.org: make gitlab1004 the new gitlab production host

https://gerrit.wikimedia.org/r/802473

Migration of production GitLab from gitlab1001 to gitlab1004 was successful. Downtime was around 65 minutes.

Backups were configured for the new (old) path /srv/gitlab-backup/ and a manual backup run on gitlab1004 was successful.

I'll check bacula tomorrow for new backups of gitlab1004 and make sure the replicas synced properly.


Backups for the new production host gitlab1004 look as expected. Backups also appear on Bacula:

Select the Client (1-244): 103
Automatically selected FileSet: gitlab
+---------+-------+----------+----------------+---------------------+----------------+
| JobId   | Level | JobFiles | JobBytes       | StartTime           | VolumeName     |
+---------+-------+----------+----------------+---------------------+----------------+
| 447,261 | F     |        6 | 14,754,287,536 | 2022-06-03 04:58:17 | production0081 |
+---------+-------+----------+----------------+---------------------+----------------+

Restore from Bacula to gitlab1004 worked, and the files are similar to today's backup.

Replica restore also worked. One rsync job failed on the old production host gitlab1001. I'll take a closer look. I assume that's a timer/resource which is not managed by puppet anymore.


Two rsync jobs were not removed by Puppet when gitlab1001 was switched from active host to replica. These jobs failed because gitlab1001, now a replica, is not allowed to write to gitlab1004.
I removed them manually. I'll try to implement a proper fix using the correct combination of ensure parameters in Puppet (probably in gitlab::rsync).
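Removing a resource declaration from Puppet does not remove the resource from the host: Puppet simply stops managing it, which is how the two rsync timers survived on gitlab1001. The usual remedy is to keep declaring the resource and set ensure to absent where it should no longer exist. A minimal Python sketch of that decision follows, assuming (as the failed jobs suggest) that only the active host pushes to the other GitLab hosts; rsync_job_states is hypothetical and is not the actual gitlab::rsync logic.

```python
from typing import Dict, Iterable


def rsync_job_states(this_host: str, active_host: str,
                     all_hosts: Iterable[str]) -> Dict[str, str]:
    """Return the desired ensure value for each outgoing rsync job on this_host.

    Hypothetical model: every peer gets an entry, so jobs that must disappear
    are declared 'absent' instead of being silently dropped from the catalog.
    """
    states = {}
    for peer in all_hosts:
        if peer == this_host:
            continue  # no job that syncs a host to itself
        # Only the active host pushes backups to the other hosts.
        states[peer] = "present" if this_host == active_host else "absent"
    return states
```

Peers that disappear from the host list entirely still fall through this scheme, which matches the leftover gitlab2001 and gitlab1001 units that later had to be removed by hand on gitlab1004.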

Change 802821 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] Bug: T307142 Change-Id: I13b8b25f66cf7c384ce464bbbeb9b7a3a7dc3861

https://gerrit.wikimedia.org/r/802821

Change 802821 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] logtash: replace gitlab1001 with gitlab1004 in tests

https://gerrit.wikimedia.org/r/802821

Change 802822 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab/acme_chief: remove gitlab1001 from list of (passive) hosts

https://gerrit.wikimedia.org/r/802822

Change 802824 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] DHCP: remove gitlab1001

https://gerrit.wikimedia.org/r/802824

Change 802846 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: remove gitlab1001, adjust gitlab machine descriptions

https://gerrit.wikimedia.org/r/802846

Change 802821 merged by Dzahn:

[operations/puppet@production] logtash: replace gitlab1001 with gitlab1004 in tests

https://gerrit.wikimedia.org/r/802821

Change 806273 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] install_server: remove gitlab-runner1001

https://gerrit.wikimedia.org/r/806273

Change 806273 merged by Dzahn:

[operations/puppet@production] install_server: remove gitlab-runner1001

https://gerrit.wikimedia.org/r/806273

Change 806276 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] install_server: remove gitlab-runner 2001

https://gerrit.wikimedia.org/r/806276

Change 806276 merged by AOkoth:

[operations/puppet@production] install_server: remove gitlab-runner 2001

https://gerrit.wikimedia.org/r/806276

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: gitlab-runner1001.eqiad.wmnet

  • gitlab-runner1001.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox

Change 806279 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] site: remove old gitlab runners

https://gerrit.wikimedia.org/r/806279

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: gitlab-runner1001.eqiad.wmnet

  • gitlab-runner1001.eqiad.wmnet (FAIL)
    • Host steps raised exception:

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by aokoth@cumin1001 for hosts: gitlab-runner2001.codfw.wmnet

  • gitlab-runner2001.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox

Change 806279 merged by AOkoth:

[operations/puppet@production] site: remove old gitlab runners

https://gerrit.wikimedia.org/r/806279

@Arnoldokoth @Dzahn what's missing to also check

  • decommission gitlab-runner1001.eqiad.wmnet
  • decommission gitlab-runner2001.codfw.wmnet

in the task description?

Change 806862 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] DHCP: remove gitlab2001

https://gerrit.wikimedia.org/r/806862

Change 806863 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab/acme_chief: remove gitlab2001 from list of (passive) hosts

https://gerrit.wikimedia.org/r/806863

Change 806864 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] site: remove gitlab2001

https://gerrit.wikimedia.org/r/806864

Change 806863 merged by AOkoth:

[operations/puppet@production] gitlab/acme_chief: remove gitlab2001 from list of (passive) hosts

https://gerrit.wikimedia.org/r/806863

Change 806862 merged by AOkoth:

[operations/puppet@production] DHCP: remove gitlab2001

https://gerrit.wikimedia.org/r/806862

cookbooks.sre.hosts.decommission executed by aokoth@cumin1001 for hosts: gitlab2001.wikimedia.org

  • gitlab2001.wikimedia.org (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Host steps raised exception: Cannot find cluster row_D (expected ('ganeti01.svc.eqiad.wmnet', 'ganeti01.svc.codfw.wmnet', 'ganeti01.svc.esams.wmnet', 'ganeti01.svc.ulsfo.wmnet', 'ganeti01.svc.eqsin.wmnet', 'ganeti-test01.svc.codfw.wmnet', 'ganeti01.svc.drmrs.wmnet', 'ganeti02.svc.drmrs.wmnet')).

ERROR: some step on some host failed, check the bolded items above

gitlab2001 has been removed from the acme_chief YAML that allowed it to request certs, but it's still up and has the Puppet role applied.

This means puppet is broken there with "Acme_chief::Cert[gitlab]/File[/etc/acmecerts/gitlab]: Could not evaluate: Could not retrieve file metadata for puppet://acmechief1001.eqiad.wmnet/acmedata/gitlab: Error 403 on SERVER: access denied"

Change 811351 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab: switch gitlab2001 back to "insetup" role

https://gerrit.wikimedia.org/r/811351

Change 811351 merged by Dzahn:

[operations/puppet@production] gitlab: switch gitlab2001 back to "insetup" role

https://gerrit.wikimedia.org/r/811351

https://gitlab-replica.wikimedia.org/ will be migrated from gitlab2001 (the old ganeti VM) to gitlab1003 (the new physical machine).

Just noticed in Netbox that this is still 208.80.153.105 (which is now gitlab-replica-old). I removed the GitLab production role from gitlab2001, though, and confirmed that 208.80.153.105 is indeed on gitlab1003 and still up and running.

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: gitlab2001.codfw.wmnet

  • gitlab2001.codfw.wmnet (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • Failed to shutdown VM, manually run gnt-instance remove on the Ganeti master for the codfw cluster: Cumin execution failed (exit_code=2)
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • Failed to remove VM, manually run gnt-instance remove on the Ganeti master for the codfw cluster: Cumin execution failed (exit_code=2)
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: gitlab2001.wikimedia.org

  • gitlab2001.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

Change 811362 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/hiera: remove gitlab2001 after decom

https://gerrit.wikimedia.org/r/811362

Mentioned in SAL (#wikimedia-operations) [2022-07-06T01:21:25Z] <mutante> gitlab1004 rm /lib/systemd/system/rsync-data-backup-gitlab2001.wikimedia.org.* ; systemctl reset-failed (T274463, T307142) - fix icinga alert after gitlab2001 was decom'ed, we didn't have puppet remove the timer/service

Change 811362 merged by Jelto:

[operations/puppet@production] site/hiera: remove gitlab2001 after decom

https://gerrit.wikimedia.org/r/811362

Change 811674 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia.org: remove gitlab-replica-old.wikimedia.org

https://gerrit.wikimedia.org/r/811674

Change 811674 merged by Jelto:

[operations/dns@master] wikimedia.org: remove gitlab-replica-old.wikimedia.org

https://gerrit.wikimedia.org/r/811674

Change 806864 abandoned by Jelto:

[operations/puppet@production] site: remove gitlab2001

Reason:

duplicate with I7f409741d59e0a951f6f24470cb5d309bb7d2e6d

https://gerrit.wikimedia.org/r/806864

Change 802822 merged by Dzahn:

[operations/puppet@production] gitlab/acme_chief: remove gitlab1001 from list of (passive) hosts

https://gerrit.wikimedia.org/r/802822

Mentioned in SAL (#wikimedia-operations) [2022-07-06T23:00:01Z] <mutante> gitlab1004 - rm /lib/systemd/system/rsync-config-backup-gitlab1001* T307142
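Both SAL entries above remove leftover systemd unit files by hand (rm /lib/systemd/system/rsync-...-backup-<host>* followed by systemctl reset-failed). Before deleting, it can help to list exactly what the glob would match. A minimal, read-only Python sketch follows, assuming the unit naming seen in those entries (rsync-data-backup-<host>* and rsync-config-backup-<host>*); stale_rsync_units is a hypothetical name and the function deletes nothing.

```python
from pathlib import Path
from typing import Iterable, List


def stale_rsync_units(unit_dir: Path, decommissioned: Iterable[str]) -> List[Path]:
    """List rsync-*-backup-<host>* unit files that refer to decommissioned hosts.

    Read-only: the caller decides whether to delete them, then reloads systemd
    (systemctl daemon-reload) and clears failed state (systemctl reset-failed).
    Note the trailing '*' means a short host prefix also matches longer names,
    so prefer full names such as 'gitlab2001.wikimedia.org'.
    """
    matches = set()
    for host in decommissioned:
        matches.update(unit_dir.glob(f"rsync-*-backup-{host}*"))
    return sorted(matches)
```

Typical use would be stale_rsync_units(Path("/lib/systemd/system"), ["gitlab2001.wikimedia.org", "gitlab1001"]) on gitlab1004, reviewing the list before running rm.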

Change 811782 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/gitlab: remove gitlab1001, update comments

https://gerrit.wikimedia.org/r/811782

Change 802824 merged by Dzahn:

[operations/puppet@production] DHCP: remove gitlab1001

https://gerrit.wikimedia.org/r/802824

Change 802846 abandoned by Dzahn:

[operations/puppet@production] site: remove gitlab1001, adjust gitlab machine descriptions

Reason:

duplicate of https://gerrit.wikimedia.org/r/c/operations/puppet/+/811782

https://gerrit.wikimedia.org/r/802846

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: gitlab1001.wikimedia.org

  • gitlab1001.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox

Change 811782 merged by Dzahn:

[operations/puppet@production] site/gitlab: remove gitlab1001, update comments

https://gerrit.wikimedia.org/r/811782

Change 811797 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab: remove gitlab1001 from Hiera

https://gerrit.wikimedia.org/r/811797

Change 811797 merged by Dzahn:

[operations/puppet@production] gitlab: remove gitlab1001 from Hiera

https://gerrit.wikimedia.org/r/811797

Change 811912 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia.org: remove gitlab-old.wikimedia.org

https://gerrit.wikimedia.org/r/811912