
Install additional SSDs on gitlab2003.wikimedia.org (B5)
Closed, Resolved (Public)

Description

In T331662, new disks were requested for the gitlab codfw hosts (two per host, four in total for codfw).

The two new disks can be installed on gitlab2003 now. We would like to verify the disk partitioning layout on gitlab2003 first, because this host is still insetup, so no further downtime or coordination is required.

If that is successful, we can continue with the eqiad gitlab hosts (passive replicas) and, after that, the remaining production host in codfw. I'll open follow-up tasks for that.

Event Timeline

Jelto mentioned this in Unknown Object (Task). Mar 28 2023, 8:42 AM

@Jelto the 2 disks are in place in gitlab2003

Jelto triaged this task as Medium priority. Mar 31 2023, 7:43 AM

Thanks @Papaul for the quick installation!

I can confirm new disks are available on the host:

gitlab2003:~$ lsblk | grep disk
sda                                        8:0    0 894.3G  0 disk  
sdb                                        8:16   0 894.3G  0 disk  
sdc                                        8:32   0   1.7T  0 disk  
sdd                                        8:48   0   1.7T  0 disk
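The same check can be scripted instead of read by eye. A minimal sketch, assuming the column layout shown above (NAME, MAJ:MIN, RM, SIZE, RO, TYPE); the helper name `parse_disks` is made up for illustration, and matching on the "1.7T" size string is just a convenient way to pick out the new pair on this host:

```python
def parse_disks(lsblk_output):
    """Return {device name: size string} for every row whose TYPE is 'disk'."""
    disks = {}
    for line in lsblk_output.splitlines():
        fields = line.split()
        # Expected columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINT]
        if len(fields) >= 6 and fields[5] == "disk":
            disks[fields[0]] = fields[3]
    return disks


# Sample input copied from the `lsblk | grep disk` output above.
sample = """\
sda                                        8:0    0 894.3G  0 disk
sdb                                        8:16   0 894.3G  0 disk
sdc                                        8:32   0   1.7T  0 disk
sdd                                        8:48   0   1.7T  0 disk
"""

disks = parse_disks(sample)
# Assumption: the newly installed SSDs are the ones reported as 1.7T.
new_disks = sorted(name for name, size in disks.items() if size == "1.7T")
print(new_disks)
```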

I'm going to reimage gitlab2003 to verify the new partman config. If that's successful, I'm going to close the task.

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin2002 for host gitlab2003.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin2002 for host gitlab2003.wikimedia.org with OS bullseye completed:

  • gitlab2003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303310813_jelto_513784_gitlab2003.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

The reimage of gitlab2003 completed, but the partman config is not producing the expected result. The root partition is bigger, but the two new disks (sdc and sdd) are not in use, and no RAID is configured for those devices.

$ lsblk 
NAME                               MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                  8:0    0 894.3G  0 disk  
└─sda1                               8:1    0 894.3G  0 part  
  └─md0                              9:0    0 894.1G  0 raid1 
    ├─gitlab2003--vg-root          253:0    0 558.8G  0 lvm   /
    ├─gitlab2003--vg-srv--registry 253:1    0 139.7G  0 lvm   /srv/registry
    └─gitlab2003--vg-placeholder   253:2    0  23.3G  0 lvm   
sdb                                  8:16   0 894.3G  0 disk  
└─sdb1                               8:17   0 894.3G  0 part  
  └─md0                              9:0    0 894.1G  0 raid1 
    ├─gitlab2003--vg-root          253:0    0 558.8G  0 lvm   /
    ├─gitlab2003--vg-srv--registry 253:1    0 139.7G  0 lvm   /srv/registry
    └─gitlab2003--vg-placeholder   253:2    0  23.3G  0 lvm   
sdc                                  8:32   0   1.7T  0 disk  
sdd                                  8:48   0   1.7T  0 disk 
$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sda1[0] sdb1[1]
      937559040 blocks super 1.2 [2/2] [UU]
      [=====>...............]  resync = 25.1% (235986048/937559040) finish=566.5min speed=20638K/sec
      bitmap: 7/7 pages [28KB], 65536KB chunk

unused devices: <none>
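Both observations (md0 only spans sda1 and sdb1, and sdc/sdd are idle) can be cross-checked from the /proc/mdstat text. This is a rough sketch, not an existing tool: it assumes active arrays in the layout shown above and partition names formed as disk name plus a numeric suffix (which would not hold for NVMe-style names). The function names are illustrative. The resync estimate relies on mdstat reporting blocks in 1 KiB units and speed in K/sec.

```python
import re

# Sample input copied from the /proc/mdstat output above.
MDSTAT = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      937559040 blocks super 1.2 [2/2] [UU]
      [=====>...............]  resync = 25.1% (235986048/937559040) finish=566.5min speed=20638K/sec
      bitmap: 7/7 pages [28KB], 65536KB chunk

unused devices: <none>
"""


def md_members(mdstat):
    """Map each active md array to its member partitions."""
    arrays = {}
    for line in mdstat.splitlines():
        # Matches lines such as "md0 : active raid1 sda1[0] sdb1[1]".
        match = re.match(r"^(md\d+) : \S+ \S+ (.+)$", line)
        if match:
            # Members look like "sda1[0]"; drop the "[n]" role index.
            arrays[match.group(1)] = [m.split("[")[0] for m in match.group(2).split()]
    return arrays


def unused_disks(disks, arrays):
    """Return the disks that back no md array member."""
    used = set()
    for members in arrays.values():
        for part in members:
            # Assumption: partition name = disk name + trailing digits.
            used.add(part.rstrip("0123456789"))
    return sorted(d for d in disks if d not in used)


def resync_eta_minutes(mdstat):
    """Estimate remaining resync time from done/total blocks and speed."""
    match = re.search(
        r"resync\s*=\s*[\d.]+% \((\d+)/(\d+)\).*speed=(\d+)K/sec", mdstat
    )
    if not match:
        return None
    done, total, speed = map(int, match.groups())
    return (total - done) / speed / 60


arrays = md_members(MDSTAT)
print(arrays)
print(unused_disks(["sda", "sdb", "sdc", "sdd"], arrays))
print(resync_eta_minutes(MDSTAT))
```

With the sample above, the estimate comes out at roughly 566 minutes, in line with the `finish=` value the kernel reports.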

I'll close this task, as the disks are installed, and open a follow-up to investigate the partman config. Thanks again @Papaul.
Once the partman issue is solved, I'll open tasks for installing the additional SSDs in the remaining gitlab instances.