
Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4]
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of gitlab100[3|4] and gitlab-runner100[2|3|4].

Hostname / Racking / Installation Details

Hostnames: gitlab100[3|4] and gitlab-runner100[2|3|4]
Racking Proposal: No more than 2 hosts per row, preferably in different racks for the same row
Networking/Subnet/VLAN/IP: 1G production network
Partitioning/Raid: RAID1
OS Distro: Bullseye
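
The bracket notation in the hostnames above is shorthand for several hosts. As a minimal sketch (the helper name `expand_hosts` is hypothetical and not part of any tooling referenced in this task), the shorthand can be expanded into the five individual hostnames like this:

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand a single 'prefix[a|b|c]suffix' group into individual hostnames."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a single hostname.
        return [pattern]
    prefix, alternatives, suffix = match.groups()
    return [f"{prefix}{alt}{suffix}" for alt in alternatives.split("|")]


hosts = expand_hosts("gitlab100[3|4]") + expand_hosts("gitlab-runner100[2|3|4]")
print(hosts)
```

This only handles one bracket group per pattern, which is all the notation in this task needs.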

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

gitlab1003:
  • - receive in system on procurement task T297164 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
gitlab1004:
  • - receive in system on procurement task T297164 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
gitlab-runner1002:
  • - receive in system on procurement task T297164 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
gitlab-runner1003:
  • - receive in system on procurement task T297164 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
gitlab-runner1004:
  • - receive in system on procurement task T297164 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH renamed this task from (Need By: TBD) rack/setup/install gitlab100[2|3] and gitlab-runner100[2|3|4] to Q3:(Need By: TBD) rack/setup/install gitlab100[2|3] and gitlab-runner100[2|3|4]. (Feb 23 2022, 6:19 PM)

@RobH (cc: @Jelto )

gitlab1002 existed as a VM in the past, when contractors used it, but we deleted it again afterwards.

T274459#6877185

https://phabricator.wikimedia.org/search/query/127L7fu7aA6S/

It might be causing trouble or confusion to reuse the name. Maybe let's skip that one.

RobH added subscribers: LSobanski, Jclark-ctr.

@LSobanski: Is it ok to shift these hostnames from gitlab100[23] to gitlab100[34] due to T301177#7732970?

Please advise and assign back to me for followup.

RobH renamed this task from Q3:(Need By: TBD) rack/setup/install gitlab100[2|3] and gitlab-runner100[2|3|4] to Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4]. (Feb 24 2022, 4:49 PM)
RobH reassigned this task from RobH to Jclark-ctr.
RobH updated the task description.

Fine by me.

Thanks, updated the racking details in the task description, kicking back to John for racking when they arrive.

Name               Rack  U   Port  Cable ID
gitlab1003         a3    20  20    1864
gitlab1004         b1    33  31    23000018
gitlab-runner1002  b3    25  12    2615
gitlab-runner1003  c5    28  40    3324
gitlab-runner1004  d8    30  13    3463
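
The racking proposal in the description (no more than 2 hosts per row, preferably in different racks within a row) can be checked against the table above. The following is only a sketch: it assumes the first character of the rack name is the row (so `b1` and `b3` are both in row b), and the function name `check_racking` is hypothetical.

```python
from collections import defaultdict

# Rack assignments copied from the table above.
RACKS = {
    "gitlab1003": "a3",
    "gitlab1004": "b1",
    "gitlab-runner1002": "b3",
    "gitlab-runner1003": "c5",
    "gitlab-runner1004": "d8",
}


def check_racking(racks: dict[str, str], max_per_row: int = 2) -> list[str]:
    """Return human-readable problems; an empty list means the plan fits."""
    by_row = defaultdict(list)
    for rack in racks.values():
        # Assumption: the first character of the rack name identifies the row.
        by_row[rack[0]].append(rack)

    problems = []
    for row, row_racks in sorted(by_row.items()):
        if len(row_racks) > max_per_row:
            problems.append(f"row {row}: {len(row_racks)} hosts exceeds {max_per_row}")
        if len(set(row_racks)) < len(row_racks):
            # The proposal only states this as a preference, not a hard rule.
            problems.append(f"row {row}: hosts share a rack")
    return problems


problems = check_racking(RACKS)
print(problems or "racking plan satisfies the proposal")
```

Under that assumption, row b is the only row holding two hosts, and they sit in different racks (b1 and b3), so the plan as racked fits the proposal.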

It's sufficient if you put the "insetup" role on this and hand it over to us. Let us apply the actual gitlab puppet roles please. Thanks!

Tyler, Brennen: added you here per our meeting today so that you can see the status of the physical host install.

Change 787051 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] install_server/gitlab: separate partman recipes for physical servers

https://gerrit.wikimedia.org/r/787051

Change 787051 merged by Dzahn:

[operations/puppet@production] install_server/gitlab: separate partman recipes for physical servers

https://gerrit.wikimedia.org/r/787051

You should be unblocked to install the OS. The partman recipe is set to raid1-2dev.

confirming that the "gitlab" hosts should use a public IP and the "gitlab-runner" hosts should use a private IP.

confirming that the "gitlab" hosts should use a public IP and the "gitlab-runner" hosts should use a private IP.

This should be in the top racking instructions, things like this will get missed

Change 787758 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] adding gitlab and gitlab runner hosts to site.pp

https://gerrit.wikimedia.org/r/787758

Change 787758 merged by Cmjohnson:

[operations/puppet@production] adding gitlab and gitlab runner hosts to site.pp

https://gerrit.wikimedia.org/r/787758

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab-runner1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab-runner1003.eqiad.wmnet with OS bullseye

Change 787787 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] add gitlab-runner1004 to site.pp

https://gerrit.wikimedia.org/r/787787

Change 787787 merged by Cmjohnson:

[operations/puppet@production] add gitlab-runner1004 to site.pp

https://gerrit.wikimedia.org/r/787787

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab-runner1004.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab1003.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab1004.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab-runner1002.eqiad.wmnet with OS bullseye completed:

  • gitlab-runner1002 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204291602_cmjohnson_578038_gitlab-runner1002.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab-runner1003.eqiad.wmnet with OS bullseye completed:

  • gitlab-runner1003 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204291603_cmjohnson_578702_gitlab-runner1003.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

confirming that the "gitlab" hosts should use a public IP and the "gitlab-runner" hosts should use a private IP.

This should be in the top racking instructions, things like this will get missed

Yep, they DID get missed on the equivalent task in codfw. That made me put that here to save you from having to redo them.
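
Since this detail is easy to miss, here is a small sketch of the public/private split as it shows up in the reimage log entries in this task: "gitlab" hosts get a wikimedia.org name (public IP) and "gitlab-runner" hosts get an eqiad.wmnet name (private IP). The function name `fqdn` and the `site` parameter are hypothetical and only mirror what the log entries show.

```python
def fqdn(hostname: str, site: str = "eqiad") -> str:
    """Map a short hostname from this task to the FQDN seen in the reimage logs."""
    # Check the longer prefix first, because "gitlab-runner" also starts with "gitlab".
    if hostname.startswith("gitlab-runner"):
        return f"{hostname}.{site}.wmnet"  # private IP
    if hostname.startswith("gitlab"):
        return f"{hostname}.wikimedia.org"  # public IP
    raise ValueError(f"not a gitlab host: {hostname}")


for name in ("gitlab1003", "gitlab1004", "gitlab-runner1002"):
    print(name, "->", fqdn(name))
```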

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab-runner1004.eqiad.wmnet with OS bullseye completed:

  • gitlab-runner1004 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204291612_cmjohnson_587690_gitlab-runner1004.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab1003.wikimedia.org with OS bullseye completed:

  • gitlab1003 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204291614_cmjohnson_589396_gitlab1003.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab1004.wikimedia.org with OS bullseye completed:

  • gitlab1004 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204291618_cmjohnson_592021_gitlab1004.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
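
The log paths in the reimage reports above appear to follow a `<timestamp>_<user>_<number>_<host>.out` pattern. The sketch below splits such a name into its parts. The field names are assumptions: the numeric field is treated as an opaque run identifier because the task does not say what it is, and the parser would misread a user name containing an underscore.

```python
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # parsed from the leading YYYYMMDDHHMM stamp
    user: str
    run_id: int  # assumption: opaque numeric identifier, meaning not stated in the task
    host: str


def parse_log_name(path: str) -> ReimageLog:
    """Split a reimage log path into timestamp, user, numeric id, and host."""
    name = path.rsplit("/", 1)[-1].removesuffix(".out")  # requires Python 3.9+
    stamp, user, run_id, host = name.split("_", 3)
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, int(run_id), host)


log = parse_log_name(
    "/var/log/spicerack/sre/hosts/reimage/202204291602_cmjohnson_578038_gitlab-runner1002.out"
)
print(log.host, log.started.isoformat())
```
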
Cmjohnson updated the task description.

@Dzahn These have all been installed; resolving the task.