Q1: rack/setup/install contint1002
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of contint1002.

Hostname / Racking / Installation Details

Hostnames: contint1002
Racking Proposal: Any 1G rack with a public vlan; this host replaces the one currently running these services in eqiad.
Networking Setup: single 1G public vlan with IPv4/IPv6
Partitioning/RAID: two-device RAID 1 (a verification sketch follows below)
OS Distro: Buster
Sub-team Technical Contact: @LSobanski
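
As a quick reference for the requested layout, a post-install check of the two-device RAID 1 might look like this (a sketch only; the commands and device names are assumptions, not part of the task):

    cat /proc/mdstat           # expect md arrays with two active members, e.g. "[UU]"
    lsblk -o NAME,SIZE,TYPE    # both physical disks should appear under the md devices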

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

contint1002:
  • - receive in system on procurement task T311856 & in Coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp and site.pp with role(insetup); cp systems use role(insetup::nofirm) instead.
  • - OS installation & initial puppet run via the sre.hosts.reimage cookbook (see the command sketch after this list).
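
For orientation, the cookbook and homer steps in the checklist above might be invoked from a cumin host roughly as follows (a hedged sketch: the exact flags, the switch selector, and the task ID are assumptions; check the current SRE documentation before running anything):

    sudo cookbook sre.dns.netbox "Add mgmt and production DNS for contint1002"
    sudo homer "asw2-b*" commit "Enable switch port for contint1002"       # switch selector assumed from rack B1
    sudo cookbook sre.hosts.reimage --os buster -t T<task-id> contint1002  # T<task-id> is a placeholder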

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH mentioned this in Unknown Object (Task).

contint1002 B1 U38 port38 cableid 23000029

Papaul edited projects, added ops-eqiad; removed ops-codfw.
Papaul subscribed.

I added these to Netbox, but when I ran the DNS script and homer, nothing changed.

@Cmjohnson what's the expected ETA for this host? Asking as contint1001 seems to be nearing the end of its life and we'd like to move ahead with the replacement as quickly as possible.

Note the contint machines require a public IPv4 address in order to be able to reach WMCS instances. Currently we have:

fqdn                       IPv4
contint1001.wikimedia.org  208.80.154.17
contint2001.wikimedia.org  208.80.153.15
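
These records can be double-checked from any host with dig (commands shown for illustration; the expected answers come from the table above):

    dig +short A contint1001.wikimedia.org   # 208.80.154.17
    dig +short A contint2001.wikimedia.org   # 208.80.153.15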

Given that this task replaces contint1001, its IPv4 address can be reclaimed once the migration has completed and the contint1001 host is decommissioned.

Change 860093 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add contint1002 to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/860093

Change 860093 merged by Papaul:

[operations/puppet@production] Add contint1002 to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/860093
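
To see what the merged change touched, one option is to grep a local checkout of operations/puppet (a sketch; the file paths are assumptions based on the checklist and the change title):

    git clone https://gerrit.wikimedia.org/r/operations/puppet
    grep -n contint1002 puppet/manifests/site.pp                                     # expect a role(insetup) node entry
    grep -n contint1002 puppet/modules/install_server/files/autoinstall/netboot.cfg  # partitioning recipe mapping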

Given that this task replaces contint1001, its IPv4 address can be reclaimed once the migration has completed and the contint1001 host is decommissioned.

Ah, so you are saying you don't need a new machine in eqiad in parallel while contint1001 still exists?

That means we can and should start decom'ing contint1001 .. now?

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host contint1002.wikimedia.org with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host contint1002.wikimedia.org with OS buster completed:

  • contint1002 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202211231909_pt1979_401994_contint1002.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
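
A few sanity checks one might run on the freshly reimaged host (assumed commands; the expected values follow from the task details above):

    lsb_release -sc        # "buster", the requested OS distro
    cat /proc/mdstat       # the two-device RAID 1 from the partitioning plan
    sudo run-puppet-agent  # Wikimedia's wrapper script; plain `puppet agent -t` elsewhere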

Papaul updated the task description.

@LSobanski this is done

If this is done, I assume the IP address can't have stayed the same, as @hashar was asking. But given that Netbox assigns one automatically, that was probably never an option.

@Dzahn yes, the server has a public IP address.