
rack/setup/install cumin1001.eqiad.wmnet (new cumin master)
Closed, Resolved, Public

Description

This task will track the racking, setup, and installation of the new cumin master for eqiad, ordered to replace outdated hardware system neodymium.

Hostname Proposal: If we cannot come up with a decent cluster type name for this, we'll have to re-use old (no longer used) element names. As it is the role cluster::management, how about just clustermgmt1001?

Racking Proposal: Any 1G rack. This is replacing (not running alongside) neodymium, so its location relative to neodymium is irrelevant.

clustermgmt1001:

  • - receive in system on procurement task T195418
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

Related Objects

Status: Resolved
Assigned: Volans

Event Timeline

RobH triaged this task as Medium priority. Aug 6 2018, 6:58 PM
RobH created this task.

I think there was an agreement to install this as Stretch and, in this way, perform the jessie->stretch upgrade of this cluster.
In order to do that, we need to build some Debian packages for stretch that right now are jessie-only (cumin comes to mind). It shouldn't be a problem; I'm just mentioning it so we don't forget.

About the naming: given that it's a host used fairly often, I would go with something quick and easy to type, unless everyone has some sort of autocompletion (for example using my script, which I need to put in a more official place). If we assume autocompletion, cluster* looks good to me, as we don't have any other host starting with clu and the autocompletion will be easy.
If we don't assume autocompletion, then we might try to find a shorter name.
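
To make the autocompletion argument concrete, here is a minimal sketch of how one could check how many characters need to be typed before a hostname becomes unambiguous. The host list is illustrative only (names taken from this task, not the real inventory), and `shortest_unique_prefix` is a hypothetical helper, not part of any existing tool.

```python
def shortest_unique_prefix(target, hostnames):
    """Return the shortest prefix of target that no other hostname starts with.

    Returns None if target is itself a prefix of another hostname,
    in which case no prefix can disambiguate it.
    """
    others = [h for h in hostnames if h != target]
    for length in range(1, len(target) + 1):
        prefix = target[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return None


# Illustrative host list (assumption): only names that appear in this task.
HOSTS = ["neodymium", "cumin2001", "clustermgmt1001"]

for host in HOSTS:
    print(host, "->", shortest_unique_prefix(host, HOSTS))
```

With tab completion, the number of keystrokes is roughly the length of that prefix rather than the length of the full name, which is why a long but uniquely prefixed name can still be quick to type.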

Let's simply call these cumin*, it's the primary service offered and the other bits on these hosts (like debdeploy) are also tied to it.

I think there was an agreement to install this as Stretch and, in this way, perform the jessie->stretch upgrade of this cluster.

Ack, that was the plan.

Cmjohnson moved this task from Racking Tasks to Blocked on the ops-eqiad board.

@RobH these are ready, but I see there is another last-minute name change. When you do production DNS, can you update mgmt DNS to cumin1001, please?

I'm taking over this to use it for a live-session with the new hires. I'll take care of fixing the mgmt interface beforehand.

Change 461919 had a related patch set uploaded (by Volans; owner: Volans):
[operations/dns@master] Rename clustermgmt to cumin

https://gerrit.wikimedia.org/r/461919

Change 461919 merged by Volans:
[operations/dns@master] Rename clustermgmt to cumin

https://gerrit.wikimedia.org/r/461919

Change 462274 had a related patch set uploaded (by Volans; owner: Volans):
[operations/dns@master] Add cumin1001 IPs and PTRs

https://gerrit.wikimedia.org/r/462274

Change 462278 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] cumin: installation of cumin1001

https://gerrit.wikimedia.org/r/462278

Change 462274 merged by Volans:
[operations/dns@master] Add cumin1001 IPs and PTRs

https://gerrit.wikimedia.org/r/462274

Change 462278 merged by Volans:
[operations/puppet@production] cumin: installation of cumin1001

https://gerrit.wikimedia.org/r/462278

Script wmf-auto-reimage was launched by volans on cumin2001.codfw.wmnet for hosts:

cumin1001.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201809251732_volans_32407_cumin1001_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['cumin1001.eqiad.wmnet']

and were ALL successful.

Host installed, keyholder armed and a quick test with cumin worked as expected.

@MoritzMuehlenhoff we need to do some additional tests to check everything is working, but most likely it's all good.

@Cmjohnson I think the physical label needs to be updated from 'clustermgmt1001' to 'cumin1001'; correct me if I'm wrong.

I've created T205513 for the router changes and ran some tests with Cumin and debdeploy; all looked fine. Going to update my patch for the mysql grants to also cover cumin1001.

Volans removed a project: Patch-For-Review.
faidon renamed this task from rack/setup/install clustermgmt1001.eqiad.wmnet (new cumin master) to rack/setup/install cumin1001.eqiad.wmnet (new cumin master). Apr 19 2019, 11:52 AM