rack/setup/install ganeti500[123].eqsin.wmnet
Closed, Resolved · Public · 0 Estimated Story Points

Description

This task will track the setup of the three new ganeti nodes for eqsin. These are identical in specification to ganeti400[123], which were set up on T226444.

Racking Plan: These were racked with the odd-numbered hosts in one rack and the even-numbered host in the other, per the policies for eqsin server locations.
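
The odd/even split above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not a tool used on this task: the function names and the rack labels `rack-odd` and `rack-even` are made up, since the actual eqsin rack names are not given here.

```python
import re
from collections import defaultdict


def expand_hosts(pattern):
    """Expand one bracket class, e.g. 'ganeti500[123]' -> three hostnames."""
    match = re.fullmatch(r"(.*)\[(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no bracket class, nothing to expand
    prefix, digits, suffix = match.groups()
    return [f"{prefix}{d}{suffix}" for d in digits]


def rack_by_parity(hosts, odd_rack, even_rack):
    """Group hosts by the parity of the number at the end of the short hostname."""
    placement = defaultdict(list)
    for host in hosts:
        short_name = host.split(".")[0]
        number = int(re.search(r"(\d+)$", short_name).group(1))
        placement[odd_rack if number % 2 else even_rack].append(host)
    return dict(placement)


hosts = expand_hosts("ganeti500[123]")
# Rack labels below are placeholders, not real eqsin rack names.
plan = rack_by_parity(hosts, odd_rack="rack-odd", even_rack="rack-even")
print(plan)
```

With three hosts this places ganeti5001 and ganeti5003 together and ganeti5002 on its own, so losing either rack still leaves at least one node up.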

ganeti5001:

  • - receive in system on procurement task T222284
  • - write up detailed instructions for remote hands to wire up the system (mgmt, network, power) and step by step directions for setup.
  • - coordinate with @RobH and @wiki_willy to get remote hands contractor to access systems and follow directions.
  • - review of on-site work before full implementation of server.
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

ganeti5002:

  • - receive in system on procurement task T222284
  • - write up detailed instructions for remote hands to wire up the system (mgmt, network, power) and step by step directions for setup.
  • - coordinate with @RobH and @wiki_willy to get remote hands contractor to access systems and follow directions.
  • - review of on-site work before full implementation of server.
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

ganeti5003:

  • - receive in system on procurement task T222284
  • - write up detailed instructions for remote hands to wire up the system (mgmt, network, power) and step by step directions for setup.
  • - coordinate with @RobH and @wiki_willy to get remote hands contractor to access systems and follow directions.
  • - review of on-site work before full implementation of server.
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

Event Timeline

RobH triaged this task as Medium priority. Jul 15 2019, 7:59 PM
RobH created this task.
Restricted Application added a subscriber: Aklapper.
RobH added a parent task: Unknown Object (Task). Jul 15 2019, 7:59 PM

Change 528169 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] ganeti500[123] mgmt dns

https://gerrit.wikimedia.org/r/528169

Change 528169 merged by RobH:
[operations/dns@master] ganeti500[123] mgmt dns

https://gerrit.wikimedia.org/r/528169

All remote hands setup steps on T229243 are done.

Change 558247 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Add cluster defs for edge ganetis

https://gerrit.wikimedia.org/r/558247

Change 558247 merged by BBlack:
[operations/puppet@production] Add cluster defs for edge ganetis

https://gerrit.wikimedia.org/r/558247

Change 559298 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] ganeti500[123] disable notifications during setup

https://gerrit.wikimedia.org/r/559298

Change 559298 merged by Herron:
[operations/puppet@production] ganeti500[123] disable notifications during setup

https://gerrit.wikimedia.org/r/559298

Change 559315 had a related patch set uploaded (by Herron; owner: Herron):
[labs/private@master] add dummy esams and eqsin ganeti keys to pacify PCC

https://gerrit.wikimedia.org/r/559315

Change 559315 merged by Herron:
[labs/private@master] add dummy esams and eqsin ganeti keys to pacify PCC

https://gerrit.wikimedia.org/r/559315

Change 559330 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] add misc cluster to eqsin and ulsfo

https://gerrit.wikimedia.org/r/559330

Hey @RobH, T229243 is encouraging. How are these hosts looking now?

Change 559330 merged by Herron:
[operations/puppet@production] add misc cluster to eqsin and ulsfo

https://gerrit.wikimedia.org/r/559330

JFTR to avoid confusion: These should use Buster (we have the main Ganeti clusters on Stretch, but the new edge Ganeti setups are on Buster).

@RobH - I'm pretty sure we took care of all the onsite work via T229243 with DreamICC, but can you confirm and check off the boxes in this task, up to the current step we're on? Thanks, Willy

Please note that while these hosts respond to ssh on mgmt, I cannot log in to them with the asset tag (per the initial request), the old mgmt pass, or the new mgmt pass.

The new PDUs for EQSIN will be onsite soon (within a week or so), and Jin will be swapping them into place. I'm going to request he reset the password on these at that time.

If this has to go faster than that, we can file a remote hands task for Equinix.

Change 562339 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] ganeti500* setup info

https://gerrit.wikimedia.org/r/562339

Change 562339 merged by RobH:
[operations/puppet@production] ganeti500* setup info

https://gerrit.wikimedia.org/r/562339

Change 562366 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] ganeti500[123] production dns entries

https://gerrit.wikimedia.org/r/562366

Change 562366 merged by RobH:
[operations/dns@master] ganeti500[123] production dns entries

https://gerrit.wikimedia.org/r/562366

RobH removed a project: ops-eqsin.
RobH updated the task description.

This is now ready for service implementation.

Change 562538 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] ganeti: assign ganeti500[123] role::ganeti

https://gerrit.wikimedia.org/r/562538

Change 562538 merged by Herron:
[operations/puppet@production] ganeti: assign ganeti500[123] role::ganeti

https://gerrit.wikimedia.org/r/562538

Change 562549 had a related patch set uploaded (by Herron; owner: Herron):
[operations/dns@master] dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet

https://gerrit.wikimedia.org/r/562549

Change 562549 abandoned by Herron:
dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet

https://gerrit.wikimedia.org/r/562549

Change 562552 had a related patch set uploaded (by Herron; owner: Herron):
[operations/dns@master] dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet

https://gerrit.wikimedia.org/r/562552

Change 562552 merged by Herron:
[operations/dns@master] dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet

https://gerrit.wikimedia.org/r/562552

Change 562586 had a related patch set uploaded (by Herron; owner: Herron):
[operations/dns@master] dns: fix typo in ganeti01.svc.eqsin.wmnet

https://gerrit.wikimedia.org/r/562586

Change 562586 merged by Herron:
[operations/dns@master] dns: fix typo in ganeti01.svc.eqsin.wmnet

https://gerrit.wikimedia.org/r/562586

Change 562613 had a related patch set uploaded (by Herron; owner: Herron):
[operations/dns@master] dns: add forward/reverse records for netflow5001

https://gerrit.wikimedia.org/r/562613

Change 562613 merged by Herron:
[operations/dns@master] dns: add forward/reverse records for netflow5001

https://gerrit.wikimedia.org/r/562613

Change 562617 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] install_server: add netflow5001 dhcp entry

https://gerrit.wikimedia.org/r/562617

Change 562617 merged by Herron:
[operations/puppet@production] install_server: add netflow5001 dhcp entry

https://gerrit.wikimedia.org/r/562617

herron added a subscriber: RobH.

The eqsin ganeti cluster is now up and running, and a first VM netflow5001 has been created.

I'll kick this over to @MoritzMuehlenhoff now to give things a last review and close this out.

Change 562780 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Extend Netbox Ganeti sync for eqsin

https://gerrit.wikimedia.org/r/562780

Mentioned in SAL (#wikimedia-operations) [2020-01-08T10:08:03Z] <moritzm> enabling spec-ctr, ssbd. md-clear passthrough for new eqsin cluster T228099

Mentioned in SAL (#wikimedia-operations) [2020-01-08T11:00:16Z] <moritzm> drain ganeti5003 to test new Ganeti setup in eqsin T228099

Mentioned in SAL (#wikimedia-operations) [2020-01-08T11:07:08Z] <moritzm> test failover of Ganeti master in eqsin T228099

Change 562793 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Re-enable notifications for ganeti5*, setup is done

https://gerrit.wikimedia.org/r/562793

Change 562793 merged by Muehlenhoff:
[operations/puppet@production] Re-enable notifications for ganeti5*, setup is done

https://gerrit.wikimedia.org/r/562793

I tested a failover and an instance migration successfully. I also changed the cluster setting so that CPU vulnerability flags are passed through. Notifications in Icinga have been re-enabled. Closing!

Change 562780 merged by Muehlenhoff:
[operations/puppet@production] Extend Netbox Ganeti sync for eqsin

https://gerrit.wikimedia.org/r/562780

Mentioned in SAL (#wikimedia-operations) [2020-01-10T10:21:20Z] <moritzm> rename Ganeti group for eqsin from "default" to "row_1" T228099