
rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves)
Closed, Resolved · Public

Description

This task will track the racking, setup, and installation of the 2 dedicated dump slaves.

db1139:
Rack location: Anywhere on row B

  • - receive in system on procurement task T214066
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - RAID10, stripe size 256 KB
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

db1140:
Rack location: Anywhere except racks C3, D8, and A6, and not in the same rack as db1139

  • - receive in system on procurement task T214066
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - RAID10, stripe size 256 KB
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active
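The rack placement constraints stated for the two hosts can be expressed as a small check. This is only a sketch: the function names are hypothetical, and it assumes rack labels take the form of a row letter (A to D) followed by a number, as in the C3, D8, and A6 labels above.

```python
import re

# Racks that db1140 must avoid, as listed in the task description.
EXCLUDED_RACKS = {"C3", "D8", "A6"}


def rack_row(rack):
    """Return the row letter of a rack label such as 'B3'.

    Assumption: labels are one row letter (A-D) followed by digits.
    """
    match = re.fullmatch(r"([A-D])(\d+)", rack.upper())
    if match is None:
        raise ValueError(f"unrecognized rack label: {rack!r}")
    return match.group(1)


def db1139_rack_ok(rack):
    """db1139 may go anywhere on row B."""
    return rack_row(rack) == "B"


def db1140_rack_ok(rack, db1139_rack):
    """db1140 must avoid the excluded racks and the rack holding db1139."""
    rack_row(rack)  # validates the label format
    rack = rack.upper()
    return rack not in EXCLUDED_RACKS and rack != db1139_rack.upper()
```

For example, `db1140_rack_ok("D4", "B3")` passes, while `db1140_rack_ok("C3", "B3")` and `db1140_rack_ok("B3", "B3")` do not.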

Event Timeline

Marostegui changed the task status from Open to Stalled (Mar 22 2019, 10:56 AM).
Marostegui added a subtask: Unknown Object (Task).
Marostegui moved this task from Triage to Blocked external/Not db team on the DBA board.

Change 498768 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Add db1139,db1140,dbprov200*

https://gerrit.wikimedia.org/r/498768

Change 498768 merged by Marostegui:
[operations/puppet@production] install_server: Add db1139,db1140

https://gerrit.wikimedia.org/r/498768

Change 499048 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] site.pp: Add db1139 and db1140 as spares.

https://gerrit.wikimedia.org/r/499048

Change 499048 merged by Marostegui:
[operations/puppet@production] site.pp: Add db1139 and db1140 as spares.

https://gerrit.wikimedia.org/r/499048

Cmjohnson closed subtask Unknown Object (Task) as Resolved (Apr 8 2019, 3:42 PM).

Those two hosts were added to site.pp and to the install recipe a few days ago.

Marostegui mentioned this in Unknown Object (Task) (Apr 8 2019, 3:46 PM).
Marostegui changed the task status from Stalled to Open (Apr 9 2019, 5:34 AM).

Opening, as the hosts have arrived.

Change 504414 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for db1139/1140

https://gerrit.wikimedia.org/r/504414

Change 504414 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns for db1139/1140

https://gerrit.wikimedia.org/r/504414

Cmjohnson updated the task description.
Cmjohnson added a subscriber: Cmjohnson.

@Marostegui @RobH these are racked and all on-site work is completed.

So from the DC Ops side, only the production DNS entries are missing?
Thanks Chris!

Change 504982 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] db1139 and db1140 production dns

https://gerrit.wikimedia.org/r/504982

RobH updated the task description.

Change 504982 merged by RobH:
[operations/dns@master] db1139 and db1140 production dns

https://gerrit.wikimedia.org/r/504982

RobH removed RobH as the assignee of this task (Apr 29 2019, 4:56 PM).

IRC Update: This is ready for installation by the DBA team; one of them will steal this task later this week.

Change 507764 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Prepare db1139 and db1140 for reimage

https://gerrit.wikimedia.org/r/507764

Change 507764 merged by Jcrespo:
[operations/puppet@production] mariadb: Prepare db1139 and db1140 for reimage

https://gerrit.wikimedia.org/r/507764

Either dns, remote ipmi or password may not be configured properly:

Error: Unable to establish IPMI v2 / RMCP+ session
11:23:36 | Unable to run wmf-auto-reimage: Remote IPMI failed for mgmt 'db1139.mgmt.eqiad.wmnet': Command '['ipmitool', '-I', 'lanplus', '-H', 'db1139.mgmt.eqiad.wmnet', '-U', 'root', '-E', 'chassis', 'power', 'status']' returned non-zero exit status 1

Trying to debug, following the workbook.
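To reproduce the failing check by hand, the same `ipmitool` invocation shown in the error can be wrapped in a short script. This is a rough sketch, not the wmf-auto-reimage implementation: the helper names are hypothetical, and the mgmt hostname pattern is inferred only from the `db1139.mgmt.eqiad.wmnet` name in the log. With `-E`, `ipmitool` reads the password from the `IPMI_PASSWORD` environment variable.

```python
import subprocess


def mgmt_fqdn(host_fqdn):
    """Derive the mgmt name, e.g. db1139.eqiad.wmnet -> db1139.mgmt.eqiad.wmnet.

    Assumption: mgmt names insert 'mgmt' after the short hostname,
    as seen in the error message above.
    """
    name, _, domain = host_fqdn.partition(".")
    return f"{name}.mgmt.{domain}"


def ipmi_status_command(mgmt_host, user="root"):
    """Build the same command line that appears in the reimage error."""
    return ["ipmitool", "-I", "lanplus", "-H", mgmt_host,
            "-U", user, "-E", "chassis", "power", "status"]


def check_ipmi(host_fqdn, timeout=30):
    """Run the power status check; return (ok, combined output).

    Requires ipmitool to be installed and IPMI_PASSWORD to be set.
    """
    cmd = ipmi_status_command(mgmt_fqdn(host_fqdn))
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return result.returncode == 0, (result.stdout + result.stderr).strip()
```

A non-zero exit status, as in the log above, makes `check_ipmi("db1139.eqiad.wmnet")` return `False` along with whatever `ipmitool` printed, which helps distinguish a DNS problem from a disabled IPMI interface or a wrong password.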

Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts:

['db1139.eqiad.wmnet', 'db1140.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905021213_jynus_190723.log.

@Cmjohnson In case this is useful for you, I have documented how to enable IPMI on iLO 5 from the web interface here: https://wikitech.wikimedia.org/w/index.php?title=Management_Interfaces&diff=1824940&oldid=1823217

Completed auto-reimage of hosts:

['db1140.eqiad.wmnet', 'db1139.eqiad.wmnet']

and were ALL successful.

jcrespo reassigned this task from jcrespo to Cmjohnson.
jcrespo updated the task description.

Installed; implementation (provisioning) will be handled at T220572.