
rack/setup/install wdqs100[45].eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the racking and setup of wdqs100[45].eqiad.wmnet.

Racking Proposal: These shouldn't be in any racks with existing wdqs systems (if possible), so do NOT place in a4 (wdqs1001). Place in any other 1GbE rack, since these use internal subnets. No two wdqs systems should share a rack, and try to put the three new ones in rows B, C, and D.

Please note that d3 (wdqs1001) and c7 (wdqs1002) will likely be decommissioned after these are fully online. So if there isn't 1GbE rack space outside of d3 or c7, those two racks can be used. If there is space outside them, place these systems elsewhere to avoid potential redundancy issues until the old ones are decommissioned.
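
The racking rules above (avoid a4, no two wdqs systems in one rack, prefer rows B, C, and D) can be expressed as a quick sanity check. This is only a minimal sketch: the function name `check_placement`, the rack names in the usage note, and the assumption that a rack name is a row letter followed by a number are all illustrative, not part of any existing tooling.

```python
from collections import Counter

# Assumption: a rack name is a row letter followed by a rack number, e.g. "b4".
EXCLUDED_RACKS = {"a4"}           # rack the proposal says to avoid
PREFERRED_ROWS = {"b", "c", "d"}  # rows the proposal asks to spread across


def check_placement(placement, existing=None):
    """Return a list of violations of the racking proposal.

    placement: dict mapping each new hostname to its proposed rack.
    existing:  dict mapping already-racked wdqs hostnames to their racks.
    """
    existing = existing or {}
    # Count how many wdqs hosts (old and new) would end up in each rack.
    all_hosts = {**existing, **placement}
    rack_counts = Counter(rack.lower() for rack in all_hosts.values())

    problems = []
    for host, rack in sorted(placement.items()):
        rack = rack.lower()
        if rack in EXCLUDED_RACKS:
            problems.append(f"{host}: rack {rack} is excluded")
        if rack_counts[rack] > 1:
            problems.append(f"{host}: rack {rack} is shared with another wdqs host")
        if rack[0] not in PREFERRED_ROWS:
            problems.append(f"{host}: row {rack[0].upper()} is not a preferred row")
    return problems
```

For example, `check_placement({"wdqs1004": "b4", "wdqs1005": "c2"})` returns an empty list for these hypothetical racks, while placing both hosts in the same rack reports a shared-rack violation for each.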

wdqs1004:

  • - receive in system on procurement task T166780
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, internal vlan)
    • end on-site specific steps
  • - production dns entries added (internal vlan)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

wdqs1005:

  • - receive in system on procurement task T166780
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, internal vlan)
    • end on-site specific steps
  • - production dns entries added (internal vlan)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

Event Timeline

RobH updated the task description.

@Cmjohnson now that I am back, do you need anything from me to move forward on this?

@Gehel no, not right now. Once they're racked and installed they will be turned over. They're in the queue...thanks!

Cmjohnson updated the task description.
Cmjohnson moved this task from High Priority Task to Blocked on the ops-eqiad board.

Assigning to @RobH for installs. Note: network ports are disabled.

So wdqs1004 shows a link, and I'll proceed to install it. I can log in to mgmt on wdqs1005, but its actual network port shows link down:

Interface       Admin Link Description
ge-3/0/3        up    down wdqs1005

@Cmjohnson: I'm assigning this back to you for you to investigate the link of wdqs1005. Once it is working on that port, please assign this back to me. I'll continue to work on the installation of wdqs1004 and will continue on wdqs1005 after network link repair. Thanks!

Change 370050 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] wdqs100[45] install params

https://gerrit.wikimedia.org/r/370050

Change 370050 merged by RobH:
[operations/puppet@production] wdqs100[45] install params

https://gerrit.wikimedia.org/r/370050

RobH removed projects: Patch-For-Review, ops-eqiad.
RobH updated the task description.

I've assigned this to @Gehel as both hosts are now online and ready for service implementation.

Change 376025 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs - activate wdqs100[45] as wdqs nodes

https://gerrit.wikimedia.org/r/376025

Change 376025 merged by Gehel:
[operations/puppet@production] wdqs - activate wdqs100[45] as wdqs nodes

https://gerrit.wikimedia.org/r/376025

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709061211_gehel_27913.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709061212_gehel_28677.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs1005.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709061213_gehel_29639.log.

Completed auto-reimage of hosts:

['wdqs1004.eqiad.wmnet']

and were ALL successful.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs1005.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709061246_gehel_24529.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs1005.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709061331_gehel_32003.log.

Completed auto-reimage of hosts:

['wdqs1005.eqiad.wmnet']

Of which those FAILED:

set(['wdqs1005.eqiad.wmnet'])

Initial data import is done; wdqs100[45] can now be pooled.

Change 377305 had a related patch set uploaded (by Gehel; owner: Gehel):
[wikidata/query/deploy@master] adding wdqs100[45] to the list of nodes in eqiad

https://gerrit.wikimedia.org/r/377305

Change 377305 merged by Smalyshev:
[wikidata/query/deploy@master] adding wdqs100[45] to the list of nodes in eqiad

https://gerrit.wikimedia.org/r/377305

Servers are installed, pooled, and serving user traffic. We can close this task.