rack/setup/install ores1001-1009
Closed, Resolved (Public)

Description

This task will track the racking, setup, and installation of 9 systems: ores1001-1009. These were ordered on T161724, and originally requested by @akosiaris on T142578.

The racking locations have been assumed by @RobH, and must be verified by @akosiaris before this can be properly handled by @Cmjohnson.

Alex: Please confirm whether these should be racked with horizontal spread (across racks/rows) or whether some limitation of this service requires a different kind of setup. I (@RobH) have assumed that we want to spread these out as much as possible. Also, please review the hostname proposal of oresXXXX. If this isn't going to work, please adjust this task's description with the acceptable hostname, and update the naming conventions.

Racking Proposal: There are 9 systems; spread them evenly across racks and rows. This means 2 per row, with one row having 3 instead of 2. Please place these in 1Gbit networking racks, and otherwise place them where you have the most power, network, and rackspace availability per row.
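
A minimal sketch of the arithmetic behind the proposal, assuming four rows named A through D (the row names and the `spread_hosts` helper are illustrative, not part of any tooling). It only shows that 9 hosts over 4 rows gives row sizes that differ by at most one; the actual rack choice within each row is left to whoever racks the systems.

```python
def spread_hosts(hosts, rows):
    """Assign hosts to rows round-robin so row sizes differ by at most one."""
    placement = {row: [] for row in rows}
    for i, host in enumerate(hosts):
        placement[rows[i % len(rows)]].append(host)
    return placement


# ores1001 through ores1009, as named in this task.
hosts = [f"ores{n}" for n in range(1001, 1010)]
# Assumption: four rows; the names are placeholders.
rows = ["A", "B", "C", "D"]

placement = spread_hosts(hosts, rows)
for row, members in placement.items():
    print(row, len(members), members)
```

Round-robin is just one way to get an even spread; any assignment with three rows of 2 and one row of 3 satisfies the proposal equally well.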

ores1001

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1002

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1003

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1004

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1005

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1006

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1007

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1008

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

ores1009

  • - system received in from procurement task T161724.
  • - system racked according to racking proposal.
  • - bios/drac/serial setup/testing
  • - mgmt and production dns entries added (internal vlan)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/#/c/360876/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

Event Timeline

Racking distribution sounds fine as well as naming.

Confirming racking is likely correct, since that is how we just racked all the codfw ores systems as well, so 2 per row, none in the same rack, and one row will have 3 systems.

Racked
A6 and A7
B7 and B8
C3 and C4
D3, D4 and D6

It looks like these servers are being installed in CODFW, but the names include numbers in the 1000s. It seems common for servers in CODFW to have names in the 2000s. Shouldn't these machines be named ores2001-2009?

@Halfak, these 9 are being installed in eqiad.

@Halfak, since Chris racked them, I'm pretty sure they are physically in EQIAD, so 1xxx names would be right.

The request at T142578 says:

The Site/Location: EQIAD + CODFW
Number of systems: 9 per DC, 18 in total

So I think these are the 9 for EQIAD, and there will just be another task for another 9 in CODFW.

Aha! My mistake! Thanks for the clarification.

/me goes to fix related tasks where he wrote "codfw". :)

port assignments

1001 A6 ge-6/0/10
1002 A7 ge-7/0/23
1003 B7 ge-7/0/10
1004 B8 ge-8/0/10
1005 C3 ge-3/0/15
1006 C4 ge-4/0/13
1007 D3 ge-3/0/36
1008 D4 ge-4/0/33
1009 D6 ge-6/0/0
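
A quick sanity check on the list above, sketched under one assumption: that the first number in each port name is meant to match the rack number within the row (the list suggests this, but the task does not state it). The helper names are hypothetical.

```python
import re

# Port assignments copied from the list above: host suffix, rack, switch port.
PORT_ASSIGNMENTS = """\
1001 A6 ge-6/0/10
1002 A7 ge-7/0/23
1003 B7 ge-7/0/10
1004 B8 ge-8/0/10
1005 C3 ge-3/0/15
1006 C4 ge-4/0/13
1007 D3 ge-3/0/36
1008 D4 ge-4/0/33
1009 D6 ge-6/0/0
"""


def parse_assignments(text):
    """Return (hostname, rack, port) tuples, one per non-empty line."""
    entries = []
    for line in text.strip().splitlines():
        suffix, rack, port = line.split()
        entries.append((f"ores{suffix}", rack.upper(), port))
    return entries


def port_matches_rack(rack, port):
    """Assumption: the first number of the port equals the rack number."""
    match = re.fullmatch(r"ge-(\d+)/(\d+)/(\d+)", port)
    if match is None:
        return False
    return int(match.group(1)) == int(rack[1:])


entries = parse_assignments(PORT_ASSIGNMENTS)
mismatches = [e for e in entries if not port_matches_rack(e[1], e[2])]
# A port may only be used once per row (the row is the rack's letter).
row_ports = {(rack[0], port) for _, rack, port in entries}

print("mismatches:", mismatches)
print("unique row/port pairs:", len(row_ports))
```

A check like this only catches typos in the list itself; it says nothing about whether the link is actually up on the switch.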

Change 360880 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting production dns for ores100[1-9]

https://gerrit.wikimedia.org/r/360880

Change 360880 merged by RobH:
[operations/dns@master] setting production dns for ores100[1-9]

https://gerrit.wikimedia.org/r/360880

Chris has done all the required on-site steps; stealing this task for the remaining remotely accessible steps.

Change 360890 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting install params for ores100[1-9].eqiad.wmnet

https://gerrit.wikimedia.org/r/360890

Change 360890 merged by RobH:
[operations/puppet@production] setting install params for ores100[1-9].eqiad.wmnet

https://gerrit.wikimedia.org/r/360890
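
For reference, the shorthand ores100[1-9].eqiad.wmnet in the change subjects above stands for nine hostnames. A minimal sketch of the expansion, using a hypothetical `expand_range` helper that handles only a single one-digit bracket range (it is not the syntax of any particular tool):

```python
import re


def expand_range(pattern):
    """Expand one single-digit range such as [1-9] into a list of names."""
    match = re.search(r"\[(\d)-(\d)\]", pattern)
    if match is None:
        # No bracket range present: the pattern is already a single name.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{digit}{suffix}" for digit in range(start, end + 1)]


hosts = expand_range("ores100[1-9].eqiad.wmnet")
for host in hosts:
    print(host)
```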

ores100[789] are not showing active links on the switch.

ge-3/0/36 up down ores1007
ge-4/0/33 up down ores1008
ge-6/0/0 ores1009

The vlan is set in the config for ge-6/0/0, but it doesn't show as up or down in show interface descriptions. Not sure why it's not working; will have to follow up.

Chris pointed out that row D still has both the old asw-d and the new asw2-d, and I had hopped on the wrong stack! Systems are booting into the installer.

RobH removed projects: Patch-For-Review, ops-eqiad.

All of these systems are now calling into puppet and are ready for service implementation. The original hardware request was filed by Alex, so I've assigned this to him for follow-up.

More than likely, but I didn't want to assume. If that task does indeed handle it, and everyone involved is aware these servers are ready, this task can be resolved.

akosiaris updated the task description.
akosiaris updated the task description.

Per @Ladsgroup's comment, we had better handle the service implementation in T168073. That task is, by the way, going to be stalled, as we are going to stress test the hardware a bit first and possibly tune the software to it. That is tracked in T169246. I'll resolve this task, as there is no reason to keep it open.