
rack/setup/deploy ms-be202[2-7]
Closed, Resolved (Public)

Description

This task will track the planning for the racking and then implementation of the new swift backends ordered for codfw.

ms-be202[2-7] were ordered on T130713.
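
For readers unfamiliar with the bracket shorthand used throughout this task, here is a minimal sketch of how a range such as ms-be202[2-7] expands into individual hostnames. The helper name `expand_host_range` is hypothetical and handles only a single one-digit range; it is not part of any tooling mentioned in this task.

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand one single-digit bracket range, e.g. 'ms-be202[2-7]'.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if match is None:
        # No bracket range present: the pattern is already a hostname.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


hosts = expand_host_range("ms-be202[2-7]")
print(hosts)
```

The result is the six hosts tracked in the checklists below, ms-be2022 through ms-be2027.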

We'll need to plan out where these 6 new systems will rack, and whether they need to be in wholly different racks and rows. Since we have 4 rows but 6 systems, some will end up sharing a row.

ms-be2022

  • - receive in normally via T130713 (includes racktables update)
  • - apply hostname label, update racktables
  • - rack in d2-codfw
  • - setup switch port (description, enable, vlan)
  • - add mgmt (hostname & asset tag) and production (hostname) dns entries
  • - update bios settings and ilom settings
  • - update install_server module
  • - install OS
  • - accept/sign puppet/salt keys
  • - service implementation

ms-be2023

  • - receive in normally via T130713 (includes racktables update)
  • - apply hostname label, update racktables
  • - rack in d2-codfw
  • - setup switch port (description, enable, vlan)
  • - add mgmt (hostname & asset tag) and production (hostname) dns entries
  • - update bios settings and ilom settings
  • - update install_server module
  • - install OS
  • - accept/sign puppet/salt keys
  • - service implementation

ms-be2024

  • - receive in normally via T130713 (includes racktables update)
  • - apply hostname label, update racktables
  • - rack in d2-codfw
  • - setup switch port (description, enable, vlan)
  • - add mgmt (hostname & asset tag) and production (hostname) dns entries
  • - update bios settings and ilom settings
  • - update install_server module
  • - install OS
  • - accept/sign puppet/salt keys
  • - service implementation

ms-be2025

  • - receive in normally via T130713 (includes racktables update)
  • - apply hostname label, update racktables
  • - rack in d7-codfw
  • - setup switch port (description, enable, vlan)
  • - add mgmt (hostname & asset tag) and production (hostname) dns entries
  • - update bios settings and ilom settings
  • - update install_server module
  • - install OS
  • - accept/sign puppet/salt keys
  • - service implementation

ms-be2026

  • - receive in normally via T130713 (includes racktables update)
  • - apply hostname label, update racktables
  • - rack in d7-codfw
  • - setup switch port (description, enable, vlan)
  • - add mgmt (hostname & asset tag) and production (hostname) dns entries
  • - update bios settings and ilom settings
  • - update install_server module
  • - install OS
  • - accept/sign puppet/salt keys
  • - service implementation

ms-be2027

  • - receive in normally via T130713 (includes racktables update)
  • - apply hostname label, update racktables
  • - rack in d7-codfw
  • - setup switch port (description, enable, vlan)
  • - add mgmt (hostname & asset tag) and production (hostname) dns entries
  • - update bios settings and ilom settings
  • - update install_server module
  • - install OS
  • - accept/sign puppet/salt keys
  • - service implementation

Event Timeline

RobH created this task. May 31 2016, 5:56 PM
Restricted Application added subscribers: Zppix, Southparkfan. May 31 2016, 5:56 PM
RobH reassigned this task from RobH to fgiunchedi. Edited May 31 2016, 5:57 PM
RobH added a subscriber: Papaul.

I'm going to assign this to @fgiunchedi for his recommendation on how we need to space out the 6 new systems in the 4 rows. Once we have his feedback, @Papaul and I can work on what exact racks to place these in.

I didn't want to just read the current config, and make assumptions that these would be evenly distributed across it. Since these are newer machines, there may be plans to re-balance the backends.

RobH added a comment. May 31 2016, 6:04 PM

It seems @fgiunchedi is out from now until the 10th. These servers may arrive on site before he returns.

RobH renamed this task from rack ms-be202[2-7] to rack/setup/deploy ms-be202[2-7]. May 31 2016, 6:07 PM
fgiunchedi reassigned this task from fgiunchedi to RobH. Edited Jun 13 2016, 11:59 AM

thanks @RobH, please rack two systems per row where ms-be already exist, namely row A/B/C if I'm not mistaken. Spreading machines across racks in the same row is preferred but sharing the same rack with new or existing ms-be systems is acceptable too (modulo 10G ports availability)

RobH added a comment. Jun 13 2016, 6:37 PM

Is there a way we can balance things out to make use of all 4 rows? We have row D underutilized at this point.

we could do that too, namely install all 6x systems in row D and expand swift there. If row D is generally underutilized let's go with that instead, thanks!

I chatted with Filippo on IRC, so the final layout is to put all 6 servers in row D (D1, D3, D4, D5, D7 and D8).

RobH added a comment. Jun 14 2016, 4:38 PM

Updated from IRC Chat:

These can all do 10GbE and won't be in the same service cluster, so they can share racks.

Please place half of these in D2, and the other half in D7.
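
As a minimal sketch of the split requested above, the following assigns the six hosts to racks D2 and D7 in contiguous halves. The function name `split_across_racks` is hypothetical, and assigning hosts in hostname order is an assumption, although it agrees with the per-host checklists in the task description (ms-be2022 to ms-be2024 in d2-codfw, ms-be2025 to ms-be2027 in d7-codfw).

```python
def split_across_racks(hosts: list[str], racks: list[str]) -> dict[str, list[str]]:
    """Assign hosts to racks in contiguous, equally sized groups.

    Hypothetical helper for illustration only; ordering by hostname
    is an assumption, not something the comment above specifies.
    """
    per_rack, remainder = divmod(len(hosts), len(racks))
    if remainder:
        raise ValueError("hosts cannot be split evenly across racks")
    return {
        rack: hosts[i * per_rack:(i + 1) * per_rack]
        for i, rack in enumerate(racks)
    }


hosts = [f"ms-be{n}" for n in range(2022, 2028)]
layout = split_across_racks(hosts, ["D2", "D7"])
for rack, members in layout.items():
    print(rack, members)
```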

RobH updated the task description. Jun 14 2016, 4:40 PM

@fgiunchedi what partman recipe do you want to use with the new systems?
The systems have 12x3TB SATA and 2x200GB SAS

Papaul updated the task description. Jun 15 2016, 10:36 PM

@Papaul partman recipe would be the same as other ms-be systems from HP, namely ms-be-hp.cfg, thanks!

Also, just to confirm: the 2x200GB SAS are SSDs, not spinning disks, correct?

Change 294543 had a related patch set uploaded (by Filippo Giunchedi):
DNS: Add mgmt DNS entries for ms-be2022 to ms-be2027

https://gerrit.wikimedia.org/r/294543

@fgiunchedi Yes, they are SSDs.

Papaul claimed this task. Jun 16 2016, 11:40 PM

@fgiunchedi the other ms-be* systems have Trusty installed; are we also installing Trusty on the new systems, or Jessie?

Thanks

Papaul updated the task description. Jun 17 2016, 12:32 AM

Change 294543 merged by Filippo Giunchedi:
DNS: Add mgmt DNS entries for ms-be2022 to ms-be2027

https://gerrit.wikimedia.org/r/294543

Change 294863 had a related patch set uploaded (by Filippo Giunchedi):
DHCP: Add MAC address entries for ms-be202[2-7]

https://gerrit.wikimedia.org/r/294863

Change 294863 merged by Filippo Giunchedi:
DHCP: Add MAC address entries for ms-be202[2-7]

https://gerrit.wikimedia.org/r/294863

Change 294866 had a related patch set uploaded (by Filippo Giunchedi):
adding install params for ms-be202[2-7]

https://gerrit.wikimedia.org/r/294866

Change 294866 merged by Filippo Giunchedi:
adding install params for ms-be202[2-7]

https://gerrit.wikimedia.org/r/294866

@fgiunchedi the other ms-be* systems have Trusty installed; are we also installing Trusty on the new systems, or Jessie?

No, we'll go with Jessie. I've merged all your patches, thanks @Papaul!

RobH added a comment. Edited Jun 17 2016, 4:17 PM

So these are HP with HW raid controllers. We'll need @Papaul to setup the raid10 of the primary spinning disks for these before we can proceed with installation.

I imagine the partman/ms-be-hp.cfg referred to in the past patchset just hasn't been written yet (since it doesn't appear to exist?)

@fgiunchedi Do we just want these to match the same mount/partitioning scheme as the existing ms-be2XXX systems?

Also the production dns entries were missing, I've added them via https://gerrit.wikimedia.org/r/#/c/294932/

@RobH, yeah, same as the others: the raid configuration for ms-be is all disks in raid0 from the hw controller, similar to https://phabricator.wikimedia.org/T116542#1768515

RobH added a comment. Edited Jun 17 2016, 4:27 PM

Ok, clarifying the confusion.

The spinning disks need to be presented to Linux as individual disks. On the Dell ms-be systems (with the H710 controller) they have to be set up as a bunch of single-disk raid0 arrays.

On the HP controller, we need to determine if they can just be presented as individual (JBOD) disks. Then the OS installs on the raid1 on sda/sdb.

Not sure how the SSDs should mount yet.

Servers are configured and ready for install but are not getting DHCP.

Restricted Application added a subscriber: Steinsplitter. Jun 20 2016, 7:26 PM

Switch port configuration wasn't correct (ge vs xe port names); I've fixed that and was able to PXE-boot ms-be2022.

Change 295472 had a related patch set uploaded (by Filippo Giunchedi):
swift: add ms-be202[2-7]

https://gerrit.wikimedia.org/r/295472

Change 295472 merged by Filippo Giunchedi:
swift: add ms-be202[2-7]

https://gerrit.wikimedia.org/r/295472

@Papaul the two SSDs were in raid1; was that the default configuration? I'm asking because in this case we need all disks in raid0. This is what I did in hpssacli to fix it:

set target controller slot=3
ld all delete forced
create type=arrayr0 drivetype=ss_sata
create type=arrayr0 drivetype=sata
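
If the same sequence has to be repeated on the remaining hosts, the command lines above could be generated rather than retyped. This is only a sketch: the function name `raid0_commands` is hypothetical, it simply reproduces the four lines quoted above for a given controller slot, and it builds strings without invoking hpssacli. Note that the second command deletes all existing logical drives, so the output should be reviewed before being pasted into a controller session.

```python
def raid0_commands(slot: int, drive_types: list[str]) -> list[str]:
    """Build the hpssacli command lines quoted above for one controller.

    Hypothetical helper for illustration only. It returns strings and
    does not run anything against the controller.
    """
    commands = [
        f"set target controller slot={slot}",
        # Destructive: removes every existing logical drive on the target.
        "ld all delete forced",
    ]
    # One 'create' line per drive type, in the order given.
    commands += [f"create type=arrayr0 drivetype={t}" for t in drive_types]
    return commands


for line in raid0_commands(3, ["ss_sata", "sata"]):
    print(line)
```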

@fgiunchedi yes, the default was raid1. I can put that in raid0 like the other disks.

fgiunchedi updated the task description. Jun 22 2016, 4:57 PM
Papaul reassigned this task from Papaul to fgiunchedi. Jun 22 2016, 6:31 PM
Papaul updated the task description.

Mentioned in SAL [2016-06-23T16:18:33Z] <godog> swift: add ms-be202[234] weight 1000 - T136630

fgiunchedi updated the task description. Jun 29 2016, 9:57 AM
fgiunchedi updated the task description. Jul 18 2016, 11:06 AM

Mentioned in SAL [2016-07-18T11:08:36Z] <godog> swift codfw-prod: ms-be202[567] weight 3000 - T136630

fgiunchedi closed this task as Resolved. Aug 1 2016, 8:55 AM

This is completed, though see also T136631, as we likely need to upgrade the controller firmware on these boxes too.