
(Need by: 2020-03-02) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet
Closed, Resolved · Public

Description

This task will track the racking, setup, and installation of three new cloudvirt-wdqs systems.

Racking Setup: These will all be cloudvirt-network-restricted hosts. They must go in 1G racks in Row B.
Network Setup: (2) 1G rack connections, similar to cloudvirt hosts but with the network being 1G and hostname being cloudvirt-wdqs100x.

cloudvirt-wdqs1001:

  • - receive in system on procurement task T232663
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network ports setup (description, enable, vlan) - These hosts need both eth0 and eth1 1G ports connected as they have both host and virtual node traffic, similar to cloudvirt*.
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cloudvirt-wdqs1002:

  • - receive in system on procurement task T232663
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network ports setup (description, enable, vlan) - These hosts need both eth0 and eth1 1G ports connected as they have both host and virtual node traffic, similar to cloudvirt*.
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cloudvirt-wdqs1003:

  • - receive in system on procurement task T232663
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network ports setup (description, enable, vlan) - These hosts need both eth0 and eth1 1G ports connected as they have both host and virtual node traffic, similar to cloudvirt*.
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

Event Timeline

RobH triaged this task as Medium priority. · Oct 16 2019, 5:39 PM
RobH created this task.
Restricted Application added a project: Operations. · Oct 16 2019, 5:39 PM
RobH added a parent task: Unknown Object (Task). · Edited · Oct 16 2019, 5:39 PM
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.

I'm not sure whether @Andrew or @Gehel would know this, but I assigned this task to @Gehel.

Racking Proposal: This was not answered on procurement task T232663, and needs to be known before the hardware can be properly racked. Also we need to know if this is going to be restricted to a cloud vlan (and possibly a specific row) or if they can go into any of the 4 rows in eqiad. Please advise on this task and re-assign to @Jclark-ctr for receiving.

Once we know that, @Jclark-ctr can rack these and get them online. (So please comment with instructions and reassign this to @Jclark-ctr.)

Gehel reassigned this task from Gehel to Andrew. · Oct 16 2019, 5:45 PM
Gehel added a subscriber: Gehel.

There isn't really any racking constraint on my side (as a future user of those systems). We don't have availability or redundancy constraints (those are test systems), and we don't care much about latency.

@Andrew might have more constraints from the WMCS side of things.

Andrew renamed this task from rack/setup/install cloudwdqs100[123].eqiad.wmnet to rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet. · Edited · Oct 16 2019, 6:11 PM

These boxes will be cloudvirts. So...

  1. They should be named cloudvirt-wdqs100x
  2. They need to be racked in row B with dual network hookups, just like cloudvirtXXXX (except with 1Gb networks instead of 10Gb)
RobH reassigned this task from Andrew to Jclark-ctr. · Oct 16 2019, 6:14 PM
RobH updated the task description.
RobH added subscribers: Papaul, Cmjohnson.

Racking Setup: These will all be cloudvirt-network-restricted hosts. They must go in 1G racks in Row B.
Network Setup: (2) 1G rack connections, similar to cloudvirt hosts but with the network being 1G and hostname being cloudvirt-wdqs100x.

Task description updated.

@Jclark-ctr: For these, you can just connect the eth0 and eth1 1G connections to the switch and list the ports they are connected to here on the task. Then @Cmjohnson, @Papaul, or I can set up the network ports on the switch.

RobH updated the task description. · Oct 16 2019, 6:16 PM
RobH updated the task description. · Oct 16 2019, 6:41 PM
wiki_willy renamed this task from rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet to (No Need By Date Provided) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet. · Nov 22 2019, 8:59 PM
Jclark-ctr updated the task description. · Dec 5 2019, 8:29 PM
Jclark-ctr updated the task description. · Jan 24 2020, 10:24 PM

Hosts are racked and netbox is updated; IP addresses are needed to continue.

server              rack  switch ports (eth0, eth1)
cloudvirt-wdqs1001  b3    19, 17
cloudvirt-wdqs1002  b5    13, 20
cloudvirt-wdqs1003  b6    38, 28
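
For whoever configures the switch side, the cabling report above can be turned into a per-host lookup. This is only a minimal sketch: the `Cabling` type, the `parse_cabling` helper, and the one-line-per-host input layout (hostname, rack, then "eth0,eth1" ports) are assumptions made for illustration and are not part of any existing tooling.

```python
from typing import NamedTuple


class Cabling(NamedTuple):
    """Where one host is racked and which switch ports it uses (hypothetical type)."""
    rack: str
    eth0_port: int
    eth1_port: int


# The three data rows from the comment above, one host per line.
REPORT = """\
cloudvirt-wdqs1001 b3 19,17
cloudvirt-wdqs1002 b5 13,20
cloudvirt-wdqs1003 b6 38,28
"""


def parse_cabling(report: str) -> dict[str, Cabling]:
    """Parse 'host rack eth0,eth1' lines into a host -> Cabling mapping."""
    cabling = {}
    for line in report.splitlines():
        if not line.strip():
            continue  # skip blank lines
        host, rack, ports = line.split()
        eth0, eth1 = (int(port) for port in ports.split(","))
        cabling[host] = Cabling(rack.upper(), eth0, eth1)
    return cabling


if __name__ == "__main__":
    for host, c in parse_cabling(REPORT).items():
        print(f"{host}: rack {c.rack}, eth0 on port {c.eth0_port}, eth1 on port {c.eth1_port}")
```

The mapping keeps eth0 and eth1 separate because both ports must be configured on the switch for each host.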

RobH removed a subscriber: RobH. · Jan 28 2020, 12:17 AM

Change 570130 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] adding mgmt dns for cloudvirt-wdqs1001-3

https://gerrit.wikimedia.org/r/570130

Change 570130 merged by Cmjohnson:
[operations/dns@master] adding mgmt dns for cloudvirt-wdqs1001-3

https://gerrit.wikimedia.org/r/570130

@Jclark-ctr, could you please do the BIOS and iDRAC setup? The mgmt IPs are below.

cloudvirt-wdqs1001 1H IN A 10.65.3.152
cloudvirt-wdqs1002 1H IN A 10.65.3.153
cloudvirt-wdqs1003 1H IN A 10.65.3.154
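
The three lines above follow the usual zone-file field order: name, TTL, class, type, address. As a rough sketch of how they could be sanity-checked before configuring the iDRACs, the helper names below (`ttl_seconds`, `parse_a_records`) are hypothetical, and the names are treated as plain labels because the zone origin is not shown in this task.

```python
import ipaddress
import re

# The mgmt records exactly as posted in the comment above.
RECORDS = """\
cloudvirt-wdqs1001 1H IN A 10.65.3.152
cloudvirt-wdqs1002 1H IN A 10.65.3.153
cloudvirt-wdqs1003 1H IN A 10.65.3.154
"""

# Seconds per zone-file TTL unit suffix.
_TTL_UNITS = {"S": 1, "M": 60, "H": 3600, "D": 86400, "W": 604800}


def ttl_seconds(ttl: str) -> int:
    """Convert a TTL such as '1H', '1H30M' or '300' to seconds."""
    if ttl.isdigit():
        return int(ttl)
    if not re.fullmatch(r"(\d+[SMHDW])+", ttl.upper()):
        raise ValueError(f"unrecognised TTL: {ttl!r}")
    return sum(
        int(number) * _TTL_UNITS[unit]
        for number, unit in re.findall(r"(\d+)([SMHDW])", ttl.upper())
    )


def parse_a_records(text: str) -> dict[str, tuple[int, ipaddress.IPv4Address]]:
    """Parse 'name TTL IN A address' lines into name -> (ttl, address)."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, ttl, rclass, rtype, address = line.split()
        if (rclass, rtype) != ("IN", "A"):
            raise ValueError(f"expected an IN A record, got: {line!r}")
        # IPv4Address raises ValueError on a malformed address.
        records[name] = (ttl_seconds(ttl), ipaddress.IPv4Address(address))
    return records


if __name__ == "__main__":
    for name, (ttl, address) in parse_a_records(RECORDS).items():
        print(f"{name}: {address} (TTL {ttl}s)")
```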

Cmjohnson updated the task description. · Feb 18 2020, 9:35 PM
Cmjohnson updated the task description. · Feb 18 2020, 9:48 PM

Can I get an update on whose task this is now? The last comment asks @Jclark-ctr to follow up, but on IRC he said that he doesn't have access to do the remaining tasks. There's still some physical cabling left, isn't there?

Cabling is finished; the hosts are still being configured by Chris.

wiki_willy renamed this task from (No Need By Date Provided) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet to (Need by: 2020-03-02) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet. · Feb 24 2020, 9:01 PM
Cmjohnson updated the task description. · Feb 25 2020, 12:42 PM

Change 574745 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns for cloudvirt-wdqs100[1-3]

https://gerrit.wikimedia.org/r/574745

Change 574745 merged by Cmjohnson:
[operations/dns@master] Adding production dns for cloudvirt-wdqs100[1-3]

https://gerrit.wikimedia.org/r/574745

Change 574761 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding cloudvirt-wdqs servers to dhcpd file and netboot.cfg

https://gerrit.wikimedia.org/r/574761

Change 574761 merged by Cmjohnson:
[operations/puppet@production] Adding cloudvirt-wdqs servers to dhcpd file and netboot.cfg

https://gerrit.wikimedia.org/r/574761

Cmjohnson updated the task description. · Feb 25 2020, 2:51 PM

@andrewbogott I may have chosen the wrong partman recipe: all 3 started installing but failed. Please check and make any changes you need, or let me know if you need me to make them. Once the OS install completes and netbox is updated, they are ready to be turned over.

Are these following the same setup as the main production wdqs servers? These are using "partman/standard.cfg partman/raid10-4dev.cfg"

@MoritzMuehlenhoff these will be used as cloudvirts -- they need one small OS volume and one big raid10 volume. I'll look at the partman options later today.

Andrew added a comment. · Edited · Feb 27 2020, 5:04 PM

I made some bios changes on 1001:

  • configured raid (one big raid 10 w/all disks)
  • enabled virtualization
  • checked hyperthreading (apparently this is now called 'logical processor' and seems to be enabled by default, good.)

With the HW raid set up, partman seems mostly happy. I'll make the same bios changes on the other two.

Andrew closed this task as Resolved. · Feb 27 2020, 9:49 PM
Andrew updated the task description.

I have an OS installed on all three of these hosts and I'm experimenting on them in the cloud-vps cluster. Thanks for the set-up!