
rack/setup/install cloudcephmon100[123]
Open, High, Public, 0 Story Points

Description

This task will track the racking, setup, installation, and deployment of three new servers to serve as Ceph monitor nodes, with 10G connections.

Please note that several questions pending for this task are also pending on the related Ceph nodes task T224188. Until decisions are made on T224188, racking and deployment of these hosts will be stalled as well.

Hostname Proposal: cloudmon100* was proposed by @Bstorm on IRC when discussing these systems; after discussion in the WMCS weekly meeting it was changed to cloudcephmon100*.
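The task refers to the same three hosts with two bracket shorthands: `cloudcephmon100[123]` in the title and `cloudcephmon100[1-3]` in the DNS change. The following is a minimal Python sketch of how that shorthand expands. `expand_hosts` is a hypothetical helper, not an existing Wikimedia tool, and it handles only a single bracket group.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracket group, e.g. 'cloudcephmon100[123]' or 'cloudcephmon100[1-3]'.

    Hypothetical helper: supports a single bracket group holding either a list of
    single characters ('123') or one numeric range ('1-3'); zero-padding of the
    range start (e.g. '01-03') is preserved.
    """
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: already a plain hostname
    prefix, body, suffix = match.groups()
    if "-" in body:
        start, end = body.split("-", 1)
        width = len(start)
        values = [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    else:
        values = list(body)  # each character is one alternative
    return [f"{prefix}{value}{suffix}" for value in values]


print(expand_hosts("cloudcephmon100[123]"))
# ['cloudcephmon1001', 'cloudcephmon1002', 'cloudcephmon1003']
print(expand_hosts("cloudcephmon100[1-3]"))
# ['cloudcephmon1001', 'cloudcephmon1002', 'cloudcephmon1003']
```

Both notations name the same three hosts; a plain hostname without brackets is returned unchanged.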

Racking Proposal: These are NOT replacing any existing systems, but will need to communicate with the systems being racked on T224188. These will most likely replicate the racking layout of T224188 to produce the most redundancy within this service cluster.

cloudcephmon1001:

  • receive in system on procurement task T222916
  • add system into netbox while racking plan is being determined, so that it shows on the proper accounting reports
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in netbox set to staged
  • handoff for service implementation
  • service implementer changes from 'staged' status to 'active' status in netbox

cloudcephmon1002:

  • receive in system on procurement task T222916
  • add system into netbox while racking plan is being determined, so that it shows on the proper accounting reports
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in netbox set to staged
  • handoff for service implementation
  • service implementer changes from 'staged' status to 'active' status in netbox

cloudcephmon1003:

  • receive in system on procurement task T222916
  • add system into netbox while racking plan is being determined, so that it shows on the proper accounting reports
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in netbox set to staged
  • handoff for service implementation
  • service implementer changes from 'staged' status to 'active' status in netbox
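Each checklist moves a host through three netbox states in a fixed order: planned once racked, staged after OS install and the initial puppet run, and active once the service implementer takes over. A minimal Python sketch of that ordering follows; `Status`, `NEXT` and `advance` are hypothetical names used for illustration only and do not call the real Netbox API.

```python
from enum import Enum


class Status(Enum):
    """Netbox host states named in the checklists (hypothetical model, not the Netbox API)."""
    PLANNED = "planned"  # racked and recorded in netbox
    STAGED = "staged"    # OS installed, initial puppet run done with the spare role
    ACTIVE = "active"    # set by the service implementer after handoff


# Only forward moves, one step at a time, in checklist order.
NEXT = {
    Status.PLANNED: Status.STAGED,
    Status.STAGED: Status.ACTIVE,
}


def advance(current: Status, target: Status) -> Status:
    """Return target if it is the next state after current, else raise ValueError."""
    if NEXT.get(current) is not target:
        raise ValueError(f"cannot move host from {current.value!r} to {target.value!r}")
    return target


status = Status.PLANNED
status = advance(status, Status.STAGED)
status = advance(status, Status.ACTIVE)
print(status.value)  # active
```

Skipping a step (for example planned straight to active) raises an error, which mirrors the checklist's requirement that a host is staged before handoff.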

Event Timeline

RobH triaged this task as Normal priority. Jul 15 2019, 8:13 PM
RobH created this task.
Restricted Application added a project: Operations. Jul 15 2019, 8:14 PM

We already have some servers in a similar namespace: labmon1001 and labmon1002. I find it confusing that we use a similar naming scheme for 2 different types of servers.

I would recommend using something different for the new servers and leave cloudmon for when we refresh labmon servers.
Some ideas (no strong opinions about any):

  • cloudcephmon
  • cloudstoremon
  • cloudstorecephmon

cc @Bstorm

Bstorm added a comment. (Edited) Jul 16 2019, 11:33 AM

Great point @aborrero! I half wanted to name all of these "cloudstore" and figure it out from there, but that's not great. Maybe cloudstoremon, just to keep the brand out of the name. The OSDs are literally slated to be cloudosd. We'll see after a bit of discussion :)

Bstorm renamed this task from rack/setup/install cloudmon100[123] to rack/setup/install cloudcephmon100[123]. Jul 16 2019, 7:45 PM
Bstorm updated the task description.

After talking in the weekly meeting, it's now cloudcephmon100*, updating the description.

Cmjohnson moved this task from Backlog to Racking Tasks on the ops-eqiad board. Jul 16 2019, 7:56 PM
Bstorm reassigned this task from Bstorm to RobH. Jul 25 2019, 6:38 PM

The racking proposal is detailed in T224188, so re-assigning

Cmjohnson reassigned this task from RobH to Jclark-ctr. Aug 14 2019, 3:06 PM
Cmjohnson added subscribers: Jclark-ctr, Cmjohnson.

@Jclark-ctr can you add asset tags and enter these servers into Netbox (T222916 is the procurement task). Leave them on the floor and the rack information blank in netbox until we know for sure where they're going. Once done, please re-assign back to Rob

Jclark-ctr reassigned this task from Jclark-ctr to RobH. Aug 20 2019, 6:32 PM
Jclark-ctr updated the task description.

asset tagged and added to Netbox

Jclark-ctr reassigned this task from Jclark-ctr to RobH. Aug 20 2019, 7:48 PM
Cmjohnson reassigned this task from RobH to Jclark-ctr. Aug 29 2019, 4:41 PM

@Jclark-ctr please rack one each in B2/B4/B7 and update netbox.

host              rack  unit
cloudcephmon1001  b7    26
cloudcephmon1002  b4    11
cloudcephmon1003  b2    11
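Putting one monitor in each of three racks is what gives this cluster its redundancy. A minimal Python sketch that checks the placement in the table, assuming (as Ceph monitors do) that quorum needs a strict majority of all configured monitors; `PLACEMENT`, `racks_are_distinct` and `survives_single_rack_loss` are hypothetical names for illustration.

```python
from collections import Counter

# Placement copied from the racking table above: host -> (rack, unit).
PLACEMENT = {
    "cloudcephmon1001": ("b7", 26),
    "cloudcephmon1002": ("b4", 11),
    "cloudcephmon1003": ("b2", 11),
}


def racks_are_distinct(placement: dict[str, tuple[str, int]]) -> bool:
    """True if no two monitors share a rack."""
    counts = Counter(rack for rack, _unit in placement.values())
    return all(n == 1 for n in counts.values())


def survives_single_rack_loss(placement: dict[str, tuple[str, int]]) -> bool:
    """True if losing any one rack still leaves a strict majority of monitors.

    Assumes quorum needs more than half of all configured monitors.
    """
    total = len(placement)
    for lost in {rack for rack, _unit in placement.values()}:
        remaining = sum(1 for rack, _unit in placement.values() if rack != lost)
        if remaining <= total // 2:
            return False
    return True


print(racks_are_distinct(PLACEMENT), survives_single_rack_loss(PLACEMENT))  # True True
```

With three monitors in three racks, losing any single rack leaves two of three monitors, which is still a majority. Two monitors in the same rack would fail the second check, because losing that rack would leave only one.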

Change 534255 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for cloudcephmon100[1-3]

https://gerrit.wikimedia.org/r/534255

@Jclark-ctr Please set up the idrac and add the mgmt dns. Let me know if you have any issues or questions. I also need the switch ports.

+cloudcephmon1001 1H IN A 10.65.3.125
+cloudcephmon1002 1H IN A 10.65.3.126
+cloudcephmon1003 1H IN A 10.65.3.127
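The three mgmt A records in this patch use consecutive addresses. A minimal Python sketch that reproduces those lines, assuming addresses are simply handed out in order from a chosen first address (an assumption for illustration, not a description of how they were actually allocated). `mgmt_a_records` is a hypothetical helper, and it does not cover the asset-tag records the checklist also asks for.

```python
import ipaddress


def mgmt_a_records(hosts: list[str], first_ip: str, ttl: str = "1H") -> list[str]:
    """Build zone-file A records, giving each host the next consecutive address."""
    start = ipaddress.IPv4Address(first_ip)
    return [f"{host} {ttl} IN A {start + offset}" for offset, host in enumerate(hosts)]


hosts = ["cloudcephmon1001", "cloudcephmon1002", "cloudcephmon1003"]
for record in mgmt_a_records(hosts, "10.65.3.125"):
    print(record)
```

The printed lines match the records added in the DNS change, minus the leading `+` of the diff.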

Change 534255 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns for cloudcephmon100[1-3]

https://gerrit.wikimedia.org/r/534255

Cmjohnson updated the task description. Sep 6 2019, 5:48 PM
aborrero raised the priority of this task from Normal to High. Fri, Oct 4, 8:48 AM

Raising priority of this ticket, since the ceph project is part of our Q2 goals.

@aborrero What are the VLAN requirements? Are they the same as cephosd, with one public and one private?

RobH removed a subscriber: RobH. Fri, Oct 4, 9:55 PM
Cmjohnson updated the task description. Tue, Oct 8, 11:48 AM