(Need By: TBD) rack/setup/install cloudcephmon200[12]-dev
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of cloudcephmon2001-dev and cloudcephmon2002-dev.

Hostname / Racking / Installation Details

Hostnames: cloudcephmon2001-dev cloudcephmon2002-dev
Network: Each host needs one nic connected to cloud-hosts1-b-codfw. Physical location doesn't matter as long as that vlan connection is possible. 1G networking is fine.
We only need OS partitions for these. What OS partitions?
Debian Buster.

Per host setup checklist

cloudcephmon2001: rack B1 ge-1/0/24

  • - update netbox to allocate spare WMF6383 to cloudcephmon2001
  • - apply hostname labels to front/back of host
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

cloudcephmon2002: rack B5 ge-5/0/11

  • - update netbox to allocate spare WMF6576 to cloudcephmon2002
  • - apply hostname labels to front/back of host
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

@Andrew: "We only need OS partitions for these." Does this mean just a normal raid10 lvm setup of the 4 disks or what? I'm assuming yes.

RobH added a parent task: Unknown Object (Task). Nov 5 2020, 11:03 PM
RobH updated the task description.
RobH mentioned this in Unknown Object (Task).
RobH added a subscriber: Papaul.

@Andrew: second question:

" Network: Each host needs one nic connected to cloud-hosts1-b-codfw - vlan 2105. Physical location doesn't matter as long as that vlan connection is possible. 1G networking is fine."

I do not see a vlan ID 2105 for row D in codfw, so I'm not sure what the comment was about this on the procurement task:

In T266264#6590500, @Papaul wrote:

"Each host needs one nic connected to cloud-hosts1-b-eqiad - vlan 2105. Physical location doesn't matter as long as that vlan connection is possible. 1G networking is fine."

This is for codfw and we have cloud-hosts1-b-eqiad as the vlan.

When I look at the switch, I see cloud-hosts1-b-codfw 2118, but no 2105. Also, these hosts are in rows C & D, so I suspect we may have to relocate them. I'm not sure if @Papaul meant they can connect where they are, or that they have to move.

@Papaul: Please advise?

VLAN ID 2105 and the VLAN itself have not been created yet. @ayounsi is taking care of that.

@Andrew: "We only need OS partitions for these." Does this mean just a normal raid10 lvm setup of the 4 disks or what? I'm assuming yes.

Yep, just one raid10 w/lvm is what we need.

Papaul updated the task description.

@ayounsi I am working on setting up those two servers. One is in row C and the other is in row D. We have the

cloud-hosts1-b-codfw {
    vlan-id 2118;
}

in both rows, but we do not have any interface-range vlan-cloud-hosts1-b-codfw in those rows. I wanted to create the interface range, but I do not know how to name it.

For row C, do you want me to call the interface range

interface-range vlan-cloud-hosts1-c-codfw

and for row D

interface-range vlan-cloud-hosts1-d-codfw

or should I just keep it the same as the vlan name?

interface-range vlan-cloud-hosts1-b-codfw

Mentioned in SAL (#wikimedia-operations) [2020-11-10T16:32:21Z] <XioNoX> add cloud-storage1-b-codfw to, well, codfw switches - T267378

WMCS (and thus cloud-hosts1-b-codfw) is only in row B. So the servers will have to move to row B.

cloud-storage1-b-codfw for the 2nd NIC has been created as well. Feel free to create interface-range vlan-cloud-storage1-b-codfw or let me know which port it's going to go into and I can configure the first one.

I'm going to reuse an old puppetmaster as cloudcephmon2003-dev (T258103) -- does that server also need to be re-racked or can we just rename it in place?

Andrew renamed this task from (Need By: TBD) rack/setup/install cloudcephmon200[12] to (Need By: TBD) rack/setup/install cloudcephmon200[12]-dev. Nov 16 2020, 6:39 PM
Andrew updated the task description.

Please note the hostname change: these should be cloudcephmon2001-dev and cloudcephmon2002-dev.

In the future, please make those changes earlier. I have already applied the labels on all the hosts, and now I have to go back and redo them.

Yeah, I just realized that I forgot the prefix in my earlier ticket. Sorry :(

We only need OS partitions for these. What OS partitions?

Do you mean which OS or how to partition? The OS should be Buster. In terms of how to partition, is this what you need? https://phabricator.wikimedia.org/T267378#6607940

@Andrew The OS partitions need a partman recipe. Which partman recipe do you want to use for those servers?
Each server has 4x4TB disks.

If they have hw raid, then put all drives in one big raid10 and use partman recipe hwraid-1dev.cfg. If there is no hwraid, then I think raid10-4dev.cfg? It's hard for me to say: I'm not familiar with the hardware, and all the partman recipes have been rewritten since I last looked. Whatever is simple is good with me; it's not very critical for these hosts.
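
The recipe choice described above can be sketched in a few lines of Python. This is only an illustration: the function names `pick_partman_recipe` and `raid10_usable_tb` are hypothetical, the two recipe file names are copied from the comment, and nothing here checks that those recipes still exist in operations/puppet. The capacity helper assumes a classic mirrored-pairs RAID10 layout.

```python
def pick_partman_recipe(has_hw_raid: bool) -> str:
    """Return the partman recipe name suggested in the comment above.

    Hypothetical helper: the recipe names come from the thread and are
    not verified against the current operations/puppet repository.
    """
    return "hwraid-1dev.cfg" if has_hw_raid else "raid10-4dev.cfg"


def raid10_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID10 array, assuming mirrored pairs.

    Every block is stored twice, so half of the raw capacity is usable.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("this sketch expects an even disk count of at least 4")
    return disk_count * disk_size_tb / 2


if __name__ == "__main__":
    # Each server in this task has 4x4TB disks.
    print(pick_partman_recipe(has_hw_raid=False))
    print(raid10_usable_tb(4, 4.0))
```

For the hosts in this task (4x4TB each), a single raid10 would leave roughly 8 TB usable before LVM and filesystem overhead.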

Change 641852 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add production DNS for cloudcephmon200[1-2]-dev

https://gerrit.wikimedia.org/r/641852

Change 641852 merged by Papaul:
[operations/dns@master] DNS: Add production DNS for cloudcephmon200[1-2]-dev

https://gerrit.wikimedia.org/r/641852

Change 641864 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP add MAC address and netboot.cfg entries for cloudephmon200[1-2]

https://gerrit.wikimedia.org/r/641864

Change 641864 merged by Papaul:
[operations/puppet@production] DHCP add MAC address and netboot.cfg entries for cloudephmon200[1-2]

https://gerrit.wikimedia.org/r/641864

Change 641994 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add cloudcephmon200[1-2]-dev to site.pp

https://gerrit.wikimedia.org/r/641994

Change 641994 merged by Papaul:
[operations/puppet@production] Add cloudcephmon200[1-2]-dev to site.pp

https://gerrit.wikimedia.org/r/641994

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

cloudcephmon2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011191442_pt1979_20992_cloudcephmon2001-dev_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

cloudcephmon2002-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011191450_pt1979_21904_cloudcephmon2002-dev_codfw_wmnet.log.
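
The log file names reported above appear to follow a regular pattern, which can be picked apart with a short Python sketch. The field meanings are assumptions inferred only from the two paths in this task (a 12-digit timestamp, the launching user, a numeric run identifier, and the FQDN with dots replaced by underscores); `parse_reimage_log` is a hypothetical helper, not part of wmf-auto-reimage.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<fqdn with "_" for ".">.log
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>.+)\.log$"
)


def parse_reimage_log(path: str) -> dict:
    """Split a wmf-auto-reimage log path into its apparent components."""
    name = PurePosixPath(path).name
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized log file name: {name}")
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        # The meaning of this number is a guess (possibly a process ID).
        "run_id": int(match["num"]),
        # Assumes the host name itself contains no underscores.
        "host": match["host"].replace("_", "."),
    }


if __name__ == "__main__":
    info = parse_reimage_log(
        "/var/log/wmf-auto-reimage/"
        "202011191442_pt1979_20992_cloudcephmon2001-dev_codfw_wmnet.log"
    )
    print(info["host"], info["user"], info["started"].isoformat())
```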

Completed auto-reimage of hosts:

['cloudcephmon2001-dev.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['cloudcephmon2002-dev.codfw.wmnet']

and were ALL successful.

Papaul updated the task description.

Complete