rack/setup/install db11[26-38].eqiad.wmnet
Open, High, Public

Description

This task will track the racking/setup/installation of 13 new db hosts ordered for eqiad.

These hosts will replace db1061-db1073.

Racking Proposal: Please see comments below for racking discussion. We'll need to determine where to best place these considering what they are replacing - T211613#4812709

db1126:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1127:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1128:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1129:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1130:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1131:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1132:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1133:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1134:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1135:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1136:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1137:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

db1138:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

Event Timeline

RobH created this task. Dec 10 2018, 6:06 PM
RobH triaged this task as Normal priority.
RobH added a comment (edited). Dec 10 2018, 6:10 PM

So, to figure out the racking plan:

db1061: s6 master : C3
db1062: s7 master : D4
db1063: m1 master : C5
db1064: x1 slave : D1
db1065: m5 master : D1
db1066: s2 master : A6
db1067: s1 master : C6
db1068: s4 master : D1
db1069: x1 master : A1
db1070: s5 master : D1
db1071: s8 master : D1
db1072: m3 master : B2
db1073: m5 master : B3
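
As a quick way to see how the hosts being replaced are spread today, the listing above can be tallied per row and per rack. This is only a sketch: the listing is copied verbatim from this comment, the row is assumed to be the first character of the rack name, and the function names are made up for illustration.

```python
from collections import Counter

# Copied verbatim from the listing above: "host: role : rack".
CURRENT_PLACEMENT = """\
db1061: s6 master : C3
db1062: s7 master : D4
db1063: m1 master : C5
db1064: x1 slave : D1
db1065: m5 master : D1
db1066: s2 master : A6
db1067: s1 master : C6
db1068: s4 master : D1
db1069: x1 master : A1
db1070: s5 master : D1
db1071: s8 master : D1
db1072: m3 master : B2
db1073: m5 master : B3
"""


def parse_placement(listing):
    """Return a list of (host, role, rack) tuples from the listing text."""
    entries = []
    for line in listing.strip().splitlines():
        host, role, rack = (part.strip() for part in line.split(":"))
        entries.append((host, role, rack))
    return entries


def hosts_per_row(entries):
    """Count hosts per row, assuming the row is the rack's first character."""
    return Counter(rack[0] for _, _, rack in entries)


def hosts_per_rack(entries):
    """Count hosts per rack to spot racks holding several of these hosts."""
    return Counter(rack for _, _, rack in entries)


if __name__ == "__main__":
    entries = parse_placement(CURRENT_PLACEMENT)
    for row, count in sorted(hosts_per_row(entries).items()):
        print(f"row {row}: {count} hosts")
    for rack, count in sorted(hosts_per_rack(entries).items()):
        if count > 1:
            print(f"rack {rack} holds {count} of these hosts")
```

With the listing as given, this reports six hosts in row D (five of them in D1), three in row C, and two each in rows A and B.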

So, if the new systems can rack in the exact same racks, it is easy. However, I think we mostly just don't want them in the same racks as their db replication slaves, as these will be the new master db systems?

So I suppose we need to list out each shard and where its replication slaves are, and ensure they don't share a rack? That assumes these new hosts are masters, which I likely shouldn't assume. Also, we'll need to assign the shards now so we don't rack these in the wrong places, so we'll need DBA input!

Hey @RobH!
Thanks for putting up an initial racking plan.

It is a bit more complicated than just replacing the masters, as we also have candidate masters (hosts that can become masters if the current masters die), and those need to be distributed too.
Given that we have 13 masters (s1-s8, x1, m1-m5 (not m4)), ideally we should have 3 masters + 3 candidate masters per row (on different racks), with one row holding 4 masters, to have them equally distributed.
It is hard to really know how we'll do it, if we will, and when. So I think it is easier to rack those hosts with that plan, and then we (DBAs) can do all the movements logically. We need to invest some time in checking and putting together a plan for that, which will take quite some time as there are lots of things to keep in mind.

To sum up, let's rack those servers with that in mind:

  • 4 servers in row A
  • 3 servers in row B
  • 3 servers in row C
  • 3 servers in row D

Let's put them in different racks if that is possible within the row.
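
The row split above could be sketched as a simple round-robin assignment. This is a hypothetical illustration, not the racking plan actually used: the helper names are invented, the host-to-row pairing is arbitrary, and rack selection within a row is left to whoever is on site.

```python
from collections import Counter

# Target number of new hosts per row, as requested above.
ROW_TARGETS = {"A": 4, "B": 3, "C": 3, "D": 3}


def expand_hosts(prefix, first, last):
    """Expand a range such as db11[26-38] into individual hostnames."""
    return [f"{prefix}{number}" for number in range(first, last + 1)]


def assign_rows(hosts, targets):
    """Assign hosts to rows round-robin without exceeding any row's target."""
    if sum(targets.values()) != len(hosts):
        raise ValueError("row targets must add up to the number of hosts")
    rows = list(targets)
    remaining = dict(targets)
    plan = {}
    index = 0
    for host in hosts:
        # Skip rows that are already full; the check above guarantees
        # that at least one row still has a free slot.
        while remaining[rows[index % len(rows)]] == 0:
            index += 1
        row = rows[index % len(rows)]
        plan[host] = row
        remaining[row] -= 1
        index += 1
    return plan


if __name__ == "__main__":
    new_hosts = expand_hosts("db", 1126, 1138)
    plan = assign_rows(new_hosts, ROW_TARGETS)
    for host, row in plan.items():
        print(f"{host}: row {row}")
    print(dict(Counter(plan.values())))
```

Round-robin is used here only because it keeps consecutive hostnames in different rows; any assignment that meets the per-row counts would satisfy the request equally well.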

Change 478829 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Install db11[26-38] new DB hosts

https://gerrit.wikimedia.org/r/478829

Change 478829 merged by Marostegui:
[operations/puppet@production] mariadb: Install db11[26-38] new DB hosts

https://gerrit.wikimedia.org/r/478829

RobH assigned this task to Cmjohnson. Dec 11 2018, 4:33 PM
Cmjohnson moved this task from Backlog to Racking Tasks on the ops-eqiad board. Dec 11 2018, 6:36 PM
Marostegui updated the task description. Jan 13 2019, 8:30 PM

@Cmjohnson do you have a rough ETA for these?
Thanks!

Not until after the all hands. I will move it up on the list.

Cmjohnson updated the task description. Jan 30 2019, 10:58 PM

@Cmjohnson I can take care of the installations once you've done the RAID and added DNS and pxeboot entries with the MACs :-)

Change 490054 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for db11[26-38]

https://gerrit.wikimedia.org/r/490054

Change 490054 merged by Marostegui:
[operations/dns@master] Adding mgmt dns for db11[26-38]

https://gerrit.wikimedia.org/r/490054

Marostegui updated the task description. Feb 12 2019, 3:31 PM
Marostegui raised the priority of this task from Normal to High. Thu, Apr 18, 6:28 PM

I have increased the priority because the s4 master is having memory errors again and needs to be replaced as soon as we can.