codfw rack/setup 22 DB servers
Closed, Resolved, Public

Description

@Papaul let us know that the first 10 of the 22 new DB servers from T158669 have just arrived.

We would like to rack the first 10 servers as follows:

Hostname  Rack
db2071    a6
db2072    b6
db2073    c6
db2074    d6
db2075    a1
db2076    b1
db2077    c1
db2078    d1
db2079    a5
db2080    c5

The idea is to get these up and running and then decommission the servers below db2030 (around 11 servers) to free up rack space, which we plan to use for the next batch of servers so we can fill the gaps.
@Papaul please let us know if this makes sense to you.

  • racking scheme approval
  • receive in and attach packing slip to parent task T158669
  • rack systems, update racktables
  • create mgmt DNS entries (both asset tag and hostname)
  • create production DNS entries (internal VLAN)
  • update/create subtask with network port info for all new hosts
  • install_server module update (MAC address and partitioning info). Please provide the partition schema.
  • install OS
  • puppet/salt accept
  • hand off to @Marostegui for service implementation

Update 6th April - Second batch of 12 servers:

@Papaul let us know on 5 April that the second batch of servers had arrived, and we would like to rack them as follows:

Hostname  Rack
db2081    a6
db2082    b6
db2083    c6
db2084    d6
db2085    a5
db2086    b1
db2087    c1
db2088    d1
db2089    a3
db2090    c5
db2091    a8
db2092    b8

@Papaul please let us know if this is doable

  • racking scheme approval
  • receive in and attach packing slip to parent task T158669
  • rack systems, update racktables
  • create mgmt DNS entries (both asset tag and hostname)
  • create production DNS entries (internal VLAN)
  • update/create subtask with network port info for all new hosts
  • install_server module update (MAC address and partitioning info). Please provide the partition schema.
  • install OS
  • puppet/salt accept
  • hand off to @Marostegui for service implementation

Event Timeline

Restricted Application added a subscriber: Aklapper.
Papaul renamed this task from "codfw racking first 10 DB servers" to "codfw rack/setup first 10 DB servers". (Apr 4 2017, 4:00 PM)
Papaul claimed this task.
Papaul triaged this task as Medium priority.
Papaul updated the task description.

install_server module update (MAC address and partitioning info). Please provide the partition schema.

Please create a RAID 10 array with the following options (https://wikitech.wikimedia.org/wiki/Raid_and_MegaCli):

RAID settings for database hosts:

  • RAID 10
  • 256k stripe size
  • write-back cache
  • no read-ahead
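Once the array exists, the settings above can be verified from the controller's logical-drive report instead of by eye. A minimal sketch, assuming the report (for example from `MegaCli -LDInfo -Lall -aALL`) contains lines shaped like the hypothetical sample below; the field wording, including RAID 10 appearing as "Primary-1, Secondary-0", is an assumption and may differ between MegaCli and firmware versions. The function name `check_db_raid` is illustrative only.

```shell
#!/bin/sh
# Sketch: check a MegaCli-style logical-drive report against the DB RAID
# settings (RAID 10, 256 KB stripe, write-back, no read-ahead).
# ASSUMPTION: field names/values follow the sample below; real output may differ.

check_db_raid() {
    report=$1
    rc=0
    # RAID 10 is assumed to be reported as "Primary-1, Secondary-0"
    printf '%s\n' "$report" | grep -q 'RAID Level *: *Primary-1, Secondary-0' \
        || { echo "FAIL: not RAID 10"; rc=1; }
    printf '%s\n' "$report" | grep -q 'Strip Size *: *256 KB' \
        || { echo "FAIL: stripe size is not 256 KB"; rc=1; }
    printf '%s\n' "$report" | grep -q 'Current Cache Policy:.*WriteBack' \
        || { echo "FAIL: cache is not write-back"; rc=1; }
    printf '%s\n' "$report" | grep -q 'Current Cache Policy:.*ReadAheadNone' \
        || { echo "FAIL: read-ahead is not disabled"; rc=1; }
    return "$rc"
}

# HYPOTHETICAL sample report, not captured from a real db20xx host.
sample='RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Strip Size          : 256 KB
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU'

if check_db_raid "$sample"; then
    echo "RAID settings OK"
fi
```

On a real host the sample string would be replaced by the captured controller output, after adjusting the patterns to what the controller actually prints.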

For the partitioning info, this is an example of what needs to be used: https://gerrit.wikimedia.org/r/#/c/344337/
operations/puppet / modules/install_server/files/autoinstall/netboot.cfg

So we'd need to add something like:

db2071) echo partman/db.cfg ;; \
db2072) echo partman/db.cfg ;; \
db2073) echo partman/db.cfg ;; \
db2074) echo partman/db.cfg ;; \
db2075) echo partman/db.cfg ;; \
db2076) echo partman/db.cfg ;; \
db2077) echo partman/db.cfg ;; \
db2078) echo partman/db.cfg ;; \
db2079) echo partman/db.cfg ;; \
db2080) echo partman/db.cfg ;; \

We have to use db.cfg for the installation so that the partitions are formatted correctly. Afterwards I will revert the change and leave the "safe" option in place, which makes an accidental restart or reimage fail instead of proceeding.
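The per-host lines above are branches of a shell `case` statement keyed on the hostname. A minimal standalone sketch of that lookup follows; it assumes the real netboot.cfg matches hostnames in the same way, while the function name `recipe_for` and the fallback recipe `partman/default.cfg` are hypothetical. In the original file each branch also ends with a backslash line continuation, which this standalone version does not need.

```shell
#!/bin/sh
# Sketch of a netboot.cfg-style hostname -> partman recipe mapping.
# "partman/default.cfg" is a HYPOTHETICAL fallback, not a real recipe name.

recipe_for() {
    case "$1" in
        # db2071..db2079 and db2080, exactly the ten hosts listed above
        db207[1-9]|db2080) echo partman/db.cfg ;;
        *) echo partman/default.cfg ;;
    esac
}

for host in db2070 db2071 db2080 db2081; do
    printf '%s -> %s\n' "$host" "$(recipe_for "$host")"
done
```

Collapsing the ten hosts into `db207[1-9]|db2080` keeps the mapping exact; a broader glob such as `db20[7-8][0-9]` would also match db2070 and db2081 to db2089.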

Change 346580 had a related patch set uploaded (by Jcrespo):
[operations/puppet@production] Indicate install recipes for newest db1* and db2* DB servers

https://gerrit.wikimedia.org/r/346580

The change above should be enough for the recipe. In addition to what Manuel stated, and given problems we have had in the past, we need to check that:

  • IPMI calls work as intended
  • the default boot device is set to disk (the installer should take care of that, but we should check)
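The two checks listed above can be partly scripted with `ipmitool` against each host's management interface. A minimal sketch, assuming mgmt names of the form `<host>.mgmt.codfw.wmnet`, an IPMI user called `root`, and the password exported in `IPMI_PASSWORD` (which `ipmitool -E` reads); these are assumptions, not confirmed details of this setup. `chassis bootparam get 5` reports the boot-flag override, if any, which helps spot a host left forced to PXE, but it does not necessarily reflect the BIOS boot order. By default the sketch only prints the commands (`DRY_RUN=1`).

```shell
#!/bin/sh
# Sketch: per-host IPMI checks over the management network.
# ASSUMPTIONS: mgmt names follow <host>.mgmt.codfw.wmnet, the IPMI user is
# "root", and the password is in IPMI_PASSWORD (read by ipmitool -E).
# With DRY_RUN=1 (the default) the commands are printed, not executed.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

ipmi_checks() {
    mgmt="$1.mgmt.codfw.wmnet"
    # 1) Do IPMI calls work at all? Query the power state.
    run ipmitool -I lanplus -H "$mgmt" -U root -E chassis power status
    # 2) Show the boot flags (boot parameter 5), e.g. a forced PXE override.
    run ipmitool -I lanplus -H "$mgmt" -U root -E chassis bootparam get 5
}

for host in db2071 db2072; do
    ipmi_checks "$host"
done
```

Setting `DRY_RUN=0` runs the same commands for real once the mgmt DNS entries exist.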

@Marostegui we cannot use a7 and b7 because those racks have 10G switches and the servers only have 1G NICs. Please relocate db2091 and db2092. Thanks.

Thanks for the heads-up, I have edited the task with the new locations. Let us know if you see any other problems!

@Marostegui db2085 needs to move as well.

Updated

@Marostegui Just for your information

asw-a2-codfw
asw-a7-codfw
asw-b2-codfw
asw-b7-codfw
asw-c2-codfw
asw-c7-codfw
asw-d2-codfw
asw-d7-codfw

are 10G switches, so db2085 needs to be moved again.
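The constraint from these comments (no 1G-only DB hosts in racks with 10G switches) can be checked mechanically before a racking plan is posted. A minimal sketch: the rack list is copied from the switch list above, the input reuses the hostname/rack pairs from the tables in this task, and the function name `check_plan` is hypothetical.

```shell
#!/bin/sh
# Sketch: flag planned DB hosts whose rack has a 10G switch.
# Rack list taken from asw-{a,b,c,d}{2,7}-codfw in the comment above.

TEN_G_RACKS="a2 a7 b2 b7 c2 c7 d2 d7"

# Reads "hostname rack" lines on stdin; returns 1 if any host is misplaced.
check_plan() {
    status=0
    while read -r host rack; do
        [ -z "$host" ] && continue
        for r in $TEN_G_RACKS; do
            if [ "$rack" = "$r" ]; then
                echo "$host: rack $rack has a 10G switch (host has 1G NICs), relocate"
                status=1
            fi
        done
    done
    return "$status"
}

# Original placement of db2091/db2092 (rejected above): both are flagged.
printf '%s\n' 'db2091 a7' 'db2092 b7' | check_plan || echo "plan needs changes"
# Current placement from the task description: nothing is flagged.
printf '%s\n' 'db2091 a8' 'db2092 b8' | check_plan && echo "plan OK"
```

The same check can be run over the full table by pasting all hostname/rack lines into the pipe.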

Thanks for the list, that makes it a lot clearer :)
I have updated the task.

Papaul renamed this task from "codfw rack/setup first 10 DB servers" to "codfw rack/setup 22 DB servers". (Apr 10 2017, 5:09 PM)

Change 346580 merged by Jcrespo:
[operations/puppet@production] Indicate install recipes for newest db1* and db2* DB servers

https://gerrit.wikimedia.org/r/346580

Change 348037 had a related patch set uploaded (by Papaul):
[operations/dns@master] DNS:Add mgmt and production DNS for db20[7-9][0-9]

https://gerrit.wikimedia.org/r/348037

Change 348037 merged by Dzahn:
[operations/dns@master] DNS:Add mgmt and production DNS for db20[7-9][0-9]

https://gerrit.wikimedia.org/r/348037

Change 348758 had a related patch set uploaded (by Papaul):
[operations/puppet@production] DHCP: ADD MAC address entries for db20[7-9][0-9]

https://gerrit.wikimedia.org/r/348758

Change 348758 merged by Dzahn:
[operations/puppet@production] DHCP: Add MAC address entries for db20[7-9][0-9].

https://gerrit.wikimedia.org/r/348758
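MAC address entries like the ones in this change can be generated from a list of hostname/MAC pairs rather than typed by hand. A minimal sketch, assuming the DHCP configuration uses standard ISC dhcpd `host` declarations and that production names follow `<host>.codfw.wmnet`; both are assumptions about the file layout. The MAC addresses are placeholders from the IANA documentation range (RFC 7042), not the real NICs of these hosts, and `dhcp_entries` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: emit ISC dhcpd "host" declarations from "hostname mac" pairs.
# ASSUMPTIONS: standard ISC dhcpd syntax, production names <host>.codfw.wmnet.
# Example MACs are documentation placeholders (00:00:5e:00:53:xx).

dhcp_entries() {
    while read -r host mac; do
        [ -z "$host" ] && continue
        printf 'host %s {\n' "$host"
        printf '\thardware ethernet %s;\n' "$mac"
        printf '\tfixed-address %s.codfw.wmnet;\n' "$host"
        printf '}\n\n'
    done
}

printf '%s\n' 'db2071 00:00:5e:00:53:01' 'db2072 00:00:5e:00:53:02' | dhcp_entries
```

Each emitted stanza still needs to be reviewed against the real MAC reported by the host before merging.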

@Papaul the following servers were ready to have puppet enabled and so on, so I did that and rebooted them:

db2071
db2072
db2073
db2075
db2076
db2079

db2071 is going to be used in production (T163413), so if you need to do anything to it, please ping @jcrespo or me first.
The others are not in use yet and are not planned to be for now; we will update this task if we change our minds about any of them.

@Papaul note that db2074, db2077, db2078 and db2080 were not ready when I checked this morning, so I don't know whether their OS has been installed (I didn't check again).

@Marostegui none of the systems were ready. When the systems are ready, I hand the task over to the person responsible for the implementation.

OS installation and puppet/salt setup complete on:

  • db2071
  • db2072
  • db2073
  • db2074
  • db2075
  • db2076
  • db2077
  • db2078
  • db2079
  • db2080
  • db2081
  • db2082
  • db2083
  • db2084
  • db2085
  • db2086
  • db2087
  • db2088
  • db2089
  • db2090
  • db2091
  • db2092

I confirm the following servers look good: db2081, db2085, db2086, db2089, db2091, db2092 (the previous ones were confirmed here: T162159#3196923)

db2084 cannot PXE boot. I am troubleshooting it.

I talked to @ayounsi: db2084 was in the wrong VLAN. The installation is now complete, and I am running puppet and salt on the server.

Papaul updated the task description.

This is complete. @Marostegui you can take over from here. Thanks.

Thanks @Papaul, all the hosts look good!
I will mark this ticket as resolved then.