
rack/setup/install db21[21-30].codfw.wmnet
Closed, Resolved · Public · 0 Story Points

Description

This task will track the racking, setup, and installation of 10 new core database servers for codfw.

Shared Info

Hostnames: From db2121.codfw.wmnet to db2130.codfw.wmnet
Racking Proposal: Please distribute the hosts across the rows as follows (if possible, do not share a rack). It does not matter which hostname goes to which row, only the number of servers per row:
Row A: 2 hosts
Row B: 2 hosts
Row C: 3 hosts
Row D: 3 hosts
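
As a quick sanity check on the proposal above, the following minimal sketch expands the host range from the task title and confirms that the per-row quotas add up to the number of hosts. The helper name `expand_range` and the `ROW_QUOTA` mapping are illustrative only and are not part of any WMF tooling.

```python
import re

def expand_range(pattern: str) -> list[str]:
    """Expand one 'prefix[start-end]suffix' range into individual names."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No bracket range present: treat the pattern as a single hostname.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep zero padding if the range uses it
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]

# Per-row host counts, copied from the racking proposal.
ROW_QUOTA = {"A": 2, "B": 2, "C": 3, "D": 3}

hosts = expand_range("db21[21-30].codfw.wmnet")

# The quotas must cover every host exactly once.
assert len(hosts) == sum(ROW_QUOTA.values())
print(hosts[0], "to", hosts[-1], f"({len(hosts)} hosts)")
```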

Keep in mind that several hosts are ready for DCOPs to decommission (T221533), in case you need to make space for these new ones.

Networking/Subnet/IP: Normal database network connection and addressing, 1G
Partitioning/RAID: Hardware RAID10 with a stripe size of 256 KB

Individual Server Checklists

db2121: asw-a5:ge-5/0/0

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2122: asw-a6:ge-6/0/9

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2123: asw-b3:ge-3/0/25

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2124: asw-b6:ge-6/0/4

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2125: asw-c1:ge-1/0/11

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2126: asw-c5:ge-5/0/31

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2127: asw-c6:ge-6/0/14

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2128: asw-d1:ge-1/0/3

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2129: asw-d5:ge-5/0/2

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

db2130: asw-d6:ge-6/0/18

  • - receive in system on procurement task T224973
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, private1-row-vlan)
    • end on-site specific steps
  • - production dns entries added (private subnet for its row)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox
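
The switch ports recorded in the checklist headings can be checked against the racking proposal. The sketch below assumes that a switch name such as `asw-a5` encodes row A, rack 5; that reading is an assumption made for illustration, and the names `PORTS`, `PLAN`, and `rack_of` are hypothetical.

```python
from collections import Counter

# Host to switch port, copied from the checklist headings above.
PORTS = {
    "db2121": "asw-a5:ge-5/0/0",
    "db2122": "asw-a6:ge-6/0/9",
    "db2123": "asw-b3:ge-3/0/25",
    "db2124": "asw-b6:ge-6/0/4",
    "db2125": "asw-c1:ge-1/0/11",
    "db2126": "asw-c5:ge-5/0/31",
    "db2127": "asw-c6:ge-6/0/14",
    "db2128": "asw-d1:ge-1/0/3",
    "db2129": "asw-d5:ge-5/0/2",
    "db2130": "asw-d6:ge-6/0/18",
}

# Requested hosts per row, from the racking proposal.
PLAN = {"a": 2, "b": 2, "c": 3, "d": 3}

def rack_of(port: str) -> str:
    """Return the rack label (e.g. 'a5'), assuming 'asw-<row><rack>:<interface>'."""
    switch, _interface = port.split(":", 1)
    return switch.split("-", 1)[1]

racks = {host: rack_of(port) for host, port in PORTS.items()}

# Count hosts per row (first character of the rack label).
per_row = Counter(rack[0] for rack in racks.values())

# Racks that hold more than one of the new hosts.
shared = [rack for rack, count in Counter(racks.values()).items() if count > 1]

print("hosts per row:", dict(per_row))
print("matches plan:", dict(per_row) == PLAN)
print("shared racks:", shared)
```

Under that naming assumption, the ports listed here give two hosts each in rows A and B, three each in rows C and D, and no rack holding more than one of the new hosts.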

Event Timeline

RobH triaged this task as Normal priority. Jul 2 2019, 4:32 PM
RobH created this task.
Restricted Application added a project: Operations. Jul 2 2019, 4:32 PM
Restricted Application added a subscriber: Aklapper.
RobH added a parent task: Unknown Object (Task). Jul 2 2019, 4:32 PM
Marostegui updated the task description. Jul 3 2019, 6:21 AM

Change 520379 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Allow reimage db21[21-30].codfw.wmnet

https://gerrit.wikimedia.org/r/520379

Change 520379 merged by Marostegui:
[operations/puppet@production] install_server: Allow reimage db21[21-30].codfw.wmnet

https://gerrit.wikimedia.org/r/520379

Marostegui added a comment. Edited Jul 3 2019, 8:34 AM

@RobH @Papaul I have merged: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/520379/
The only changes pending from your side to be able to install these hosts once they arrive would be:

  • Production DNS entries
  • MGMT DNS entries
  • MAC entries for the DHCP
Papaul moved this task from Backlog to Racking Tasks on the ops-codfw board. Jul 12 2019, 12:50 AM
Papaul updated the task description. Jul 22 2019, 5:08 PM
Papaul updated the task description. Jul 22 2019, 6:24 PM
Papaul updated the task description. Jul 23 2019, 4:29 PM
Papaul updated the task description. Jul 23 2019, 4:52 PM
Papaul updated the task description. Jul 23 2019, 5:16 PM
Papaul updated the task description. Jul 23 2019, 11:49 PM

Change 525277 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and production DNS for db21[21-30]

https://gerrit.wikimedia.org/r/525277

Change 525277 merged by Marostegui:
[operations/dns@master] DNS: Add mgmt and production DNS for db21[21-30]

https://gerrit.wikimedia.org/r/525277

Change 525318 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address entries for db21[21-30]

https://gerrit.wikimedia.org/r/525318

Papaul updated the task description. Jul 24 2019, 4:39 PM
Papaul updated the task description.
Papaul updated the task description. Jul 24 2019, 4:47 PM
Papaul updated the task description. Jul 24 2019, 4:56 PM
Papaul updated the task description. Jul 24 2019, 5:07 PM
Papaul updated the task description. Jul 24 2019, 5:20 PM

Change 525318 merged by Dzahn:
[operations/puppet@production] DHCP: Add MAC address entries for db21[21-30]

https://gerrit.wikimedia.org/r/525318

Papaul updated the task description. Jul 24 2019, 9:30 PM
Papaul updated the task description. Jul 24 2019, 9:34 PM
Papaul updated the task description. Jul 25 2019, 12:30 AM
Marostegui updated the task description. Jul 25 2019, 5:03 AM

db2121-db2125 looking good! Thanks

Marostegui updated the task description. Jul 25 2019, 5:06 AM

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2126.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201907250714_marostegui_24580.log.

Completed auto-reimage of hosts:

['db2126.codfw.wmnet']

Of which those FAILED:

['db2126.codfw.wmnet']
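
The log paths posted by the script follow a visible `<timestamp>_<user>_<number>.log` pattern. The sketch below splits such a path into its parts; treating the first field as `YYYYMMDDHHMM` is inferred from the examples in this task, the meaning of the trailing number is not stated here (it is kept as an opaque `run_id`), and the timestamp's time zone is unknown. The function name `parse_log_name` is hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath

def parse_log_name(path: str) -> tuple[datetime, str, str]:
    """Split '<YYYYMMDDHHMM>_<user>_<number>.log' into its three fields."""
    stem = PurePosixPath(path).stem            # drop directory and '.log'
    stamp, rest = stem.split("_", 1)           # leading timestamp
    user, run_id = rest.rsplit("_", 1)         # trailing number; user may contain '_'
    # Naive datetime: the time zone is not given in the task.
    return datetime.strptime(stamp, "%Y%m%d%H%M"), user, run_id

when, user, run_id = parse_log_name(
    "/var/log/wmf-auto-reimage/201907250714_marostegui_24580.log"
)
print(when.isoformat(), user, run_id)
```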

@Papaul I tried to install db2126 myself to advance on the task, but it looks like it keeps rebooting into PXE forever :-)
I think it needs your on-site magic, as with the latency I am not able to get into the BIOS to change that :)

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2126.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201907250748_marostegui_30313.log.

Completed auto-reimage of hosts:

['db2126.codfw.wmnet']

and were ALL successful.

I was finally able to get into the BIOS and change the boot order settings. So I got to install the host :-)

Marostegui updated the task description. Jul 25 2019, 8:14 AM

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2128.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201907250838_marostegui_41460.log.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2129.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201907250847_marostegui_42484.log.

Completed auto-reimage of hosts:

['db2128.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2130.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201907250903_marostegui_46177.log.

Marostegui updated the task description. Jul 25 2019, 9:04 AM

Completed auto-reimage of hosts:

['db2129.codfw.wmnet']

and were ALL successful.

Marostegui updated the task description. Jul 25 2019, 9:13 AM

Completed auto-reimage of hosts:

['db2130.codfw.wmnet']

and were ALL successful.

Marostegui updated the task description. Jul 25 2019, 9:27 AM

@Papaul all hosts but db2127 have been installed.
I have managed to get into the BIOS of all of them and change the boot settings. However, db2127's iDRAC password doesn't seem to be working, so I have been unable to log in there and change its settings.
Can you reset its password?

Thanks!

Change 526162 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Change db2127's MAC address

https://gerrit.wikimedia.org/r/526162

Change 526162 merged by Marostegui:
[operations/puppet@production] install_server: Change db2127's MAC address

https://gerrit.wikimedia.org/r/526162

Papaul reassigned this task from Papaul to Marostegui. Jul 29 2019, 3:35 PM
Papaul updated the task description.
Papaul added a subscriber: Papaul.

@Marostegui all yours

db2127 looking good!

root@db2127:~# free -g ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         498
Swap:             7           0           7
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   5.2T  6.2G  5.2T   1% /srv

Also, the iDRAC password works.

Thanks a lot @Papaul!

Marostegui closed this task as Resolved. Jul 29 2019, 3:55 PM
Marostegui updated the task description.