(Need By: TBD) rack/setup/install db21[45-52]
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of db21[45-52].

Hostname / Racking / Installation Details

Hostnames: db21[45-52]
Racking Proposal: 2 hosts per row, preferably a different rack for each host.
Networking/Subnet/VLAN/IP: 1G, internal vlan, 1 production and 1 mgmt connection
Partitioning/Raid: hwraid10(256k stripe, read ahead, write back) and db recipe
OS Distro: buster
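
The bracketed range in the hostname line can be expanded mechanically. The following is a minimal Python sketch, not a tool used in this task: the function name `expand_hosts` is hypothetical, and the `.codfw.wmnet` suffix is taken from the reimage log entries further down the thread.

```python
import re

def expand_hosts(pattern: str, domain: str = "") -> list[str]:
    """Expand a single numeric range such as 'db21[45-52]' into hostnames."""
    m = re.fullmatch(
        r"(?P<prefix>[^\[\]]*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>[^\[\]]*)",
        pattern,
    )
    if m is None:
        # No bracketed range: treat the pattern as one literal hostname.
        return [pattern + domain]
    width = len(m["lo"])  # keep zero padding if the range uses it
    return [
        f"{m['prefix']}{n:0{width}d}{m['suffix']}{domain}"
        for n in range(int(m["lo"]), int(m["hi"]) + 1)
    ]

hosts = expand_hosts("db21[45-52]", ".codfw.wmnet")
print(len(hosts), hosts[0], hosts[-1])
```

Only one bracketed range per pattern is handled here; anything else is returned unchanged.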

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

db2145: A5/U16 ge-5/0/13

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

db2146: A8/U29 ge-8/0/1

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

db2147: B1/U18 ge-1/0/25

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

db2148: B8/U15 ge-8/0/12

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

db2149: C1/U1 ge-1/0/0

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

db2150: C3/U19 ge-3/0/18

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

db2151: D1/U5 ge-1/0/4

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

db2152: D8/U2 ge-8/0/1

  • - receive in system on procurement task T271227 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
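
The per-host headers in the checklist can be checked against the racking proposal (2 hosts per row, a different rack for each host). The Python sketch below is illustrative only. It assumes each header reads as host, row letter plus rack number, rack unit, and switch port; that interpretation and the names `HEADER_RE` and `parse_header` are assumptions, not part of any tooling mentioned in this task.

```python
import re
from collections import Counter

# Header lines copied from the checklist in this task.
HEADERS = [
    "db2145: A5/U16 ge-5/0/13",
    "db2146: A8/U29 ge-8/0/1",
    "db2147: B1/U18 ge-1/0/25",
    "db2148: B8/U15 ge-8/0/12",
    "db2149: C1/U1 ge-1/0/0",
    "db2150: C3/U19 ge-3/0/18",
    "db2151: D1/U5 ge-1/0/4",
    "db2152: D8/U2 ge-8/0/1",
]

HEADER_RE = re.compile(
    r"(?P<host>db\d+):\s*(?P<row>[A-Z])(?P<rack>\d+)/U(?P<unit>\d+)"
    r"\s+(?P<port>ge-\d+/\d+/\d+)"
)

def parse_header(line: str) -> dict:
    """Split one checklist header into host, row, rack, unit and port."""
    m = HEADER_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised header: {line!r}")
    return {
        "host": m["host"],
        "row": m["row"],
        "rack": m["row"] + m["rack"],
        "unit": int(m["unit"]),
        "port": m["port"],
    }

placements = [parse_header(h) for h in HEADERS]
hosts_per_row = Counter(p["row"] for p in placements)
racks = [p["rack"] for p in placements]

print(dict(hosts_per_row))            # hosts counted per row letter
print(len(set(racks)) == len(racks))  # True when no rack is reused
```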

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
Marostegui moved this task from Triage to Blocked on the DBA board.

Change 662139 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add new codfw databases to insetup

https://gerrit.wikimedia.org/r/662139

Change 662139 merged by Marostegui:
[operations/puppet@production] mariadb: Add new codfw databases to insetup

https://gerrit.wikimedia.org/r/662139

These hosts have been added to puppet with: insetup role and also assigned a partman recipe for the installation.
The only puppet change needed from DCOPS is the one related to the DHCP config.

Papaul updated the task description.

@Marostegui Thanks for that

Change 666496 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP Add MAC address for db21[45-52]

https://gerrit.wikimedia.org/r/666496

Change 666496 merged by Papaul:
[operations/puppet@production] DHCP Add MAC address for db21[45-52]

https://gerrit.wikimedia.org/r/666496

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2145.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102232346_pt1979_13076_db2145_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2145.codfw.wmnet']

Of which those FAILED:

['db2145.codfw.wmnet']
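
The log paths reported in this thread appear to follow a regular naming scheme, which makes it easy to sort runs by start time or find the log for a given host. The sketch below is a guess at that scheme based only on the file names visible here: a 12-digit timestamp, the user, a numeric field (called `run_id` here because its meaning is not stated in the thread), and an optional hostname with dots written as underscores. The function name `parse_log_path` is hypothetical.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

LOG_RE = re.compile(
    r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<run_id>\d+)(?:_(?P<host>.+))?\.log"
)

def parse_log_path(path: str) -> dict:
    """Extract start time, user, numeric id and optional host from a log path."""
    name = PurePosixPath(path).name
    m = LOG_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected log name: {name!r}")
    # Assumption: underscores in the host part stand for dots.
    host = m["host"].replace("_", ".") if m["host"] else None
    return {
        "started": datetime.strptime(m["stamp"], "%Y%m%d%H%M"),
        "user": m["user"],
        "run_id": int(m["run_id"]),
        "host": host,
    }

single = parse_log_path(
    "/var/log/wmf-auto-reimage/202102232346_pt1979_13076_db2145_codfw_wmnet.log"
)
multi = parse_log_path("/var/log/wmf-auto-reimage/202102240221_pt1979_9233.log")
print(single["started"], single["host"])
print(multi["started"], multi["host"])
```

Runs launched for several hosts at once have no host part in the file name, so `host` is `None` for those.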

db2145 failed with the error below:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, No puppet role has been assigned to this node. (file: /etc/puppet/manifests/site.pp, line: 2530, column: 9) on node db2145.codfw.wmnet

However, @Marostegui wrote:

"
    In T273568#6809673, @Marostegui wrote:

    These hosts have been added to puppet with: insetup role and also assigned a partman recipe for the installation.
    The only puppet change needed from DCOPS is the one related to the DHCP config.
"
Checking that.

Change 666510 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Fix role insetup for db21[45-52]

https://gerrit.wikimedia.org/r/666510

Change 666510 merged by Papaul:
[operations/puppet@production] Fix role insetup for db21[45-52]

https://gerrit.wikimedia.org/r/666510
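
The thread does not say what change 666510 altered, so the following is only a generic pre-flight check, not a reconstruction of the fix. It assumes roles are assigned by hostname regexes and tests whether a candidate regex covers all eight hosts. Both regexes are invented for illustration, and Python's `re` only approximates the regex dialect Puppet uses.

```python
import re

# The eight hosts from this task.
hosts = [f"db{n}.codfw.wmnet" for n in range(2145, 2153)]

def uncovered(node_regex: str, hostnames: list[str]) -> list[str]:
    """Return the hostnames that the given regex does not match."""
    pattern = re.compile(node_regex)
    return [h for h in hostnames if pattern.search(h) is None]

# Hypothetical patterns, not taken from site.pp.
too_narrow = r"^db21(4[5-9]|50)\.codfw\.wmnet$"
covers_all = r"^db21(4[5-9]|5[0-2])\.codfw\.wmnet$"

print(uncovered(too_narrow, hosts))
print(uncovered(covers_all, hosts))
```

A non-empty result means at least one host would fall through to the "No puppet role has been assigned to this node" branch under the stated assumption.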

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2145.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102240152_pt1979_2205_db2145_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2145.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2146.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102240218_pt1979_8711_db2146_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

['db2147.codfw.wmnet', 'db2148.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102240221_pt1979_9233.log.

Completed auto-reimage of hosts:

['db2146.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2149.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102240258_pt1979_15820_db2149_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2147.codfw.wmnet', 'db2148.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['db2149.codfw.wmnet']

and were ALL successful.

Thanks for catching and fixing this!

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2150.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102241349_pt1979_12642_db2150_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2151.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102241358_pt1979_13545_db2151_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2150.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2152.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102241414_pt1979_18725_db2152_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2151.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['db2152.codfw.wmnet']

and were ALL successful.

Papaul updated the task description.

@Marostegui all yours. Have fun

Thank you Papaul - we'll take it from here
They look good:

[14:48:47] marostegui@cumin1001:~$ sudo cumin 'db21[45-52].codfw.wmnet' 'free -g ; echo ; df -hT /srv; echo ; pvs ; echo ; megacli -LdPdInfo -a0 | egrep "RAID|Strip Size"'
8 hosts will be targeted:
db[2145-2152].codfw.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(8) db[2145-2152].codfw.wmnet
----- OUTPUT of 'free -g ; echo ;...RAID|Strip Size"' -----
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7

Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   7.6T  8.2G  7.6T   1% /srv

  PV         VG   Fmt  Attr PSize  PFree
  /dev/sda3  tank lvm2 a--  <8.69t <1.13t

RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Strip Size          : 256 KB
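
The check above was done by eye. As a rough sketch, the `key : value` lines from the megacli part of the output could also be parsed and compared with the 256k stripe requested in the task description. The helper names are hypothetical, and the sketch deliberately does not interpret the "RAID Level" line, since the thread does not explain how that encoding maps to RAID 10.

```python
import re

# The two megacli lines from the cumin output in this thread.
OUTPUT = """\
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Strip Size          : 256 KB
"""

EXPECTED_STRIP_KB = 256  # from "hwraid10(256k stripe, ...)" in the description

def parse_fields(text: str) -> dict:
    """Turn 'Key : Value' lines into a dict; lines without ':' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def strip_size_kb(fields: dict) -> int:
    """Return the strip size in KB, or raise if the field is not '<n> KB'."""
    m = re.fullmatch(r"(\d+)\s*KB", fields["Strip Size"])
    if m is None:
        raise ValueError(f"unexpected strip size: {fields['Strip Size']!r}")
    return int(m.group(1))

fields = parse_fields(OUTPUT)
print(strip_size_kb(fields) == EXPECTED_STRIP_KB)
```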

@Papaul a reminder for future iterations: please do not add IPv6 entries for DB hosts (T270101).
I have already removed them from netbox.
Thanks!

@Marostegui understood. We will have to mention that on all upcoming racking tasks as a side note so I do not forget.

Thanks.

@Papaul Can it be added to the template or does it need to be added manually to every task?

@LSobanski Yes, it can be added to the template.

@Papaul Thanks. I'm guessing https://phabricator.wikimedia.org/maniphest/task/edit/form/66/ would be the best place for this. Unfortunately I cannot edit the form. Would you have access to do this or should I reach out to Rob or Willy?