
(Need By: 31st May) rack/setup/install db213[6-9] and db2140
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of db213[6-9] and db2140.

Hostname / Racking / Installation Details

Hostnames: db2136.codfw.wmnet db2137.codfw.wmnet db2138.codfw.wmnet db2139.codfw.wmnet db2140.codfw.wmnet
Racking Proposal: 1 per row, and not in racks A8, B6, C5, or D3
Networking/Subnet/VLAN/IP: Private VLAN like the rest of the databases, in a 1G rack (we don't need 10G here, but we want to order them with 10G just in case, for the future)
Partitioning/Raid: RAID10 + 256k stripe size. Normal DB RAID recipe; @Marostegui will take care of the puppet part.
OS Distro: Buster
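The hostnames above are written with a bracketed range (`db213[6-9]`), and one of the expanded FQDNs originally carried a `.mnet` typo. A minimal Python sketch, assuming one numeric range per pattern, that expands such ranges and flags FQDNs not ending in `.codfw.wmnet`. The helpers `expand_range` and `bad_fqdns` are hypothetical illustrations, not how cumin or ClusterShell implement host ranges.

```python
import re
from typing import List

# Hypothetical helper: one bracketed numeric range per pattern, e.g. "db213[6-9]".
RANGE_RE = re.compile(r"^(?P<prefix>[^\[\]]+)\[(?P<start>\d+)-(?P<end>\d+)\]$")
EXPECTED_SUFFIX = ".codfw.wmnet"


def expand_range(pattern: str) -> List[str]:
    """Expand 'db213[6-9]' into ['db2136', ..., 'db2139']; plain names pass through."""
    m = RANGE_RE.match(pattern)
    if m is None:
        return [pattern]
    width = len(m["start"])  # preserve zero-padding such as [01-09]
    return [
        f"{m['prefix']}{str(n).zfill(width)}"
        for n in range(int(m["start"]), int(m["end"]) + 1)
    ]


def bad_fqdns(fqdns: List[str]) -> List[str]:
    """Return FQDNs that do not end with the expected codfw domain."""
    return [f for f in fqdns if not f.endswith(EXPECTED_SUFFIX)]


if __name__ == "__main__":
    hosts = expand_range("db213[6-9]") + expand_range("db2140")
    print(hosts)  # ['db2136', 'db2137', 'db2138', 'db2139', 'db2140']
    listed = ["db2136.codfw.wmnet", "db2137.codfw.wmnet", "db2138.codfw.wmnet",
              "db2139.codfw.mnet", "db2140.codfw.wmnet"]
    print(bad_fqdns(listed))  # ['db2139.codfw.mnet']
```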

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

db2136: Rack A1U27 ge-1/0/0

  • - receive in system on procurement task T246007
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

db2137: Rack B1U4 ge-1/0/20

  • - receive in system on procurement task T246007
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

db2138: Rack C1U28 ge-1/0/13

  • - receive in system on procurement task T246007
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

db2139: Rack D1U18 ge-1/0/16

  • - receive in system on procurement task T246007
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

db2140: Rack D6U1 ge-6/0/0 (any rack works, as long as it is different from the racks above and not A8, B6, C5, or D3)

  • - receive in system on procurement task T246007
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
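The resolution rule above (every host has every checkbox done) can be modelled as a small data structure. A minimal Python sketch; the step names are abbreviated from the checklist, and the host state in the usage block is hypothetical, not taken from this task.

```python
from typing import Dict, List, Set

# Step names abbreviated from the per-host checklist in this task.
STEPS = [
    "receive in system",
    "rack system & update netbox",
    "bios/drac/serial setup",
    "mgmt dns entries",
    "network port setup",
    "production dns entries",
    "operations/puppet update",
    "OS installation",
    "puppet accept/initial run",
    "netbox state staged",
]


def pending_steps(done: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each host to its not-yet-completed steps, in checklist order."""
    result = {}
    for host, completed in done.items():
        missing = [step for step in STEPS if step not in completed]
        if missing:
            result[host] = missing
    return result


def can_resolve(done: Dict[str, Set[str]]) -> bool:
    """The task can be resolved once no host has pending steps."""
    return not pending_steps(done)


if __name__ == "__main__":
    # Hypothetical state: one host still missing its last two steps.
    done = {h: set(STEPS) for h in ["db2136", "db2137", "db2138", "db2139"]}
    done["db2140"] = set(STEPS[:-2])
    print(pending_steps(done))  # {'db2140': ['puppet accept/initial run', 'netbox state staged']}
    print(can_resolve(done))    # False
```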

Event Timeline

RobH created this task. May 1 2020, 7:22 PM
Restricted Application added a project: Operations. May 1 2020, 7:22 PM
Restricted Application added a subscriber: Aklapper.
RobH added a parent task: Unknown Object (Task). May 1 2020, 7:22 PM
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH removed a subscriber: RobH.
RobH updated the task description. May 1 2020, 8:20 PM
Papaul updated the task description. May 1 2020, 8:20 PM
RobH reassigned this task from Papaul to jcrespo. May 1 2020, 8:21 PM
RobH added subscribers: jcrespo, Papaul.

@jcrespo or @Marostegui:

The racking details from the ordering task only list 4 hosts, but we ended up ordering 5. The racking info specifies 1 per row, so with 5 hosts one row will need to have 2. Do you have a preference on this?

Please advise so the task can be updated, and assign back to @Papaul for followup, thank you!

Marostegui renamed this task from (Need By: TBD) rack/setup/install db213[6-9] to (Need By: 31st May) rack/setup/install db213[6-9] and db2140. May 4 2020, 5:02 AM
Marostegui reassigned this task from jcrespo to Papaul.
Marostegui added a project: DBA.
Marostegui updated the task description.
Marostegui added a subscriber: RobH.


Done: any rack works, as long as it is not A8, B6, C5, or D3 and is not the same rack as the other 4 hosts.
I have updated the task.
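The placement rule agreed here (the first four hosts one per row, none in A8, B6, C5, or D3, and db2140 in a rack not shared with the others) is mechanical enough to check. A minimal Python sketch using the rack positions recorded in the description; the location format `<row><rack>U<unit>` is inferred from entries like `A1U27`, and the helper names are hypothetical.

```python
import re
from typing import Dict, List

EXCLUDED_RACKS = {"A8", "B6", "C5", "D3"}

# Placements recorded in this task's description.
PLACEMENT = {
    "db2136": "A1U27",
    "db2137": "B1U4",
    "db2138": "C1U28",
    "db2139": "D1U18",
    "db2140": "D6U1",
}
ONE_PER_ROW = ["db2136", "db2137", "db2138", "db2139"]
EXTRA_HOST = "db2140"

LOCATION_RE = re.compile(r"^(?P<row>[A-Z])(?P<rack>\d+)U(?P<unit>\d+)$")


def rack_of(location: str) -> str:
    """'A1U27' -> 'A1' (row letter plus rack number, dropping the U position)."""
    m = LOCATION_RE.match(location)
    if m is None:
        raise ValueError(f"unrecognised location: {location}")
    return m["row"] + m["rack"]


def violations(placement: Dict[str, str]) -> List[str]:
    racks = {host: rack_of(loc) for host, loc in placement.items()}
    problems = []
    for host, rack in racks.items():
        if rack in EXCLUDED_RACKS:
            problems.append(f"{host} is in excluded rack {rack}")
    rows = [racks[h][0] for h in ONE_PER_ROW]
    if len(set(rows)) != len(rows):
        problems.append("the first four hosts are not one per row")
    if racks[EXTRA_HOST] in {racks[h] for h in ONE_PER_ROW}:
        problems.append(f"{EXTRA_HOST} shares a rack with another host")
    return problems


if __name__ == "__main__":
    print(violations(PLACEMENT))  # []
```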

Change 593974 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add db213[6-9] and db2140 as spares

https://gerrit.wikimedia.org/r/593974

Change 593974 merged by Marostegui:
[operations/puppet@production] mariadb: Add db213[6-9] and db2140 as spares

https://gerrit.wikimedia.org/r/593974

Marostegui updated the task description. May 4 2020, 7:33 AM

@Papaul the initial puppet changes are done. On the puppet side, the only pending item is adding the hosts to the DHCP file (if you send me the MAC addresses, I can do that for you).
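The DHCP step pairs each hostname with its MAC address. The exact layout of the file in operations/puppet is not shown in this task, so this Python sketch assumes the generic ISC dhcpd `host` declaration (`hardware ethernet` plus `fixed-address`); the helper name is hypothetical and the MAC in the usage block is a placeholder, not a real address from these servers.

```python
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(?::[0-9a-f]{2}){5}$")


def dhcp_host_entry(hostname: str, mac: str, domain: str = "codfw.wmnet") -> str:
    """Render a generic ISC dhcpd host declaration (assumed format) for one server."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"invalid MAC address for {hostname}: {mac}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {hostname}.{domain};\n"
        "}\n"
    )


if __name__ == "__main__":
    # Placeholder MAC, NOT the real address of db2136.
    print(dhcp_host_entry("db2136", "AA:BB:CC:00:00:01"), end="")
```

Validating the MAC before rendering catches copy/paste mistakes (truncated or mistyped addresses) before they reach a puppet change.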

RobH removed a subscriber: RobH. May 4 2020, 2:37 PM
Papaul updated the task description. May 7 2020, 2:03 AM
Papaul updated the task description. May 7 2020, 3:57 PM
Papaul updated the task description. May 7 2020, 6:47 PM

@Marostegui thanks for updating the operations/puppet update portion.

Papaul updated the task description. May 8 2020, 5:31 PM

Change 596071 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add DNS for db213[6-9] and db2140

https://gerrit.wikimedia.org/r/596071

Change 596071 merged by Papaul:
[operations/dns@master] DNS: Add DNS for db213[6-9] and db2140

https://gerrit.wikimedia.org/r/596071

Papaul updated the task description. May 13 2020, 12:05 AM

Change 596226 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add db213[6-9] and db2140 MAC address

https://gerrit.wikimedia.org/r/596226

Change 596226 merged by Papaul:
[operations/puppet@production] DHCP: Add db213[6-9] and db2140 MAC address

https://gerrit.wikimedia.org/r/596226

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2136.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005131449_pt1979_15719_db2136_codfw_wmnet.log.

Papaul added a comment. Edited May 13 2020, 2:53 PM
[edit interfaces interface-range vlan-private1-a-codfw]
     member xe-2/0/3 { ... }
+    member ge-1/0/0;
[edit interfaces interface-range disabled]
-    member ge-1/0/0;
[edit interfaces]
+   ge-1/0/0 {
+       description db2136;
+   }
[edit interfaces interface-range vlan-private1-b-codfw]
     member ge-8/0/11 { ... }
+    member ge-1/0/20;
[edit interfaces interface-range disabled]
-    member ge-1/0/20;
[edit interfaces]
+   ge-1/0/20 {
+       description db2137;
+   }
[edit interfaces interface-range vlan-private1-c-codfw]
     member xe-7/0/14 { ... }
+    member ge-1/0/13;
[edit interfaces interface-range disabled]
-    member ge-1/0/13;
[edit interfaces]
+   ge-1/0/13 {
+       description db2138;
+   }
[edit interfaces interface-range vlan-private1-d-codfw]
     member xe-2/0/0 { ... }
+    member ge-1/0/16;
[edit interfaces interface-range disabled]
-    member ge-1/0/16;
[edit interfaces]
+   ge-1/0/16 {
+       description db2139;
+   }
[edit interfaces interface-range vlan-private1-d-codfw]
     member ge-1/0/16 { ... }
+    member ge-6/0/0;
[edit interfaces interface-range disabled]
-    member ge-6/0/0;
[edit interfaces]
+   ge-6/0/0 {
+       description db2140;
+   }
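The diff above appears to be Junos configuration-compare output: for each host the port joins its row's private VLAN interface-range, leaves the `disabled` range, and gets a description. A Python sketch that renders equivalent configuration-mode `set`/`delete` statements from a host/port/range table, so all five changes could be reviewed as one batch. The table mirrors the diff; whether the team applies changes this way is an assumption, and the statements would still need `commit check` and `commit` on the switch.

```python
from typing import List, Tuple

# (hostname, switch port, private VLAN interface-range) as shown in the diff.
PORTS: List[Tuple[str, str, str]] = [
    ("db2136", "ge-1/0/0", "vlan-private1-a-codfw"),
    ("db2137", "ge-1/0/20", "vlan-private1-b-codfw"),
    ("db2138", "ge-1/0/13", "vlan-private1-c-codfw"),
    ("db2139", "ge-1/0/16", "vlan-private1-d-codfw"),
    ("db2140", "ge-6/0/0", "vlan-private1-d-codfw"),
]


def port_commands(host: str, port: str, vlan_range: str) -> List[str]:
    """Set-style statements equivalent to one host's block of the diff."""
    return [
        f"set interfaces interface-range {vlan_range} member {port}",
        f"delete interfaces interface-range disabled member {port}",
        f"set interfaces {port} description {host}",
    ]


if __name__ == "__main__":
    for entry in PORTS:
        print("\n".join(port_commands(*entry)))
```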
Papaul updated the task description. May 13 2020, 2:53 PM
Papaul updated the task description. May 13 2020, 3:01 PM
Papaul updated the task description. May 13 2020, 3:07 PM

Completed auto-reimage of hosts:

['db2136.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2137.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005131516_pt1979_20172_db2137_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2138.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005131521_pt1979_20507_db2138_codfw_wmnet.log.

@Marostegui please check if this looks good on db2136:

Disk /dev/sda: 8.7 TiB, 9598580817920 bytes, 18747228160 sectors
Disk model: PERC H730P Adp
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: F9440BFB-32ED-4FB0-9CC3-D1AB00BD250A

Device        Start         End     Sectors  Size Type
/dev/sda1      2048    78125055    78123008 37.3G Linux filesystem
/dev/sda2  78125056    93749247    15624192  7.5G Linux swap
/dev/sda3  93749248 18747226111 18653476864  8.7T Linux LVM


Disk /dev/mapper/tank-data: 7.6 TiB, 8309008498688 bytes, 16228532224 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
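The fdisk figures can be cross-checked by hand: each partition's sector count is End - Start + 1, sizes are sectors × 512 bytes shown in binary units, and every start sector is a multiple of 8, so the partitions line up with the 4096-byte physical sectors. A small Python sketch of that arithmetic, with values copied from the listing above (the helper names are illustrative):

```python
from typing import List, Tuple

SECTOR = 512      # logical sector size reported by fdisk
PHYSICAL = 4096   # physical sector size reported by fdisk

# (device, start, end, sectors) copied from the fdisk listing for db2136.
PARTITIONS: List[Tuple[str, int, int, int]] = [
    ("/dev/sda1", 2048, 78125055, 78123008),
    ("/dev/sda2", 78125056, 93749247, 15624192),
    ("/dev/sda3", 93749248, 18747226111, 18653476864),
]


def problems(parts: List[Tuple[str, int, int, int]]) -> List[str]:
    """Check sector counts (end - start + 1) and physical-sector alignment."""
    found = []
    for dev, start, end, sectors in parts:
        if end - start + 1 != sectors:
            found.append(f"{dev}: sector count mismatch")
        if (start * SECTOR) % PHYSICAL != 0:
            found.append(f"{dev}: not aligned to {PHYSICAL}-byte sectors")
    return found


def human(sectors: int) -> str:
    """fdisk-style size: binary units (G = GiB, T = TiB), one decimal."""
    nbytes = sectors * SECTOR
    if nbytes >= 2**40:
        return f"{nbytes / 2**40:.1f}T"
    return f"{nbytes / 2**30:.1f}G"


if __name__ == "__main__":
    print(problems(PARTITIONS))               # []
    print([human(p[3]) for p in PARTITIONS])  # ['37.3G', '7.5G', '8.7T']
    print(18747228160 * SECTOR)               # 9598580817920 (whole /dev/sda)
    print(human(16228532224))                 # 7.6T (/dev/mapper/tank-data)
```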
Papaul updated the task description. May 13 2020, 3:27 PM
Papaul updated the task description. May 13 2020, 3:33 PM

Completed auto-reimage of hosts:

['db2137.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2139.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005131540_pt1979_22943_db2139_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2138.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2140.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005131547_pt1979_24732_db2140_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2140.codfw.wmnet']

Of which those FAILED:

['db2140.codfw.wmnet']

Completed auto-reimage of hosts:

['db2139.codfw.wmnet']

and were ALL successful.

Papaul closed this task as Resolved. May 13 2020, 5:44 PM
Papaul updated the task description.

@Marostegui complete

jcrespo reopened this task as Open. May 15 2020, 10:43 AM

@Papaul see the FAILED above for db2140, as well as the

db2140 	missing physical device in PuppetDB: state Staged in Netbox

in Netbox.

I cannot ssh to it. Maybe it just needs another try, or there was something else that went wrong?
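The Netbox warning quoted above is a reconciliation between two inventories: hosts Netbox says should be running puppet versus hosts PuppetDB actually knows about. A minimal Python sketch of that idea; the real Netbox report code is not shown in this task, so the set of states that require a PuppetDB entry is an assumption, and the snapshot in the usage block is hypothetical. Only the message text mirrors the line above.

```python
from typing import Dict, List, Set

# States assumed (for this sketch) to require a PuppetDB entry.
EXPECTS_PUPPET = {"staged", "active"}


def missing_in_puppetdb(netbox: Dict[str, str], puppetdb: Set[str]) -> List[str]:
    """Report hosts Netbox expects to be in PuppetDB but PuppetDB does not list."""
    return [
        f"{host}\tmissing physical device in PuppetDB: state {state} in Netbox"
        for host, state in sorted(netbox.items())
        if state.lower() in EXPECTS_PUPPET and host not in puppetdb
    ]


if __name__ == "__main__":
    # Hypothetical snapshot: db2140 never completed its first puppet run.
    netbox = {f"db21{n}": "Staged" for n in range(36, 41)}
    puppetdb = {f"db21{n}" for n in range(36, 40)}
    for line in missing_in_puppetdb(netbox, puppetdb):
        print(line)
```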

@jcrespo the server's first boot device was set to NIC1 instead of Hard drive 1, so after the OS install completed the first time, it rebooted into PXE again and got stuck in a loop. I changed the boot order and am about to re-image it again.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2140.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005151344_pt1979_21547_db2140_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2140.codfw.wmnet']

Of which those FAILED:

['db2140.codfw.wmnet']

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2140.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005151344_pt1979_21591_db2140_codfw_wmnet.log.

Completed auto-reimage of hosts:

['db2140.codfw.wmnet']

and were ALL successful.

jcrespo closed this task as Resolved. May 15 2020, 2:10 PM

Thanks, @Papaul

@jcrespo all good, sorry about the problem.

Change 596650 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install: Disable reimage of db114[1-9], db213[6-9] and db2140

https://gerrit.wikimedia.org/r/596650

This looks good on all the hosts:

----- OUTPUT of 'df -hT /srv' -----
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   7.6T  8.2G  7.6T   1% /srv

----- OUTPUT of 'free -g' -----
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7

----- OUTPUT of 'megacli -CfgDspl...AID Level|Strip'' -----
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Strip Size          : 256 KB

Thanks
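The outputs pasted above can be compared against what the description asked for (RAID10 with a 256k stripe, data mounted on `/srv`). A hedged Python sketch that parses those exact outputs and flags deviations; expecting XFS is taken from the output itself, not from the spec, and the helper names are hypothetical. MegaCLI is generally understood to print the same `RAID Level` line for RAID1 and RAID10 and to tell them apart by span depth, which the truncated command above did not capture, so this check only confirms a mirrored primary level.

```python
from typing import Dict, List

# Outputs copied from the comment above.
DF_OUTPUT = """\
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   7.6T  8.2G  7.6T   1% /srv
"""
MEGACLI_OUTPUT = """\
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Strip Size          : 256 KB
"""


def parse_df(text: str) -> Dict[str, str]:
    """Parse the single data row of `df -hT <path>` output."""
    fields = text.strip().splitlines()[1].split()
    return {"device": fields[0], "type": fields[1], "mount": fields[-1]}


def parse_colon_pairs(text: str) -> Dict[str, str]:
    """Split 'Key   : value' lines on the first colon only."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def check(df_text: str, megacli_text: str) -> List[str]:
    problems = []
    df = parse_df(df_text)
    if df["mount"] != "/srv" or df["type"] != "xfs":
        problems.append(f"unexpected /srv filesystem: {df}")
    raid = parse_colon_pairs(megacli_text)
    if raid.get("Strip Size") != "256 KB":
        problems.append(f"strip size is {raid.get('Strip Size')}, expected 256 KB")
    # Mirrored primary level only; RAID1 vs RAID10 also needs span depth.
    if not raid.get("RAID Level", "").startswith("Primary-1, Secondary-0"):
        problems.append(f"unexpected RAID level: {raid.get('RAID Level')}")
    return problems


if __name__ == "__main__":
    print(check(DF_OUTPUT, MEGACLI_OUTPUT))  # []
```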

Change 596650 merged by Marostegui:
[operations/puppet@production] install: Disable reimage of db114[1-9], db213[6-9] and db2140

https://gerrit.wikimedia.org/r/596650