
rack/setup/install db11[26-38].eqiad.wmnet
Closed, Resolved, Public

Description

This task will track the racking/setup/installation of 13 new db hosts ordered for eqiad.

These hosts will replace db1061-db1073.
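The hostname range in the title, db11[26-38], uses the bracketed numeric range notation common to cluster tools such as ClusterShell. The following minimal sketch handles only a single `[start-end]` group (real NodeSet parsers support much more). It expands the range and confirms that it covers 13 hosts, the same count as the hosts being replaced (db1061-db1073). The function name `expand_range` is hypothetical.

```python
import re


def expand_range(pattern: str) -> list[str]:
    """Expand one numeric bracket range, e.g. 'db11[26-38].eqiad.wmnet'.

    Simplified sketch: handles a single '[start-end]' group only (no comma
    lists, no nested or multiple groups). Zero padding in `start` is kept.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range group: treat as a single name
    prefix, start, end, suffix = match.groups()
    width = len(start)
    return [f"{prefix}{i:0{width}d}{suffix}"
            for i in range(int(start), int(end) + 1)]


new_hosts = expand_range("db11[26-38].eqiad.wmnet")
old_hosts = expand_range("db10[61-73]")
print(len(new_hosts), new_hosts[0], new_hosts[-1])
print(len(old_hosts) == len(new_hosts))
```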

Racking Proposal: Please see the comments below for the racking discussion. We'll need to determine where best to place these, considering what they are replacing (T211613#4812709).

db1126:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1127:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1128:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1129:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1130:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1131:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1132:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1133:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1134:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1135:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1136:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1137:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox

db1138:

  • - receive in system on procurement task T205068
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - change to staged in netbox
  • - handoff for service implementation
  • - change from staged to active in netbox
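Every host goes through the same ordered checklist, with a hand-off from on-site work to remote work after network port setup. The minimal, purely illustrative sketch below tracks one host's progress through those steps in order. The `HostChecklist` class and the shortened step labels are hypothetical and are not a Phabricator or netbox API.

```python
from dataclasses import dataclass
from typing import Optional

# Step labels paraphrase the per-host checklist in this task. The first
# ONSITE_STEPS entries come before the "end on-site specific steps" marker.
STEPS = [
    "receive in system",
    "rack system and update netbox",
    "bios/drac/serial setup/testing",
    "mgmt dns entries",
    "network port setup",
    "production dns entries",
    "operations/puppet update",
    "OS installation",
    "puppet accept/initial run",
    "change to staged in netbox",
    "handoff for service implementation",
    "change from staged to active in netbox",
]
ONSITE_STEPS = 5


@dataclass
class HostChecklist:
    hostname: str
    done: int = 0  # steps are completed strictly in order

    def complete_next(self) -> str:
        """Mark the next pending step as done and return its label."""
        if self.done >= len(STEPS):
            raise ValueError(f"{self.hostname}: checklist already complete")
        step = STEPS[self.done]
        self.done += 1
        return step

    def next_step(self) -> Optional[str]:
        return STEPS[self.done] if self.done < len(STEPS) else None

    def onsite_finished(self) -> bool:
        return self.done >= ONSITE_STEPS


db1127 = HostChecklist("db1127")
for _ in range(ONSITE_STEPS):
    db1127.complete_next()
print(db1127.onsite_finished(), db1127.next_step())
```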

Event Timeline


Not until after the all-hands. I will move it up the list.

Cmjohnson updated the task description. Jan 30 2019, 10:58 PM

@Cmjohnson I can take care of the installations once you've done the RAID and added DNS and pxeboot entries with the MACs :-)

Change 490054 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for db11[26-38]

https://gerrit.wikimedia.org/r/490054

Change 490054 merged by Marostegui:
[operations/dns@master] Adding mgmt dns for db11[26-38]

https://gerrit.wikimedia.org/r/490054

Marostegui updated the task description. Feb 12 2019, 3:31 PM
Marostegui raised the priority of this task from Normal to High. Apr 18 2019, 6:28 PM

I have increased the priority because the s4 master is having memory errors again and needs to be replaced as soon as possible.

Cmjohnson updated the task description. Apr 25 2019, 3:53 PM
Cmjohnson updated the task description. Apr 25 2019, 4:04 PM
Cmjohnson updated the task description. Apr 25 2019, 4:15 PM
Cmjohnson updated the task description. Apr 25 2019, 4:34 PM
Cmjohnson updated the task description. May 2 2019, 2:50 PM
Cmjohnson updated the task description. May 2 2019, 3:22 PM
Cmjohnson updated the task description. May 2 2019, 4:01 PM
Cmjohnson reassigned this task from Cmjohnson to RobH. May 2 2019, 5:42 PM
Cmjohnson updated the task description.

@RobH all the servers are racked and on-site work has been completed. Some are off and some just need to be rebooted.

Thanks Chris! Once @RobH has added the production DNS entries I can take over and install them myself :-)

Change 508354 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting production dns entries for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508354

Change 508354 merged by RobH:
[operations/dns@master] setting production dns entries for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508354

RobH reassigned this task from RobH to Marostegui. May 6 2019, 3:32 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.

All set!

Change 508355 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] mac addresses for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508355

Change 508355 merged by RobH:
[operations/puppet@production] mac addresses for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508355

Thanks, now that ^ has been merged I will take over.
Note: db1127 is still not present in netboot.cfg because it is not yet accessible via idrac, so its MAC cannot be retrieved. @Cmjohnson is on that. Once that is done, it can be re-added.
I will start the install of all the hosts except db1127.

Thanks @Cmjohnson and @RobH

Marostegui updated the task description. May 6 2019, 3:53 PM

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1126.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070511_marostegui_118294.log.
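The wmf-auto-reimage log names appear to encode the launch time, the operator, and a trailing number. For example, 201905070511_marostegui_118294.log corresponds to a launch at 2019-05-07 05:11. The minimal sketch below parses that pattern. The field layout is inferred only from the names in this task, and the meaning of the trailing number (possibly a process ID) is an assumption. `parse_log_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Inferred layout: <YYYYMMDDHHMM>_<user>_<number>.log
LOG_NAME = re.compile(r"(?P<ts>\d{12})_(?P<user>\w+?)_(?P<num>\d+)\.log")


def parse_log_name(path: str) -> tuple[datetime, str, int]:
    """Split a wmf-auto-reimage log path into (start time, user, number)."""
    name = path.rsplit("/", 1)[-1]
    match = LOG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected log name: {name!r}")
    started = datetime.strptime(match["ts"], "%Y%m%d%H%M")
    return started, match["user"], int(match["num"])


started, user, num = parse_log_name(
    "/var/log/wmf-auto-reimage/201905070511_marostegui_118294.log")
print(started.isoformat(), user, num)
```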

Completed auto-reimage of hosts:

['db1126.eqiad.wmnet']

and were ALL successful.

db1126 installed correctly:

root@db1126:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
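Each post-install check in this task pastes `megacli -LDInfo -Lall -aALL` output, which is then reviewed by eye. The minimal sketch below assumes a single virtual drive per adapter, as on these hosts. It parses the `Key : Value` lines and flags the fields being checked here: state, drive count, and bad blocks. The helper names are hypothetical, and the expected six drives is taken from this output rather than from any megacli tooling.

```python
def parse_ldinfo(text: str) -> dict[str, str]:
    """Parse 'Key : Value' lines from megacli -LDInfo output.

    Splits on the first colon only, so 'Virtual Drive: 0 (Target Id: 0)'
    keeps its full value. Assumes one virtual drive: if a key repeats,
    the first occurrence wins.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def check_ldinfo(fields: dict[str, str], expected_drives: int = 6) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if fields.get("State") != "Optimal":
        problems.append(f"state is {fields.get('State')!r}, not 'Optimal'")
    if fields.get("Number Of Drives") != str(expected_drives):
        problems.append(f"expected {expected_drives} drives, "
                        f"got {fields.get('Number Of Drives')!r}")
    if fields.get("Bad Blocks Exist") != "No":
        problems.append("bad blocks reported")
    return problems


sample = """\
Virtual Drive: 0 (Target Id: 0)
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
State               : Optimal
Number Of Drives    : 6
Bad Blocks Exist: No
"""
print(check_ldinfo(parse_ldinfo(sample)))  # prints []
```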
Marostegui updated the task description. May 7 2019, 5:29 AM

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1128.eqiad.wmnet', 'db1129.eqiad.wmnet', 'db1130.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070531_marostegui_123474.log.

Completed auto-reimage of hosts:

['db1130.eqiad.wmnet']

Of which those FAILED:

['db1130.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1130.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070548_marostegui_129310.log.
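In this case, the batch reported db1130 as FAILED, so it was relaunched on its own and completed successfully on the second run. The minimal sketch below shows that pattern of retrying only the failures. `reimage_with_retry` is hypothetical, and its `reimage` argument is a stand-in for one wmf-auto-reimage run that returns the set of failed hosts. It does not reflect the real script's interface.

```python
from typing import Callable, Iterable, List, Set


def reimage_with_retry(hosts: Iterable[str],
                       reimage: Callable[[List[str]], Set[str]],
                       max_attempts: int = 2) -> Set[str]:
    """Re-run `reimage` on only the hosts that failed, up to max_attempts.

    Returns the hosts that were still failing after the last attempt.
    """
    pending = list(hosts)
    for _ in range(max_attempts):
        if not pending:
            break
        failed = reimage(pending)
        pending = [host for host in pending if host in failed]
    return set(pending)


# Fake runner for illustration: db1130 fails on its first attempt only.
attempts: dict = {}


def fake_reimage(batch: List[str]) -> Set[str]:
    failed = set()
    for host in batch:
        attempts[host] = attempts.get(host, 0) + 1
        if host == "db1130.eqiad.wmnet" and attempts[host] == 1:
            failed.add(host)
    return failed


batch = ["db1128.eqiad.wmnet", "db1129.eqiad.wmnet", "db1130.eqiad.wmnet"]
print(reimage_with_retry(batch, fake_reimage))  # prints set()
print(attempts["db1130.eqiad.wmnet"])  # prints 2
```

A host with a persistent hardware fault, such as db1133's offline RAID later in this task, would still be returned after the last attempt. That is the point where the problem needs manual troubleshooting rather than another rerun.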

db1128 and db1129 installed correctly:

root@db1128:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           1         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
root@db1129:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
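Every host checked so far reports a 4.364 TB virtual drive built from 6 drives, with `Mirror Data` equal to the full size. This indicates a mirrored layout that exposes half of the raw capacity. The calculation below assumes equal-sized drives and assumes that megacli's "TB" here actually means binary TiB, which the matching `df -h` figure of 4.4T suggests. Under those assumptions, the implied per-drive capacity works out as follows:

```latex
C_{\text{usable}} = \frac{N}{2}\, d
\quad\Longrightarrow\quad
d = \frac{2\, C_{\text{usable}}}{N}
  = \frac{2 \times 4.364\ \text{TiB}}{6}
  \approx 1.455\ \text{TiB}
  \approx 1.60\ \text{TB}
```

This result is consistent with 1.6 TB drives, although the drive model is not stated in this task.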
Marostegui updated the task description. May 7 2019, 5:50 AM

Change 508493 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Add DHCP lease for db1127

https://gerrit.wikimedia.org/r/508493

Change 508493 merged by Marostegui:
[operations/puppet@production] install_server: Add DHCP lease for db1127

https://gerrit.wikimedia.org/r/508493
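Change 508493 adds a DHCP lease for db1127, keyed on the MAC address read from its idrac. The minimal sketch below renders an entry of that kind in standard ISC dhcpd `host` declaration syntax. The exact file and format used by the operations/puppet install_server module are not shown in this task, so treat this layout as an assumption. `dhcp_host_entry` is a hypothetical helper. The MAC below is a documentation placeholder from the 00:00:5e:00:53:xx range, not db1127's real address.

```python
import re

MAC_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}")


def dhcp_host_entry(hostname: str, mac: str, domain: str = "eqiad.wmnet") -> str:
    """Render an ISC dhcpd host declaration pinning `mac` to the host's FQDN."""
    mac = mac.lower()
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"not a colon-separated MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {hostname}.{domain};\n"
        f"}}\n"
    )


# Placeholder MAC from the documentation range; not db1127's real NIC.
print(dhcp_host_entry("db1127", "00:00:5E:00:53:01"), end="")
```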

Completed auto-reimage of hosts:

['db1130.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1127.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070605_marostegui_134280.log.

db1130 has been installed correctly:

root@db1130:~#  free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           1         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
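Each check also reads total memory from the `Mem:` row of `free -g`, which reports values in GiB. The minimal sketch below pulls the `total` column from that row. `mem_total_gib` is hypothetical, and the 500 GiB threshold is an illustrative value chosen only to match the 502 reported on these hosts.

```python
def mem_total_gib(free_output: str) -> int:
    """Return the 'total' column of the 'Mem:' row from `free -g` output."""
    for line in free_output.splitlines():
        columns = line.split()
        if columns and columns[0] == "Mem:":
            return int(columns[1])
    raise ValueError("no 'Mem:' row found")


free_output = """\
              total        used        free      shared  buff/cache   available
Mem:            502           1         501           0           0         499
Swap:             7           0           7
"""
total = mem_total_gib(free_output)
print(total, total >= 500)  # prints 502 True
```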
Marostegui updated the task description. May 7 2019, 6:06 AM

Completed auto-reimage of hosts:

['db1127.eqiad.wmnet']

and were ALL successful.

@RobH @Cmjohnson I saw that the idrac for db1127 was already working, so I grabbed the MAC for the NIC and added the DHCP entry for it. There is no need for either of you to do it anymore, as it installed correctly.

root@db1127:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
Marostegui updated the task description. May 7 2019, 6:23 AM

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1131.eqiad.wmnet', 'db1132.eqiad.wmnet', 'db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070623_marostegui_139335.log.

db1131 installed correctly:

root@db1131:~#  free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
Marostegui updated the task description. May 7 2019, 6:47 AM

Completed auto-reimage of hosts:

['db1132.eqiad.wmnet', 'db1133.eqiad.wmnet']

Of which those FAILED:

['db1132.eqiad.wmnet', 'db1133.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1131.eqiad.wmnet', 'db1132.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070650_marostegui_146684.log.

Completed auto-reimage of hosts:

['db1131.eqiad.wmnet', 'db1132.eqiad.wmnet']

and were ALL successful.

I am troubleshooting db1133's RAID, which is OFFLINE due to several disks being OFFLINE.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1134.eqiad.wmnet', 'db1135.eqiad.wmnet', 'db1136.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070718_marostegui_155235.log.

Completed auto-reimage of hosts:

['db1136.eqiad.wmnet', 'db1134.eqiad.wmnet', 'db1135.eqiad.wmnet']

and were ALL successful.

db1132 installed correctly:

root@db1132:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           1         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 56%, Taken 31 min.
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

db1134 installed correctly:

root@db1134:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 8%, Taken 3 min.
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

db1135 installed correctly:

root@db1135:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

db1136 installed correctly:

root@db1136:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 11%, Taken 5 min.
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
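db1132, db1134, and db1136 were still running background initialization when checked, at 56% after 31 min, 8% after 3 min, and 11% after 5 min respectively. The rough sketch below extrapolates the remaining time linearly from that progress line. For db1132, this gives about 31 / 0.56 ≈ 55 min in total, or roughly 24 min remaining. The constant rate is an assumption, and the real rate can vary. `bgi_remaining_minutes` is a hypothetical helper.

```python
import re

BGI_RE = re.compile(r"Completed (\d+)%, Taken (\d+) min")


def bgi_remaining_minutes(progress_line: str) -> float:
    """Linearly extrapolate minutes left from a megacli progress line."""
    match = BGI_RE.search(progress_line)
    if match is None:
        raise ValueError("no background initialization progress found")
    percent, taken = int(match.group(1)), int(match.group(2))
    if percent == 0:
        raise ValueError("cannot extrapolate from 0% progress")
    total = taken * 100 / percent  # assumes a constant rate
    return total - taken


for host, line in [
    ("db1132", "Background Initialization: Completed 56%, Taken 31 min."),
    ("db1134", "Background Initialization: Completed 8%, Taken 3 min."),
    ("db1136", "Background Initialization: Completed 11%, Taken 5 min."),
]:
    print(host, f"{bgi_remaining_minutes(line):.1f} min left")
```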
Marostegui updated the task description. May 7 2019, 7:39 AM

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070739_marostegui_163174.log.

I am troubleshooting db1133's RAID, which is OFFLINE due to several disks being OFFLINE.

The RAID is now back to ONLINE and Optimal. Going to run an installation again.

db1133 had another issue.
When rebooting it for an install, while connected to the idrac, this is what I get:

Unified Server Configurator does not support console redirection.

I have updated the documentation for R440 hosts: https://wikitech.wikimedia.org/w/index.php?title=Platform-specific_documentation%2FDell_PowerEdge_RN10&type=revision&diff=1825341&oldid=1769901

And now db1133 on reboot:

FW could not sync up config/prop changes for some of the VD's/PD's
Press any key to continue, or 'C' to load the configuration utility.

Pressing C doesn't do anything, so I think we need on-site help. @Cmjohnson, can you destroy and recreate the RAID from scratch?
Thank you

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1137.eqiad.wmnet', 'db1138.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070836_marostegui_174999.log.

Completed auto-reimage of hosts:

['db1138.eqiad.wmnet', 'db1137.eqiad.wmnet']

and were ALL successful.

db1137 installed correctly:

root@db1137:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
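After each install, the megacli check above is compared by eye against the expected layout. The sketch below automates that comparison. It parses `Key : Value` lines from `megacli -LDInfo` and reports any field that differs from an expected set. The expected values are copied from db1137's output above, on the assumption that every host in this batch should match. The helper names are hypothetical.

```python
def parse_ldinfo(text: str) -> dict[str, str]:
    """Parse 'Key : Value' lines from `megacli -LDInfo` output.

    Splits only on the first colon, so values such as
    'Primary-1, Secondary-0, RAID Level Qualifier-0' stay intact.
    The first occurrence of a key wins (assumes a single virtual drive).
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def layout_mismatches(fields: dict[str, str], expected: dict[str, str]) -> list[str]:
    """Describe every expected field that is missing or has a different value."""
    return [
        f"{key}: expected {want!r}, got {fields.get(key)!r}"
        for key, want in expected.items()
        if fields.get(key) != want
    ]


# Assumed baseline, copied from db1137's output above.
EXPECTED = {
    "RAID Level": "Primary-1, Secondary-0, RAID Level Qualifier-0",
    "Number Of Drives": "6",
    "Strip Size": "256 KB",
    "State": "Optimal",
    "Current Cache Policy": "WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU",
}

SAMPLE = """\
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
"""

print(layout_mismatches(parse_ldinfo(SAMPLE), EXPECTED))  # []
print(layout_mismatches(parse_ldinfo(SAMPLE.replace("Optimal", "Degraded")), EXPECTED))
```

The second call shows how a degraded array, like the one db1133 later reported, would surface as a single `State` mismatch.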

db1138 installed correctly:

root@db1138:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
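The other two parts of the post-install check are `free -g` and `df -hT /srv`. The sketch below reads both. It confirms that /srv is the XFS volume on `/dev/mapper/tank-data` and returns the total memory in GiB. The sample text is db1138's output above; the helper names are hypothetical, and the sketch does not try to interpret the human-readable `-h` size units.

```python
DF_COLUMNS = ["Filesystem", "Type", "Size", "Used", "Avail", "Use%", "Mounted on"]


def parse_df(text: str) -> list[dict[str, str]]:
    """Parse `df -hT` output into one dict per filesystem.

    The header row is skipped and DF_COLUMNS used instead, because
    'Mounted on' contains a space and would not split cleanly.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:
        # maxsplit=6 keeps a mount point containing spaces in one field
        rows.append(dict(zip(DF_COLUMNS, line.split(None, 6))))
    return rows


def mem_total_gib(free_output: str) -> int:
    """Return the 'total' column of the 'Mem:' row from `free -g`."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            return int(line.split()[1])
    raise ValueError("no Mem: row found")


FREE = """\
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7
"""

DF = """\
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
"""

srv = next(row for row in parse_df(DF) if row["Mounted on"] == "/srv")
print(srv["Filesystem"], srv["Type"], mem_total_gib(FREE))  # /dev/mapper/tank-data xfs 502
```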
Marostegui updated the task description. May 7 2019, 8:59 AM

The only pending host to install is db1133 which is having issues and we need on-site help from @Cmjohnson (T211613#5163570) - I have already pinged him on IRC.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905071454_marostegui_16155.log.

@Cmjohnson re-created the RAID on site, but it is still showing up as degraded, so this host might need further troubleshooting. It is not a big priority now, as the rest of the hosts installed correctly and we have plenty of other things to do; we can get back to this once Chris has time for it.

Completed auto-reimage of hosts:

['db1133.eqiad.wmnet']

Of which those FAILED:

['db1133.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905090917_marostegui_103846.log.

Completed auto-reimage of hosts:

['db1133.eqiad.wmnet']

Of which those FAILED:

['db1133.eqiad.wmnet']
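The completion notices above come in two shapes: "and were ALL successful" or "Of which those FAILED". The sketch below separates succeeded from failed hosts in one such message. It assumes only the layout visible in this task: the first bracketed list names all reimaged hosts, and an optional second list follows the FAILED line. It parses the pasted message text and is not an interface of wmf-auto-reimage. The function name is hypothetical, and it is not meant for the "was launched" notices, which also contain a host list.

```python
import ast


def parse_reimage_summary(message: str) -> tuple[set[str], set[str]]:
    """Split a 'Completed auto-reimage of hosts' message into (succeeded, failed)."""
    # Each host list is a Python-style list literal on its own line.
    lists = [
        ast.literal_eval(line.strip())
        for line in message.splitlines()
        if line.strip().startswith("[")
    ]
    hosts = set(lists[0]) if lists else set()
    failed = set(lists[1]) if "FAILED" in message and len(lists) > 1 else set()
    return hosts - failed, failed


OK_MSG = """\
Completed auto-reimage of hosts:

['db1138.eqiad.wmnet', 'db1137.eqiad.wmnet']

and were ALL successful.
"""

FAILED_MSG = """\
Completed auto-reimage of hosts:

['db1133.eqiad.wmnet']

Of which those FAILED:

['db1133.eqiad.wmnet']
"""

for msg in (OK_MSG, FAILED_MSG):
    ok, bad = parse_reimage_summary(msg)
    print("ok:", sorted(ok), "failed:", sorted(bad))
# ok: ['db1137.eqiad.wmnet', 'db1138.eqiad.wmnet'] failed: []
# ok: [] failed: ['db1133.eqiad.wmnet']
```

`ast.literal_eval` is used instead of `eval` so that the pasted text can only be parsed as a literal, never executed.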
RobH updated the task description. May 9 2019, 3:59 PM
RobH updated the task description.
Marostegui updated the task description. May 9 2019, 4:06 PM
Marostegui updated the task description.
Marostegui updated the task description. May 22 2019, 5:24 AM
Marostegui changed the task status from Open to Stalled. Jun 4 2019, 5:10 PM
Marostegui changed the task status from Stalled to Open. Jun 25 2019, 7:38 AM

Finally db1133 has been installed correctly!
Thanks @Cmjohnson for getting it fixed!

root@db1133:~# megacli -LdPdInfo -a0 ; megacli -LdPdInfo -a0 | grep state ; megacli -LdPdInfo -a0 | grep -i Raw ;  megacli -LdPdInfo -a0 | grep state | wc -l ; free -g

Adapter #0

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : Yes
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Number of Spans: 1
Span: 0 - Number of PDs: 6

PD: 0 Information
Enclosure Device ID: 32
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 0
WWN: 55cd2e414ffabc22
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 1.746 TB [0xdf8fe2b0 Sectors]
Non Coerced Size: 1.745 TB [0xdf7fe2b0 Sectors]
Coerced Size: 1.745 TB [0xdf7c0000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  4096
Firmware state: Online, Spun Up
Device Firmware Level: DL58
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b3577446c0
Connected Port Number: 0(path0)
Inquiry Data:   BTYM8395029V1P9DGNSSDSC2KG019T7R                          SCV1DL58
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive Temperature :24C (75.20 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No




PD: 1 Information
Enclosure Device ID: 32
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: 1
Device Id: 1
WWN: 500080d910eb2cdc
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Non Coerced Size: 1.454 TB [0xba3d4ab0 Sectors]
Coerced Size: 1.454 TB [0xba3c0000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Device Firmware Level: DAC9
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b3577446c1
Connected Port Number: 0(path0)
Inquiry Data:         Z7KS113ETBWTTHNSF81D60CSE                               DAC9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive Temperature :25C (77.00 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No




PD: 2 Information
Enclosure Device ID: 32
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Enclosure position: 1
Device Id: 2
WWN: 500080d910eb0b25
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Non Coerced Size: 1.454 TB [0xba3d4ab0 Sectors]
Coerced Size: 1.454 TB [0xba3c0000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Device Firmware Level: DAC9
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b3577446c2
Connected Port Number: 0(path0)
Inquiry Data:         Z7KS1147TBWTTHNSF81D60CSE                               DAC9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive Temperature :23C (73.40 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No




PD: 3 Information
Enclosure Device ID: 32
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 0, Arm: 3
Enclosure position: 1
Device Id: 3
WWN: 500080d910eb0ba4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Non Coerced Size: 1.454 TB [0xba3d4ab0 Sectors]
Coerced Size: 1.454 TB [0xba3c0000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Device Firmware Level: DAC9
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b3577446c3
Connected Port Number: 0(path0)
Inquiry Data:         Z7KS112FTBWTTHNSF81D60CSE                               DAC9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive Temperature :24C (75.20 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No




PD: 4 Information
Enclosure Device ID: 32
Slot Number: 4
Drive's position: DiskGroup: 0, Span: 0, Arm: 4
Enclosure position: 1
Device Id: 4
WWN: 500080d910eb0b7b
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Non Coerced Size: 1.454 TB [0xba3d4ab0 Sectors]
Coerced Size: 1.454 TB [0xba3c0000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Device Firmware Level: DAC9
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b3577446c4
Connected Port Number: 0(path0)
Inquiry Data:         Z7KS115KTBWTTHNSF81D60CSE                               DAC9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive Temperature :23C (73.40 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No




PD: 5 Information
Enclosure Device ID: 32
Slot Number: 5
Drive's position: DiskGroup: 0, Span: 0, Arm: 5
Enclosure position: 1
Device Id: 5
WWN: 500080d910eb0b8e
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Non Coerced Size: 1.454 TB [0xba3d4ab0 Sectors]
Coerced Size: 1.454 TB [0xba3c0000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Device Firmware Level: DAC9
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b3577446c5
Connected Port Number: 0(path0)
Inquiry Data:         Z7KS114ITBWTTHNSF81D60CSE                               DAC9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive Temperature :24C (75.20 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No




Exit Code: 0x00
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Raw Size: 1.746 TB [0xdf8fe2b0 Sectors]
Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
6
              total        used        free      shared  buff/cache   available
Mem:            502           1         500           0           0         498
Swap:             7           0           7
root@db1133:~#
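The db1133 output shows mixed members. Slot 0 is a different drive (firmware DL58, 1.746 TB raw, 4096-byte physical sectors), while slots 1-5 report 1.455 TB raw. The virtual drive is still 4.364 TB, the same as db1137 and db1138. The sketch below checks that figure from the `Coerced Size` sector counts. It assumes the usable size is half the members times the smallest member, as in a mirrored-and-striped set, with 512-byte logical sectors as reported. It also assumes megacli truncates rather than rounds to three decimals; that assumption reproduces the 1.745 and 1.454 TB shown per drive. The helper names are hypothetical.

```python
import math
import re

SECTOR_BYTES = 512  # "Logical Sector Size: 512" on every drive above
TIB = 2**40


def coerced_sectors(pdinfo: str) -> list[int]:
    """Extract the hex sector count from each 'Coerced Size' line.

    Anchored at line start so 'Non Coerced Size' lines are not matched.
    """
    pattern = r"^Coerced Size:.*\[(0x[0-9a-fA-F]+) Sectors\]"
    return [int(h, 16) for h in re.findall(pattern, pdinfo, flags=re.MULTILINE)]


def mirrored_usable_sectors(sizes: list[int]) -> int:
    """Half the members, each limited to the smallest member's size."""
    return (len(sizes) // 2) * min(sizes)


def tib_truncated(sectors: int) -> float:
    """Convert sectors to TiB, truncated (not rounded) to three decimals."""
    return math.floor(sectors * SECTOR_BYTES / TIB * 1000) / 1000


# Size lines copied from db1133's output above (slot 0 is the larger drive).
PDINFO = "\n".join(
    ["Non Coerced Size: 1.745 TB [0xdf7fe2b0 Sectors]",
     "Coerced Size: 1.745 TB [0xdf7c0000 Sectors]"]
    + ["Coerced Size: 1.454 TB [0xba3c0000 Sectors]"] * 5
)

sizes = coerced_sectors(PDINFO)
usable = mirrored_usable_sectors(sizes)
print(len(sizes), usable, tib_truncated(usable))  # 6 9373483008 4.364
print(tib_truncated(sizes[0]), tib_truncated(min(sizes)))  # 1.745 1.454
```

Under these assumptions, the extra capacity of the slot 0 drive is not used, and the array ends up the same size as on the other hosts.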
Marostegui closed this task as Resolved. Jun 25 2019, 7:39 AM
Marostegui updated the task description.