
rack/setup/install db11[26-38].eqiad.wmnet
Open, Stalled, High, Public

Description

This task will track the racking/setup/installation of 13 new db hosts ordered for eqiad.

These hosts will replace db1061-db1073.
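The title's `db11[26-38]` notation is a compact host range, similar to the ClusterShell-style node sets used by tools such as cumin. Here is a minimal Python sketch that expands the range and confirms that 13 new hosts replace 13 old ones. It assumes a single `[start-end]` group per pattern, which is all this task uses. The `expand_range` helper is hypothetical, and the count check does not say which new host replaces which old one.

```python
import re


def expand_range(pattern: str) -> list[str]:
    """Expand one numeric '[start-end]' group, e.g. 'db11[26-38]'.

    Hypothetical helper: handles a single bracketed range only, and pads
    each number to the width of the start value so leading zeros survive.
    A pattern without brackets is returned unchanged as a one-item list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


new_hosts = expand_range("db11[26-38].eqiad.wmnet")
old_hosts = expand_range("db10[61-73]")  # same hosts as "db1061-db1073"

print(len(new_hosts), new_hosts[0], new_hosts[-1])
print(len(new_hosts) == len(old_hosts))
```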

Racking proposal: see the comments below for the racking discussion. We need to determine where best to place these hosts, considering what they are replacing (T211613#4812709).

db1126:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1127:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1128:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1129:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1130:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1131:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1132:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1133:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1134:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1135:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1136:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1137:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox

db1138:

  • receive in system on procurement task T205068
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added: https://gerrit.wikimedia.org/r/c/operations/dns/+/508354
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • Puppet accept/initial run
  • change to staged in NetBox
  • handoff for service implementation
  • change from staged to active in NetBox
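Every host above has the same checklist. The "end on-site specific steps" marker splits it into on-site work and remote work. Below is a minimal Python sketch of tracking that checklist per host. The step names are abbreviated from the task description. The `done` set and both helper functions are hypothetical; in this task, progress was actually tracked by updating the task description.

```python
# Step names abbreviated from the per-host checklist in the task description.
ONSITE_STEPS = [
    "receive in procurement task",
    "rack system and update NetBox",
    "BIOS/DRAC/serial setup and testing",
    "mgmt DNS entries (asset tag and hostname)",
    "network port setup",
]
REMOTE_STEPS = [
    "production DNS entries",
    "operations/puppet update",
    "OS installation",
    "puppet accept/initial run",
    "change to staged in NetBox",
    "handoff for service implementation",
    "change from staged to active in NetBox",
]
CHECKLIST = ONSITE_STEPS + REMOTE_STEPS


def pending_steps(done: set[str]) -> list[str]:
    """Return the steps not yet in `done`, in checklist order."""
    return [step for step in CHECKLIST if step not in done]


def onsite_complete(done: set[str]) -> bool:
    """True once every step before 'end on-site specific steps' is done."""
    return all(step in done for step in ONSITE_STEPS)


# Hypothetical progress for one host: on-site work finished, nothing else.
done = set(ONSITE_STEPS)
print(onsite_complete(done))
print(pending_steps(done)[0])
```

The two NetBox changes are kept as separate, ordered steps. As a result, `pending_steps` always lists "change to staged" before "change from staged to active", with the handoff in between.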

Event Timeline


Change 478829 merged by Marostegui:
[operations/puppet@production] mariadb: Install db11[26-38] new DB hosts

https://gerrit.wikimedia.org/r/478829

RobH assigned this task to Cmjohnson (Dec 11 2018, 4:33 PM).
Cmjohnson moved this task from Backlog to Racking Tasks on the ops-eqiad board (Dec 11 2018, 6:36 PM).
Marostegui updated the task description (Jan 13 2019, 8:30 PM).

@Cmjohnson do you have a rough ETA for these?
Thanks!

Not until after the all-hands. I will move it up the list.

Cmjohnson updated the task description (Jan 30 2019, 10:58 PM).

@Cmjohnson I can take care of the installations once you have configured the RAID and added the DNS and PXE boot entries (with the MACs) :-)

Change 490054 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for db11[26-38]

https://gerrit.wikimedia.org/r/490054

Change 490054 merged by Marostegui:
[operations/dns@master] Adding mgmt dns for db11[26-38]

https://gerrit.wikimedia.org/r/490054

Marostegui updated the task description (Feb 12 2019, 3:31 PM).
Marostegui raised the priority of this task from Normal to High (Apr 18 2019, 6:28 PM).

I have increased the priority because the s4 master is having memory errors again and needs to be replaced as soon as possible.

Cmjohnson updated the task description (Apr 25 2019, 3:53 PM).
Cmjohnson updated the task description (Apr 25 2019, 4:04 PM).
Cmjohnson updated the task description (Apr 25 2019, 4:15 PM).
Cmjohnson updated the task description (Apr 25 2019, 4:34 PM).
Cmjohnson updated the task description (May 2 2019, 2:50 PM).
Cmjohnson updated the task description (May 2 2019, 3:22 PM).
Cmjohnson updated the task description (May 2 2019, 4:01 PM).
Cmjohnson updated the task description (May 2 2019, 5:42 PM).
Cmjohnson reassigned this task from Cmjohnson to RobH.

@RobH all the servers are racked and the on-site work has been completed. Some are powered off and some are in a state where they just need to be rebooted.

Thanks, Chris! Once @RobH has added the production DNS entries, I can take over and install them myself :-)

Change 508354 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting production dns entries for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508354

Change 508354 merged by RobH:
[operations/dns@master] setting production dns entries for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508354

RobH updated the task description.
RobH reassigned this task from RobH to Marostegui.

All set!

Change 508355 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] mac addresses for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508355

Change 508355 merged by RobH:
[operations/puppet@production] mac addresses for db11[26-38].eqiad.wmnet

https://gerrit.wikimedia.org/r/508355

Thanks, now that ^ has been merged I will take over.
Note: db1127 is still not present in netboot.cfg because it is not yet accessible via iDRAC, so its MAC cannot be retrieved. @Cmjohnson is on that. Once that is done, it can be re-added.
I will start the installation of all the hosts except db1127.

Thanks @Cmjohnson and @RobH

Marostegui updated the task description (May 6 2019, 3:53 PM).

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1126.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070511_marostegui_118294.log.

Completed auto-reimage of hosts:

['db1126.eqiad.wmnet']

and were ALL successful.
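Each wmf-auto-reimage run reports a log path such as `/var/log/wmf-auto-reimage/201905070511_marostegui_118294.log`. From the paths in this task alone, the file name appears to start with a `YYYYMMDDHHMM` timestamp, followed by the user who launched the run. That layout is an assumption. The timezone is also not stated, and the trailing number is left uninterpreted. Here is a small sketch that puts the runs in chronological order:

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_reimage_log_name(path: str) -> tuple[datetime, str, str]:
    """Split a log name into (start time, user, trailing field).

    Assumes '<YYYYMMDDHHMM>_<user>_<trailing>.log' with no '_' in the user
    name; the trailing field is returned as an uninterpreted string.
    """
    stem = PurePosixPath(path).stem  # file name without directory or '.log'
    stamp, user, trailing = stem.split("_", 2)
    return datetime.strptime(stamp, "%Y%m%d%H%M"), user, trailing


logs = [
    "/var/log/wmf-auto-reimage/201905070531_marostegui_123474.log",
    "/var/log/wmf-auto-reimage/201905070511_marostegui_118294.log",
]
for when, user, _ in sorted(parse_reimage_log_name(p) for p in logs):
    print(when.isoformat(sep=" "), user)
```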

db1126 installed correctly:

root@db1126:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
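The post-install check above chains `free -g`, `megacli -LDInfo -Lall -aALL` and `df -hT /srv`. When comparing many hosts, it may help to turn the megacli part into pass/fail checks. Below is a minimal sketch that assumes the `Key : Value` layout shown in the transcript. The expectations (Optimal state, 6 drives, no bad blocks, WriteBack cache) are read off this task's output, not taken from any official policy. The function names are hypothetical. The WriteBack check is included because the policy line says "No Write Cache if Bad BBU", so a failed battery would presumably change the current policy.

```python
def parse_ldinfo(text: str) -> dict[str, str]:
    """Parse 'Key : Value' lines of `megacli -LDInfo` output into a dict.

    Splits on the first colon only, so values such as '0 (Target Id: 0)'
    stay intact. Keeps the first occurrence of each key, which suits a
    host with a single virtual drive; lines without a colon are skipped.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def check_virtual_drive(fields: dict[str, str], drives: int = 6) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if fields.get("State") != "Optimal":
        problems.append(f"state is {fields.get('State')!r}, expected 'Optimal'")
    if fields.get("Number Of Drives") != str(drives):
        problems.append(f"{fields.get('Number Of Drives')!r} drives, expected {drives}")
    if fields.get("Bad Blocks Exist") != "No":
        problems.append("bad blocks reported")
    if not fields.get("Current Cache Policy", "").startswith("WriteBack"):
        problems.append("current cache policy is not WriteBack")
    return problems


# Excerpt of the db1126 output above.
LDINFO_SAMPLE = """\
Virtual Drive: 0 (Target Id: 0)
State               : Optimal
Number Of Drives    : 6
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Bad Blocks Exist: No
"""
print(check_virtual_drive(parse_ldinfo(LDINFO_SAMPLE)))
```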

Marostegui updated the task description (May 7 2019, 5:29 AM).

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1128.eqiad.wmnet', 'db1129.eqiad.wmnet', 'db1130.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070531_marostegui_123474.log.

Completed auto-reimage of hosts:

['db1130.eqiad.wmnet']

Of which those FAILED:

['db1130.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1130.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070548_marostegui_129310.log.

db1128 and db1129 installed correctly:

root@db1128:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           1         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
root@db1129:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
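Each host reports a mirrored layout (`RAID Level : Primary-1`) across 6 drives, with a 4.364 TB virtual drive. Assume the six drives are the same size and fully used. Then each mirrored pair contributes one drive's worth of space, so the reported size implies roughly 1.455 TB per drive. The unit is whatever megacli means by "TB"; the output does not say.

```latex
C_{\text{VD}} = \frac{n}{2}\, c
\quad\Longrightarrow\quad
c = \frac{2\, C_{\text{VD}}}{n} = \frac{2 \times 4.364\ \text{TB}}{6} \approx 1.455\ \text{TB}
```

Here $n$ is `Number Of Drives` and $C_{\text{VD}}$ is the virtual drive `Size`. The per-drive figure $c$ is inferred, not reported anywhere in this task. The 4.4T that `df` shows for `/srv` is consistent with $C_{\text{VD}}$.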

Marostegui updated the task description (May 7 2019, 5:50 AM).

Change 508493 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Add DHCP lease for db1127

https://gerrit.wikimedia.org/r/508493

Change 508493 merged by Marostegui:
[operations/puppet@production] install_server: Add DHCP lease for db1127

https://gerrit.wikimedia.org/r/508493

Completed auto-reimage of hosts:

['db1130.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1127.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070605_marostegui_134280.log.

db1130 has been installed correctly:

root@db1130:~#  free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           1         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

Marostegui updated the task description (May 7 2019, 6:06 AM).

Completed auto-reimage of hosts:

['db1127.eqiad.wmnet']

and were ALL successful.

@RobH @Cmjohnson I saw that the iDRAC for db1127 was already working, so I grabbed the NIC's MAC address and added the DHCP entry for it. The host installed correctly, so there is no need for either of you to do it anymore.

root@db1127:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
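For db1127, the fix was to read the NIC's MAC address from the iDRAC and add a DHCP entry on the install server (change 508493). The task does not show the format of that file. The sketch below therefore assumes only a generic ISC dhcpd `host` declaration (`hardware ethernet` plus `fixed-address`). The helper name and the MAC address are hypothetical placeholders, not db1127's real values.

```python
import re

MAC_PATTERN = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}")


def dhcp_host_stanza(hostname: str, mac: str, domain: str = "eqiad.wmnet") -> str:
    """Render an ISC dhcpd host declaration tying a MAC address to a host.

    Hypothetical helper: validates a colon-separated MAC, lowercases it,
    and uses the host's FQDN as the fixed address (dhcpd resolves it).
    """
    mac = mac.lower()
    if MAC_PATTERN.fullmatch(mac) is None:
        raise ValueError(f"not a colon-separated MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {hostname}.{domain};\n"
        f"}}\n"
    )


# Placeholder MAC address, not db1127's real one.
print(dhcp_host_stanza("db1127", "02:00:00:00:00:01"), end="")
```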

Marostegui updated the task description (May 7 2019, 6:23 AM).

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1131.eqiad.wmnet', 'db1132.eqiad.wmnet', 'db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070623_marostegui_139335.log.

db1131 installed correctly:

root@db1131:~#  free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
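Besides the RAID status, the post-install check runs `free -g` and `df -hT /srv`, and their output can also be checked mechanically. The hedged Python sketch below looks for the `Mem:` row and the `/srv` row. The 500 GiB threshold is an assumption, chosen just below the 502 reported by every host here. Likewise, `/dev/mapper/tank-data` on xfs is simply what these hosts show, not a documented requirement. Mount points containing spaces are not handled.

```python
def check_memory_and_srv(output: str, min_mem_gib: int = 500) -> list[str]:
    """Check `free -g` and `df -hT /srv` output; return a list of problems.

    Assumes the column layouts shown in this task: the 'Mem:' row has the
    total in GiB as its second column, and the df data row has 7 columns
    ending in the mount point.
    """
    problems = []
    mem_total = None
    srv_row = None
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            mem_total = int(fields[1])
        elif len(fields) == 7 and fields[6] == "/srv":
            srv_row = fields
    if mem_total is None:
        problems.append("no 'Mem:' row found")
    elif mem_total < min_mem_gib:
        problems.append(f"memory total {mem_total} GiB is below {min_mem_gib} GiB")
    if srv_row is None:
        problems.append("no /srv row found")
    elif srv_row[:2] != ["/dev/mapper/tank-data", "xfs"]:
        problems.append(f"/srv is {srv_row[1]} on {srv_row[0]}")
    return problems


# Excerpt of the db1131 output above.
FREE_DF_SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
"""
print(check_memory_and_srv(FREE_DF_SAMPLE))
```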

Marostegui updated the task description (May 7 2019, 6:47 AM).

Completed auto-reimage of hosts:

['db1132.eqiad.wmnet', 'db1133.eqiad.wmnet']

Of which those FAILED:

['db1132.eqiad.wmnet', 'db1133.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1131.eqiad.wmnet', 'db1132.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070650_marostegui_146684.log.

Completed auto-reimage of hosts:

['db1131.eqiad.wmnet', 'db1132.eqiad.wmnet']

and were ALL successful.

I am troubleshooting db1133's RAID, which is OFFLINE due to several disks being OFFLINE
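One place to look for the physical disks that took db1133's virtual drive offline is the per-disk listing from `megacli -PDList -aALL`. The sketch assumes that output has a `Slot Number:` line and a `Firmware state:` line for each disk; the task itself does not show this output. The sample is invented for illustration and is not db1133's actual state.

```python
def offline_disks(pdlist_text: str) -> list[tuple[int, str]]:
    """Return (slot, firmware state) for each disk whose state is not Online.

    Assumes each disk block has a 'Slot Number:' line before its
    'Firmware state:' line, as in typical `megacli -PDList -aALL` output.
    """
    result = []
    slot = None
    for line in pdlist_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = int(value)
        elif key == "Firmware state" and slot is not None:
            if not value.startswith("Online"):
                result.append((slot, value))
            slot = None
    return result


# Invented sample for illustration; db1133's real output is not in the task.
PDLIST_SAMPLE = """\
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Offline
Slot Number: 2
Firmware state: Offline
"""
print(offline_disks(PDLIST_SAMPLE))
```

Any state other than Online is reported, so unconfigured disks or hot spares would show up too. That is deliberate for a quick triage list.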

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1134.eqiad.wmnet', 'db1135.eqiad.wmnet', 'db1136.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070718_marostegui_155235.log.

Completed auto-reimage of hosts:

['db1136.eqiad.wmnet', 'db1134.eqiad.wmnet', 'db1135.eqiad.wmnet']

and were ALL successful.

db1132 installed correctly:

root@db1132:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           1         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 56%, Taken 31 min.
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
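Unlike the earlier hosts, db1132's output shows `Ongoing Progresses: Background Initialization: Completed 56%, Taken 31 min.`. The controller was still initializing the virtual drive in the background after the OS had been installed. Assume progress is roughly linear in time, which is a simplification. Then the remaining time is taken * (100 - p) / p, or about 24 minutes for db1132. The `bgi_minutes_left` helper below is hypothetical:

```python
import re
from typing import Optional

BGI_PATTERN = re.compile(r"Background Initialization: Completed (\d+)%, Taken (\d+) min")


def bgi_minutes_left(line: str) -> Optional[float]:
    """Estimate minutes left in a megacli background initialization.

    Linear extrapolation: if p percent took t minutes, the rest takes
    t * (100 - p) / p. Returns None when the line does not match or
    when no progress (0%) has been reported yet.
    """
    match = BGI_PATTERN.search(line)
    if match is None:
        return None
    percent, taken = int(match.group(1)), int(match.group(2))
    if percent == 0:
        return None
    return taken * (100 - percent) / percent


line = "  Background Initialization: Completed 56%, Taken 31 min."
print(f"db1132: about {bgi_minutes_left(line):.0f} more minutes")
```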

db1134 installed correctly:

root@db1134:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 8%, Taken 3 min.
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

db1135 installed correctly:

root@db1135:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

db1136 installed correctly:

root@db1136:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 11%, Taken 5 min.
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

Marostegui updated the task description (May 7 2019, 7:39 AM).

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070739_marostegui_163174.log.


The RAID is now back ONLINE and Optimal. Going to run the installation again.

db1133 had another issue.
When rebooting it for the install, while connected to the iDRAC, this is what I got:

Unified Server Configurator does not support console redirection.

I have updated the documentation for R440 hosts: https://wikitech.wikimedia.org/w/index.php?title=Platform-specific_documentation%2FDell_PowerEdge_RN10&type=revision&diff=1825341&oldid=1769901

And now, on reboot, db1133 shows:

FW could not sync up config/prop changes for some of the VD's/PD's
Press any key to continue, or 'C' to load the configuration utility.

Pressing C doesn't do anything, so I think we need on-site help. @Cmjohnson, can you destroy and recreate the RAID from scratch?
Thank you

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1137.eqiad.wmnet', 'db1138.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905070836_marostegui_174999.log.

Completed auto-reimage of hosts:

['db1138.eqiad.wmnet', 'db1137.eqiad.wmnet']

and were ALL successful.

db1137 installed correctly:

root@db1137:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv

db1138 installed correctly:

root@db1138:~# free -g ; megacli -LDInfo -Lall -aALL ; df -hT /srv
              total        used        free      shared  buff/cache   available
Mem:            502           0         501           0           0         499
Swap:             7           0           7


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  5.2G  4.4T   1% /srv
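The manual check above (looking for `State : Optimal` in the `megacli -LDInfo -Lall -aALL` output) could be scripted. Below is a minimal sketch, not part of any existing WMF tooling: the function name `vd_is_optimal` is hypothetical, and it assumes the MegaCLI output format shown above, with one `State : <value>` line per virtual drive.

```shell
# Hypothetical helper (not existing WMF tooling). Reads the output of
# `megacli -LDInfo -Lall -aALL` on stdin and returns 0 only if at least one
# "State" line was seen and every such line reports "Optimal".
vd_is_optimal() {
    awk -F':' '
        {
            # Field name before the first colon, surrounding blanks trimmed.
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == "State") {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found++
                if (val != "Optimal") bad++
            }
        }
        # Fail closed: seeing no State line at all also counts as a failure.
        END { if (found > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

Usage, assuming `megacli` is installed as on these hosts: `megacli -LDInfo -Lall -aALL | vd_is_optimal && echo "all VDs Optimal"`. Failing closed means an empty or truncated controller response is reported as a problem rather than as success.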
Marostegui updated the task description. May 7 2019, 8:59 AM

The only host still pending installation is db1133, which is having issues for which we need on-site help from @Cmjohnson (T211613#5163570). I have already pinged him on IRC.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905071454_marostegui_16155.log.

@Cmjohnson re-created the RAID on site, but it is still showing up as degraded, so this host might need further troubleshooting. It is not a big priority right now, as the rest of the hosts installed correctly and we have plenty of other things to do; we can get back to this once Chris has time for it.

Completed auto-reimage of hosts:

['db1133.eqiad.wmnet']

Of which those FAILED:

['db1133.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905090917_marostegui_103846.log.

Completed auto-reimage of hosts:

['db1133.eqiad.wmnet']

Of which those FAILED:

['db1133.eqiad.wmnet']
RobH updated the task description. May 9 2019, 3:59 PM
RobH updated the task description.
Marostegui updated the task description. May 9 2019, 4:06 PM
Marostegui updated the task description.
Marostegui updated the task description. Wed, May 22, 5:24 AM
Marostegui changed the task status from Open to Stalled. Tue, Jun 4, 5:10 PM