
audit / test / upgrade hp smartarray P840 firmware
Closed, Resolved · Public

Description

It looks like the errors / timeouts on newer ms-be machines might be due to a missing firmware upgrade for the P840 hardware RAID controller. We can test an upgrade on one of the machines in codfw to begin with, then expand to the other HP machines that need the upgrade, and possibly to other controller models too.

HP raid firmware audit:

root@cumin1001:~# cumin 'F:manufacturer = HP' '[ -x /usr/sbin/hpssacli ] && cat /sys/class/scsi_disk/*\:1\:0\:0/device/rev /sys/class/scsi_device/*\:0\:0\:0/device/rev 2>/dev/null'

===== NODE GROUP =====
(1) db2034.codfw.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
7.02              
7.02              
===== NODE GROUP =====
(1) db1089.eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
5.04              
5.04              
===== NODE GROUP =====
(1) cloudcontrol1004.wikimedia.org
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
HPG2              
HPG2              
HPG2              
HPG2              
===== NODE GROUP =====
(22) cloudvirt1020.eqiad.wmnet,db1082.eqiad.wmnet,ms-be[2028-2029,2031-2033,2035-2036,2038-2039].codfw.wmnet,ms-be[1029-1039].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
4.52              
4.52              
===== NODE GROUP =====
(2) labstore[1006-1007].wikimedia.org
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
4.52              
4.52              
4.52              
4.52              
===== NODE GROUP =====
(1) cloudvirt1019.eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.60              
5.04              
6.60              
===== NODE GROUP =====
(7) ms-be[2017-2018,2020-2021,2030].codfw.wmnet,ms-be[1017,1028].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.60              
6.60              
===== NODE GROUP =====
(5) ms-be[2016,2019].codfw.wmnet,ms-be[1019-1021].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
3.00              
3.00              
===== NODE GROUP =====
(3) lvs[1010-1012].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.64              
6.64              
===== NODE GROUP =====
(10) db[2036,2038-2041].codfw.wmnet,lvs[2001,2003-2006].codfw.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
5.42              
5.42              
===== NODE GROUP =====
(7) restbase[2007-2009].codfw.wmnet,restbase[1010-1011,1013,1015].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.06              
===== NODE GROUP =====
(2) ms-be[2023,2037].codfw.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.06              
6.06              
===== NODE GROUP =====
(9) cloudvirt[1013-1014].eqiad.wmnet,db1092.eqiad.wmnet,labvirt1012.eqiad.wmnet,ms-be[2025,2027].codfw.wmnet,ms-be[1022-1023,1027].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
4.02              
4.02              
===== NODE GROUP =====
(1) db2060.codfw.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.68              
6.68              
===== NODE GROUP =====
(1) ms-be2034.codfw.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.30              
6.30              
===== NODE GROUP =====
(7) db[2035,2037,2044,2048-2049,2068].codfw.wmnet,lvs2002.codfw.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
8.00              
8.00              
===== NODE GROUP =====
(33) db[1074-1081,1083-1088,1090-1091,1093-1095].eqiad.wmnet,labsdb[1009-1011].eqiad.wmnet,ms-be[2022,2024,2026].codfw.wmnet,ms-be[1024-1026].eqiad.wmnet,restbase[1012,1014].eqiad.wmnet,snapshot[1005-1007].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
3.56              
3.56              
===== NODE GROUP =====
(32) db[2033,2043,2045-2047,2050-2056,2058-2059,2061-2063,2065-2067,2069-2070].codfw.wmnet,dbstore2001.codfw.wmnet,labvirt[1001-1009].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
6.00              
6.00              
===== NODE GROUP =====
(3) db[2042,2057].codfw.wmnet,dbstore2002.codfw.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
8.32              
8.32              
===== NODE GROUP =====
(2) ms-be[1016,1018].eqiad.wmnet
----- OUTPUT of '[ -x /usr/sbin/h.../rev 2>/dev/null' -----
1.34              
1.34

Links to firmware downloads

Docs

https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/HP_Documentation#RAID_controller_firmware_upgrade

Event Timeline

fgiunchedi renamed this task from audit / test / upgrade hp smartarray firmware to audit / test / upgrade hp smartarray P840 firmware. Aug 1 2016, 9:25 AM
fgiunchedi updated the task description.

The HP machines with RAID controllers:

root@neodymium:~# salt --out raw -C 'G@manufacturer:hp' cmd.run '[ -x /usr/sbin/hpssacli ] && hpssacli controller all show | sed "s/(sn:.*//"' | sort
{'aqs1004.eqiad.wmnet': ''}
{'aqs1005.eqiad.wmnet': ''}
{'aqs1006.eqiad.wmnet': ''}
{'db1074.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1075.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1076.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1077.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1078.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1079.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1080.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1081.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1082.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1083.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1084.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1085.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1086.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1087.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1088.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1089.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1090.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1091.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1092.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1093.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db1094.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'db2033.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2034.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2035.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2036.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2037.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2038.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2039.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2040.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2041.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2042.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2043.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2044.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2045.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2046.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2047.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2048.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2049.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2050.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2051.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2052.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2053.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2054.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2055.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2056.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2057.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2058.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2059.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2060.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2061.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2062.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2063.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2064.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2065.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2066.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2067.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2068.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2069.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'db2070.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'dbstore2001.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'dbstore2002.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'druid1001.eqiad.wmnet': ''}
{'druid1002.eqiad.wmnet': ''}
{'druid1003.eqiad.wmnet': ''}
{'elastic1032.eqiad.wmnet': ''}
{'elastic1033.eqiad.wmnet': ''}
{'elastic1034.eqiad.wmnet': ''}
{'elastic1035.eqiad.wmnet': ''}
{'elastic1036.eqiad.wmnet': ''}
{'elastic1037.eqiad.wmnet': ''}
{'elastic1038.eqiad.wmnet': ''}
{'elastic1039.eqiad.wmnet': ''}
{'elastic1040.eqiad.wmnet': ''}
{'elastic1041.eqiad.wmnet': ''}
{'elastic1042.eqiad.wmnet': ''}
{'elastic1043.eqiad.wmnet': ''}
{'elastic1044.eqiad.wmnet': ''}
{'elastic1045.eqiad.wmnet': ''}
{'elastic1046.eqiad.wmnet': ''}
{'elastic1047.eqiad.wmnet': ''}
{'elastic2001.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2002.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2003.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2004.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2005.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2006.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2007.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2008.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2009.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2010.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2011.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2012.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2013.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2014.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2015.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2016.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2017.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2018.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2019.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2020.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2021.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2022.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2023.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'elastic2024.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'graphite1003.eqiad.wmnet': ''}
{'labmon1001.eqiad.wmnet': ''}
{'labsdb1008.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'labsdb1009.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'labsdb1010.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'labsdb1011.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'labvirt1001.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1002.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1003.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1004.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1005.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1006.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1007.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1008.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1009.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'labvirt1010.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'labvirt1011.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'labvirt1012.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'labvirt1013.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'labvirt1014.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'lvs1007.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs1008.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs1009.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs1010.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs1011.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs1012.eqiad.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs2001.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs2002.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs2003.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs2004.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs2005.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'lvs2006.codfw.wmnet': '\nSmart Array P420i in Slot 0 (Embedded)'}
{'maps1001.eqiad.wmnet': ''}
{'maps1002.eqiad.wmnet': ''}
{'maps1003.eqiad.wmnet': ''}
{'maps1004.eqiad.wmnet': ''}
{'maps2001.codfw.wmnet': ''}
{'maps2002.codfw.wmnet': ''}
{'maps2003.codfw.wmnet': ''}
{'maps2004.codfw.wmnet': ''}
{'ms-be1016.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'ms-be1017.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'ms-be1018.eqiad.wmnet': '\nSmart Array P840 in Slot 1'}
{'ms-be1019.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be1020.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be1021.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be1022.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be1023.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be1024.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be1025.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be1026.eqiad.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2016.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2017.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2018.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2019.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2020.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2021.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2022.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2023.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2024.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2025.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2026.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'ms-be2027.codfw.wmnet': '\nSmart Array P840 in Slot 3'}
{'rdb2005.codfw.wmnet': ''}
{'rdb2006.codfw.wmnet': ''}
{'relforge1001.eqiad.wmnet': ''}
{'relforge1002.eqiad.wmnet': ''}
{'restbase1010.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase1011.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase1012.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase1013.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase1014.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase1015.eqiad.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2001.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2002.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2003.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2004.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2005.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2006.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2007.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2008.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'restbase2009.codfw.wmnet': '\nSmart Array P440ar in Slot 0 (Embedded)'}
{'snapshot1005.eqiad.wmnet': '\nSmart HBA H240ar in Slot 0 (Embedded) (RAID Mode)'}
{'snapshot1006.eqiad.wmnet': '\nSmart HBA H240ar in Slot 0 (Embedded) (RAID Mode)'}
{'snapshot1007.eqiad.wmnet': '\nSmart HBA H240ar in Slot 0 (Embedded) (RAID Mode)'}
{'wasat.codfw.wmnet': ''}

Mentioned in SAL [2016-08-01T10:11:01Z] <godog> reboot ms-be2027 after raid controller fw upgrade T141756

Looking at the fleet of controller models, not all are covered by the same firmware edition/version. Namely, the P420i isn't covered by the firmware applied to e.g. the P840 in ms-be2027 above.

So there are different controller/firmware combinations:

firmware for the H240ar, H240nr, H240, H241, H244br, P240nr, P244br, P246br, P440ar, P440, P441, P542D, P741m, P840, P840ar, and P841 is at http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=7274906&swItemId=MTX_4d3778d5c6644691afc2ba751e&swEnvOid=4176#tab-history and the latest is 4.02 (B) (28 Apr 2016)

and for the P220i, P222, P420i, P420, P421, P721m, and P822 the firmware is at http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=5295169&swItemId=MTX_71dbd7bea4dc4825ab381e81e1&swEnvOid=4103#tab-history and the latest is 7.02 (B) (7 Apr 2016)
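
To figure out which of the two bundles a given host needs, the controller model and current firmware can be checked locally; a minimal sketch using the same hpssacli subcommands as elsewhere in this task (host and slot number here are just examples, e.g. slot 0 for the embedded P420i, slot 1 or 3 for the P840):

root@ms-be2027:~# hpssacli controller all show
root@ms-be2027:~# hpssacli controller slot=3 show | grep -i firmware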

Mentioned in SAL [2016-08-01T13:48:36Z] <godog> reboot ms-be1023 after raid controller fw upgrade T141756

The fw upgrade on ms-be1023 seems to be OK; it still takes ~40s for check_hpssacli though. Waiting for a bit to see if the kernel messages mentioned in T136631 come back.
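
In the meantime, one way to watch for those kernel messages coming back is to follow the kernel log filtered on the Smart Array driver (assuming, as in T136631, they come from hpsa):

root@ms-be1023:~# journalctl -kf | grep -i hpsa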

Mentioned in SAL [2016-08-03T14:36:48Z] <godog> reboot ms-be1022 following firmware upgrade T141756

@fgiunchedi I would like to get db1082 upgraded, as Moritz mentioned in T145533.

So far I have repooled it as it has been out for a long time, but we can depool it and try to upgrade it.
Can you help me with this process?

Thank you

Sure @Marostegui!

Once you have the firmware from the links above for the right controller (check with hpssacli controller all show), you can extract the RPM and launch the hpsetup binary, which will apply the upgrade; then reboot.
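
For reference, a minimal sketch of those steps; the RPM filename is taken from the db1089 comment further down, and the extraction path inside the RPM is an assumption (it may differ per package):

root@db1082:~# rpm2cpio hp-firmware-smartarray-ea3138d8e8-5.04-1.1.x86_64.rpm | cpio -idmv
root@db1082:~# cd usr/lib/*/hp-firmware-smartarray-ea3138d8e8-5.04-1.1   # wherever cpio unpacked the hpsetup binary
root@db1082:~# ./hpsetup
root@db1082:~# reboot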

I have upgraded db1082:

root@db1082:~# hpssacli controller slot=1 show | grep -i firmware
   Firmware Version: 4.02
$ cumin 'db20[33-70].*' 'hpssacli controller slot=0 show | grep -i firmware'
38 hosts will be targeted:
db[2033-2070].codfw.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(7) db[2036-2042].codfw.wmnet
----- OUTPUT of 'hpssacli control...grep -i firmware' -----
   Firmware Version: 5.42
===== NODE GROUP =====
(1) db2034.codfw.wmnet
----- OUTPUT of 'hpssacli control...grep -i firmware' -----
   Firmware Version: 7.02
===== NODE GROUP =====
(25) db[2033,2043,2045-2047,2050-2059,2061-2070].codfw.wmnet
----- OUTPUT of 'hpssacli control...grep -i firmware' -----
   Firmware Version: 6.00
===== NODE GROUP =====
(1) db2060.codfw.wmnet
----- OUTPUT of 'hpssacli control...grep -i firmware' -----
   Firmware Version: 6.68
===== NODE GROUP =====
(4) db[2035,2044,2048-2049].codfw.wmnet
----- OUTPUT of 'hpssacli control...grep -i firmware' -----
   Firmware Version: 8.00

For the record, db1089 has been upgraded to the latest firmware, which is now 5.04 for the P840 model.
The extracted RPM is at: db1089:/home/marostegui/hp-firmware-smartarray-ea3138d8e8-5.04-1.1

Just an FYI, the snapshot hosts 1005, 1006 and 1007 have: Firmware Version: 3.56

They've been fine so far.

Mentioned in SAL (#wikimedia-operations) [2017-11-16T16:48:16Z] <godog> upgrade hpsa firmware to 6.06 on restbase2006 - T141756

Mentioned in SAL (#wikimedia-operations) [2017-12-05T21:58:21Z] <mutante> restbase1010 - upgraded HP firmware (Flashing Smart Array P440ar in Slot 0 [ 3.56 -> 6.06 ]) T141756 T178177

Mentioned in SAL (#wikimedia-operations) [2017-12-05T22:13:15Z] <mutante> restbase1010 failed at reboot with P6431 , after a cold start (power off, power on) it came back though :) (T178177 T141756)

Mentioned in SAL (#wikimedia-operations) [2018-01-08T09:53:34Z] <godog> Flashing Smart Array P840 in Slot 3 [ 4.52 -> 6.06 ] on ms-be2037 - T184390 T141756

Mentioned in SAL (#wikimedia-operations) [2018-04-23T09:13:35Z] <godog> Flashing Smart Array P840 in Slot 3 [ 4.52 -> 6.30 ] on ms-be2034 - T192721 T141756

Latest audit via cumin

root@neodymium:~# cumin 'F:manufacturer = HP' 'if [ -x /usr/sbin/hpssacli ] ; then cat /sys/class/scsi_disk/*\:1\:0\:0/device/rev; fi '
298 hosts will be targeted:
aqs[1004-1006].eqiad.wmnet,conf[1004-1006].eqiad.wmnet,db[2033-2070].codfw.wmnet,db[1074-1095].eqiad.wmnet,dbstore[2001-2002].codfw.wmnet,druid[1001-1003].eqiad.wmnet,elastic[2001-2036].codfw.wmnet,elastic[1032-1052].eqiad.wmnet,graphite1003.eqiad.wmnet,labcontrol[1003-1004].wikimedia.org,labmon[1001-1002].eqiad.wmnet,labnet[1003-1004].eqiad.wmnet,labnodepool1002.eqiad.wmnet,labpuppetmaster[1001-1002].wikimedia.org,labsdb[1009-1011].eqiad.wmnet,labstore[1006-1007].wikimedia.org,labtestcontrol2003.wikimedia.org,labtestmetal2001.codfw.wmnet,labtestnet2002.codfw.wmnet,labtestneutron2002.codfw.wmnet,labtestpuppetmaster2001.wikimedia.org,labtestservices[2002-2003].wikimedia.org,labtestvirt2003.codfw.wmnet,labvirt[1001-1014].eqiad.wmnet,lvs[2001-2006].codfw.wmnet,lvs[1010-1012].eqiad.wmnet,maps[2001-2004].codfw.wmnet,maps[1001-1004].eqiad.wmnet,mc[2019-2036].codfw.wmnet,mc[1019-1036].eqiad.wmnet,ms-be[2016-2039].codfw.wmnet,ms-be[1016-1039].eqiad.wmnet,netmon2001.wikimedia.org,oresrdb2002.codfw.wmnet,rdb[2005-2006].codfw.wmnet,relforge[1001-1002].eqiad.wmnet,restbase[2001-2009].codfw.wmnet,restbase[1010-1015].eqiad.wmnet,restbase-dev[1004-1006].eqiad.wmnet,snapshot[1005-1007].eqiad.wmnet,stat1006.eqiad.wmnet,wasat.codfw.wmnet,wdqs2003.codfw.wmnet,wdqs1003.eqiad.wmnet,wezen.codfw.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(25) db1082.eqiad.wmnet,elastic[2004,2020].codfw.wmnet,ms-be[2028-2033,2035-2036,2038-2039].codfw.wmnet,ms-be[1028-1039].eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
4.52              
===== NODE GROUP =====
(2) labstore[1006-1007].wikimedia.org
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
4.52              
4.52              
===== NODE GROUP =====
(3) ms-be[1016-1018].eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
1.34              
===== NODE GROUP =====
(7) restbase[2007-2009].codfw.wmnet,restbase[1011,1013,1015].eqiad.wmnet,snapshot1007.eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
cat: /sys/class/scsi_disk/*:1:0:0/device/rev: No such file or directory
===== NODE GROUP =====
(1) db1089.eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
5.04              
===== NODE GROUP =====
(7) db[2035,2037,2044,2048-2049,2068].codfw.wmnet,lvs2002.codfw.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
8.00              
===== NODE GROUP =====
(1) ms-be2034.codfw.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
6.30              
===== NODE GROUP =====
(5) ms-be[2023,2037].codfw.wmnet,restbase[2004,2006].codfw.wmnet,restbase1010.eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
6.06              
===== NODE GROUP =====
(11) labvirt[1010-1011].eqiad.wmnet,ms-be[2016-2021].codfw.wmnet,ms-be[1019-1021].eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
3.00              
===== NODE GROUP =====
(1) db2060.codfw.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
6.68              
===== NODE GROUP =====
(35) db[2033,2043,2045-2047,2050-2059,2061-2067,2069-2070].codfw.wmnet,dbstore[2001-2002].codfw.wmnet,labvirt[1001-1009].eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
6.00              
===== NODE GROUP =====
(3) lvs[1010-1012].eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
6.64              
===== NODE GROUP =====
(1) db2034.codfw.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
7.02              
===== NODE GROUP =====
(9) db1092.eqiad.wmnet,labvirt[1012-1014].eqiad.wmnet,ms-be[2025,2027].codfw.wmnet,ms-be[1022-1023,1027].eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
4.02              
===== NODE GROUP =====
(11) db[2036,2038-2042].codfw.wmnet,lvs[2001,2003-2006].codfw.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
5.42              
===== NODE GROUP =====
(32) db[1074-1081,1083-1088,1090-1091,1093-1095].eqiad.wmnet,labsdb[1009-1011].eqiad.wmnet,ms-be[2022,2024,2026].codfw.wmnet,ms-be[1024-1026].eqiad.wmnet,restbase[1012,1014].eqiad.wmnet,snapshot[1005-1006].eqiad.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
3.56              
===== NODE GROUP =====
(26) elastic[2001-2003,2005-2019,2021-2024].codfw.wmnet,restbase[2001-2003,2005].codfw.wmnet
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----
2.52              
================

Mentioned in SAL (#wikimedia-operations) [2018-07-09T10:31:18Z] <godog> upgrade hp raid firmware on ms-be1017 - T141756

Mentioned in SAL (#wikimedia-operations) [2018-07-30T15:08:06Z] <godog> upgrade hp raid firmware on ms-be1028 - T141756

Mentioned in SAL (#wikimedia-operations) [2018-08-21T16:27:25Z] <godog> upgrade hp raid firmware on ms-be2020 - T141756

Mentioned in SAL (#wikimedia-operations) [2018-10-23T08:08:30Z] <godog> update hp firmware to 6.60 on ms-be2017 - T141756

Mentioned in SAL (#wikimedia-operations) [2019-07-05T11:32:17Z] <jijiki> Upgrading smartarray firmware on ms-be1021 - T141756 - T227076

Today ms-be2031 locked up as well; I'll upgrade the firmware once it is back:

Slot 3 Port 1 : Smart Array P840 Controller - (4096 MB, V4.52) 14 Logical
Drive(s) - Operation Failed
 - 1719-Slot 3 Drive Array - A controller failure event occurred prior
   to this power-up.  (Previous lock up code = 0x13) Action: Install the
   latest controller firmware. If the problem persists, replace the
   controller.

Mentioned in SAL (#wikimedia-operations) [2019-07-11T12:07:56Z] <godog> ms-be2031 raid controller firmware upgrade 4.52 -> 6.88 - T141756

Mentioned in SAL (#wikimedia-operations) [2019-07-17T12:36:27Z] <godog> upgrade hp raid firmware on ms-be1 hosts - T141756

[removing swift-storage tag as none of the relevant swift nodes are still in production]

There are 31 HP servers and 1 storage array remaining (https://netbox.wikimedia.org/dcim/manufacturers/6/); excluding the Swift hosts, the majority remaining are DBs. Looking at the purchase dates, most should be decommissioned this year. @fgiunchedi, @Marostegui are you OK with resolving this task?

Hm, actually, that list from netbox includes servers not in the description of this task (ah, and they have manufacturer = HPE, not HP), and the necessary binary is now /usr/sbin/ssacli. So the check now looks like:

mvernon@cumin2002:~$ sudo cumin "A:swift and P{F:manufacturer = HPE}" 'if [ -x /usr/sbin/ssacli ] ; then cat /sys/class/scsi_disk/*\:1\:0\:0/device/rev; fi '
15 hosts will be targeted:
ms-be[2051-2056].codfw.wmnet,ms-be[1051-1059].eqiad.wmnet
OK to proceed on 15 hosts? Enter the number of affected hosts to confirm or "q" to quit: 15
===== NODE GROUP =====                                                          
(15) ms-be[2051-2056].codfw.wmnet,ms-be[1051-1059].eqiad.wmnet                  
----- OUTPUT of 'if [ -x /usr/sbi.../device/rev; fi ' -----                     
1.98                                                                            
================                                                                
PASS |████████████████████████████████| 100% (15/15) [00:01<00:00, 14.15hosts/s]
FAIL |                                         |   0% (0/15) [00:01<?, ?hosts/s]
100.0% (15/15) success ratio (>= 100.0% threshold) for command: 'if [ -x /usr/sbi.../device/rev; fi '.
100.0% (15/15) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

It's worth noting both that the RAID controllers are P816i-a SR Gen10, not the P840 that I think this ticket was about, and that these systems are all from 2019 (so due for replacement this calendar year).
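
For what it's worth, ssacli takes the same subcommands as hpssacli did, so the firmware version can also be read straight from the controller (a sketch; the slot number varies per host):

mvernon@ms-be2051:~$ sudo ssacli controller slot=0 show | grep -i firmware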

There are 31 HP servers and 1 storage array remaining (https://netbox.wikimedia.org/dcim/manufacturers/6/); excluding the Swift hosts, the majority remaining are DBs. Looking at the purchase dates, most should be decommissioned this year. @fgiunchedi, @Marostegui are you OK with resolving this task?

+1

LSobanski claimed this task.
LSobanski removed LSobanski as the assignee of this task.