While investigating a reimage issue I noticed something strange in the output of the IPMI command chassis bootparam get 5, so I audited the whole fleet and found a worrying situation.
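A minimal sketch of how a fleet-wide audit like this could be scripted. It assumes ipmitool with the lanplus interface, and the BMC password exported in the IPMI_PASSWORD environment variable, which ipmitool's -E flag reads. The "root" user and the helper names are hypothetical; this is not the script actually used for this audit. A None result marks a host whose remote IPMI did not answer.

```python
import re
import subprocess
from typing import Dict, Iterable, Optional

# ipmitool prints the raw boot flags on a line such as
# "Boot parameter data: 0000020000"
_DATA_RE = re.compile(r"Boot parameter data:[ \t]*([0-9A-Fa-f]+)")


def parse_boot_param_data(output: str) -> Optional[str]:
    """Extract the hex boot flags from 'chassis bootparam get 5' output."""
    match = _DATA_RE.search(output)
    return match.group(1).lower() if match else None


def get_boot_flags(mgmt_host: str, user: str = "root") -> Optional[str]:
    """Query one BMC remotely; None means remote IPMI did not answer.

    Assumes the password is exported as IPMI_PASSWORD, which -E reads.
    The "root" user is a hypothetical default.
    """
    cmd = ["ipmitool", "-I", "lanplus", "-H", mgmt_host, "-U", user, "-E",
           "chassis", "bootparam", "get", "5"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=30, check=True)
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    return parse_boot_param_data(result.stdout)


def audit(mgmt_hosts: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each management hostname to its raw boot flags (None = broken)."""
    return {host: get_boot_flags(host) for host in mgmt_hosts}
```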
Hosts with broken remote IPMI
- bast4002.mgmt.ulsfo.wmnet (FIXED, remote IPMI disabled)
- cp2010.mgmt.codfw.wmnet (FIXED, reset password)
- cp2021.mgmt.codfw.wmnet (FIXED, reset password)
- cp2022.mgmt.codfw.wmnet (FIXED, reset password)
- cp4024.mgmt.ulsfo.wmnet (FIXED, reset password)
- dns4001.mgmt.ulsfo.wmnet (FIXED, remote IPMI disabled)
- dns4002.mgmt.ulsfo.wmnet (FIXED, remote IPMI disabled)
- es1019.mgmt.eqiad.wmnet (TO BE FIXED, had many failures in the past: T187530 T155691 T167121)
- lawrencium.mgmt.eqiad.wmnet (IGNORING, to be decomm'ed: T191360)
- mw2251.mgmt.codfw.wmnet (FIXED, reset password)
- scb1004.mgmt.eqiad.wmnet (FIXED, racadm racreset)
Hosts with Sleep Button and Console overridden
These hosts have:
Boot parameter data: 0000020000
 - Lock Out Sleep Button
 - BIOS verbosity : Request console redirection be enabled
As opposed to the default:
Boot parameter data: 0000000000
 - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
List of affected hosts:
graphite2002.mgmt.codfw.wmnet rdb1006.mgmt.eqiad.wmnet scb2003.mgmt.codfw.wmnet scb2004.mgmt.codfw.wmnet
Are you OK with me resetting these overrides to the default?
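A small decoder can help when reading raw Boot parameter data values such as 0000020000 or 8000020000. This sketch takes the bit positions for data bytes 1 and 2 from boot option parameter 5 in the IPMI v2.0 specification: Boot Flag Valid is bit 7 of byte 1, and the Boot Device Selector is bits 5:2 of byte 2. Byte 3 carries the console redirection and button lockout settings. The sketch returns it raw instead of decoding it. The function and field names are hypothetical.

```python
from dataclasses import dataclass

# Boot device selector values (data byte 2, bits 5:2), a subset of the
# values listed for boot option parameter 5 in the IPMI v2.0 spec.
BOOT_DEVICES = {
    0x0: "No override",
    0x1: "Force PXE",
    0x2: "Force boot from default Hard-drive",
    0x5: "Force boot from default CD/DVD",
    0x6: "Force boot into BIOS Setup",
}


@dataclass
class BootFlags:
    valid: bool       # data byte 1, bit 7: Boot Flag Valid
    persistent: bool  # data byte 1, bit 6: options apply to all future boots
    device: str       # data byte 2, bits 5:2: Boot Device Selector
    byte3: int        # data byte 3, kept raw (console/button settings)


def decode_boot_flags(hex_data: str) -> BootFlags:
    """Decode the 5-byte hex string printed as 'Boot parameter data'."""
    data = bytes.fromhex(hex_data)
    if len(data) != 5:
        raise ValueError(f"expected 5 data bytes, got {len(data)}")
    selector = (data[1] >> 2) & 0x0F
    return BootFlags(
        valid=bool(data[0] & 0x80),
        persistent=bool(data[0] & 0x40),
        device=BOOT_DEVICES.get(selector, f"selector {selector:#x}"),
        byte3=data[2],
    )


def is_default(hex_data: str) -> bool:
    """True when all five data bytes are zero, i.e. nothing is overridden."""
    return bytes.fromhex(hex_data) == bytes(5)
```

For example, decode_boot_flags("8000020000") reports valid=True, device "No override" and byte3=0x02, which matches the hosts listed in the next section.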
Hosts with Boot Flag, Sleep Button and Console overridden
These hosts have:
Boot parameter data: 8000020000
 - Boot Flag Valid
 - Lock Out Sleep Button
 - BIOS verbosity : Request console redirection be enabled
As opposed to the default:
Boot parameter data: 0000000000
 - Boot Flag Invalid
 - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
List of affected hosts:
bast4001.mgmt.ulsfo.wmnet conf2001.mgmt.codfw.wmnet db1061.mgmt.eqiad.wmnet db1062.mgmt.eqiad.wmnet db1069.mgmt.eqiad.wmnet labstore1004.mgmt.eqiad.wmnet labstore1005.mgmt.eqiad.wmnet labtestvirt2001.mgmt.codfw.wmnet neodymium.mgmt.eqiad.wmnet phab1001.mgmt.eqiad.wmnet puppetmaster1001.mgmt.eqiad.wmnet sodium.mgmt.eqiad.wmnet
Are you OK with me resetting these overrides to the default?
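Lists like the ones in this task can be produced by grouping hosts by their raw boot parameter value. This minimal sketch assumes the audit results are already in a dict that maps each management hostname to its hex data. A value of None marks a host whose remote IPMI did not answer. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

DEFAULT = "0000000000"  # no overrides at all


def group_by_boot_flags(results: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group management hostnames by their raw boot parameter data.

    Hosts at the default value are dropped; hosts whose BMC did not
    answer (value None) are collected under the key "unreachable".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, data in sorted(results.items()):
        if data is None:
            groups["unreachable"].append(host)
        elif data.lower() != DEFAULT:
            groups[data.lower()].append(host)
    return dict(groups)
```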
Force PXE (FIXED)
150 hosts had the Boot Device Selector overridden to Force PXE at the next reboot. This is the most worrying case, in particular because most of these hosts run stateful services. We agreed that, given our current infrastructure configuration, there is no use case for any host to be in PXE mode, so I've already fixed them by resetting the bit so that the default boot order is no longer overridden.
aqs1006.mgmt.eqiad.wmnet conf1004.mgmt.eqiad.wmnet conf1006.mgmt.eqiad.wmnet db1087.mgmt.eqiad.wmnet db1090.mgmt.eqiad.wmnet db2033.mgmt.codfw.wmnet db2034.mgmt.codfw.wmnet db2036.mgmt.codfw.wmnet db2037.mgmt.codfw.wmnet db2041.mgmt.codfw.wmnet db2043.mgmt.codfw.wmnet db2044.mgmt.codfw.wmnet db2046.mgmt.codfw.wmnet db2050.mgmt.codfw.wmnet db2052.mgmt.codfw.wmnet db2053.mgmt.codfw.wmnet db2054.mgmt.codfw.wmnet db2069.mgmt.codfw.wmnet db2070.mgmt.codfw.wmnet dbstore2001.mgmt.codfw.wmnet dbstore2002.mgmt.codfw.wmnet elastic1032.mgmt.eqiad.wmnet elastic1033.mgmt.eqiad.wmnet elastic1034.mgmt.eqiad.wmnet elastic1035.mgmt.eqiad.wmnet elastic1036.mgmt.eqiad.wmnet elastic1037.mgmt.eqiad.wmnet elastic1038.mgmt.eqiad.wmnet elastic1039.mgmt.eqiad.wmnet elastic1040.mgmt.eqiad.wmnet elastic1041.mgmt.eqiad.wmnet elastic1042.mgmt.eqiad.wmnet elastic1043.mgmt.eqiad.wmnet elastic1044.mgmt.eqiad.wmnet elastic1045.mgmt.eqiad.wmnet elastic1046.mgmt.eqiad.wmnet elastic1047.mgmt.eqiad.wmnet elastic1048.mgmt.eqiad.wmnet elastic1049.mgmt.eqiad.wmnet elastic1050.mgmt.eqiad.wmnet elastic1051.mgmt.eqiad.wmnet elastic1052.mgmt.eqiad.wmnet elastic2018.mgmt.codfw.wmnet elastic2020.mgmt.codfw.wmnet elastic2025.mgmt.codfw.wmnet elastic2026.mgmt.codfw.wmnet elastic2027.mgmt.codfw.wmnet elastic2028.mgmt.codfw.wmnet elastic2029.mgmt.codfw.wmnet elastic2030.mgmt.codfw.wmnet elastic2031.mgmt.codfw.wmnet elastic2032.mgmt.codfw.wmnet elastic2033.mgmt.codfw.wmnet elastic2034.mgmt.codfw.wmnet elastic2035.mgmt.codfw.wmnet elastic2036.mgmt.codfw.wmnet labcontrol1003.mgmt.eqiad.wmnet labcontrol1004.mgmt.eqiad.wmnet labtestcontrol2003.mgmt.codfw.wmnet labtestmetal2001.mgmt.codfw.wmnet labtestneutron2002.mgmt.codfw.wmnet labtestservices2002.mgmt.codfw.wmnet labtestvirt2003.mgmt.codfw.wmnet lvs1010.mgmt.eqiad.wmnet lvs1011.mgmt.eqiad.wmnet lvs1012.mgmt.eqiad.wmnet lvs2004.mgmt.codfw.wmnet lvs2005.mgmt.codfw.wmnet lvs2006.mgmt.codfw.wmnet mc1019.mgmt.eqiad.wmnet mc1020.mgmt.eqiad.wmnet mc1021.mgmt.eqiad.wmnet (iLO was not accepting changes; it worked after a reset)
mc1022.mgmt.eqiad.wmnet mc1023.mgmt.eqiad.wmnet mc1024.mgmt.eqiad.wmnet mc1025.mgmt.eqiad.wmnet mc1026.mgmt.eqiad.wmnet mc1027.mgmt.eqiad.wmnet mc1028.mgmt.eqiad.wmnet mc1029.mgmt.eqiad.wmnet mc1030.mgmt.eqiad.wmnet mc1031.mgmt.eqiad.wmnet mc1032.mgmt.eqiad.wmnet mc1033.mgmt.eqiad.wmnet mc1034.mgmt.eqiad.wmnet mc1035.mgmt.eqiad.wmnet mc1036.mgmt.eqiad.wmnet mc2036.mgmt.codfw.wmnet ms-be1019.mgmt.eqiad.wmnet ms-be1020.mgmt.eqiad.wmnet ms-be1021.mgmt.eqiad.wmnet ms-be1022.mgmt.eqiad.wmnet ms-be1023.mgmt.eqiad.wmnet ms-be1024.mgmt.eqiad.wmnet ms-be1025.mgmt.eqiad.wmnet ms-be1026.mgmt.eqiad.wmnet ms-be1027.mgmt.eqiad.wmnet ms-be1028.mgmt.eqiad.wmnet ms-be1029.mgmt.eqiad.wmnet ms-be1030.mgmt.eqiad.wmnet ms-be1031.mgmt.eqiad.wmnet ms-be1032.mgmt.eqiad.wmnet ms-be1033.mgmt.eqiad.wmnet ms-be1034.mgmt.eqiad.wmnet ms-be1035.mgmt.eqiad.wmnet ms-be1036.mgmt.eqiad.wmnet ms-be1037.mgmt.eqiad.wmnet ms-be1038.mgmt.eqiad.wmnet ms-be1039.mgmt.eqiad.wmnet ms-be2017.mgmt.codfw.wmnet ms-be2018.mgmt.codfw.wmnet ms-be2019.mgmt.codfw.wmnet ms-be2020.mgmt.codfw.wmnet ms-be2023.mgmt.codfw.wmnet ms-be2025.mgmt.codfw.wmnet ms-be2026.mgmt.codfw.wmnet ms-be2027.mgmt.codfw.wmnet ms-be2028.mgmt.codfw.wmnet ms-be2029.mgmt.codfw.wmnet ms-be2030.mgmt.codfw.wmnet ms-be2031.mgmt.codfw.wmnet ms-be2032.mgmt.codfw.wmnet ms-be2033.mgmt.codfw.wmnet ms-be2034.mgmt.codfw.wmnet ms-be2035.mgmt.codfw.wmnet ms-be2036.mgmt.codfw.wmnet ms-be2037.mgmt.codfw.wmnet ms-be2038.mgmt.codfw.wmnet ms-be2039.mgmt.codfw.wmnet relforge1001.mgmt.eqiad.wmnet relforge1002.mgmt.eqiad.wmnet restbase1010.mgmt.eqiad.wmnet restbase1011.mgmt.eqiad.wmnet restbase1012.mgmt.eqiad.wmnet restbase1013.mgmt.eqiad.wmnet restbase1014.mgmt.eqiad.wmnet restbase1015.mgmt.eqiad.wmnet restbase2001.mgmt.codfw.wmnet restbase2002.mgmt.codfw.wmnet restbase2003.mgmt.codfw.wmnet restbase2004.mgmt.codfw.wmnet restbase2005.mgmt.codfw.wmnet restbase2006.mgmt.codfw.wmnet restbase2007.mgmt.codfw.wmnet
restbase2008.mgmt.codfw.wmnet restbase2009.mgmt.codfw.wmnet stat1006.mgmt.eqiad.wmnet wasat.mgmt.codfw.wmnet wdqs1003.mgmt.eqiad.wmnet wdqs2003.mgmt.codfw.wmnet
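This is a hedged sketch of how the reset step could be scripted. It assumes that ipmitool's chassis bootdev none, which asks the BMC not to change the boot device order, is an acceptable way to clear the Force PXE override. It also assumes the password is in IPMI_PASSWORD for -E. The "root" user and the helper names are hypothetical. Depending on the BMC and the ipmitool version, the resulting data may still show the Boot Flag Valid bit set, so re-running chassis bootparam get 5 afterwards is the real check.

```python
import subprocess
from typing import Iterable, List


def clear_boot_override_cmd(mgmt_host: str, user: str = "root") -> List[str]:
    """Build the ipmitool call asking the BMC not to override the boot order.

    The "root" user is a hypothetical default; the password is read by -E
    from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", mgmt_host, "-U", user, "-E",
            "chassis", "bootdev", "none"]


def clear_boot_overrides(mgmt_hosts: Iterable[str], dry_run: bool = True) -> List[str]:
    """Reset the boot device override on each host; return hosts that failed.

    With dry_run=True the commands are only printed and nothing is sent.
    """
    failed: List[str] = []
    for host in mgmt_hosts:
        cmd = clear_boot_override_cmd(host)
        if dry_run:
            print(" ".join(cmd))
            continue
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired):
            failed.append(host)
            continue
        if result.returncode != 0:
            # e.g. a BMC that refuses changes until it is reset
            failed.append(host)
    return failed
```

Hosts returned as failed need manual attention. This happened with mc1021, whose iLO only accepted the change after a reset.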