We should upgrade the BIOS and iDRAC firmware in esams, as these hosts are crashing frequently (T238305).
This task was expanded on 2020-02-10 to include eqiad cache systems.
BIOS Version: 2.4.8
iDRAC Firmware Version: 4.00.00.00
eqiad hosts
upload:
- cp1076
- cp1078
- cp1080
- cp1082
- cp1084
- cp1086
- cp1088
- cp1090
text:
- cp1075
- cp1077
- cp1079
- cp1081
- cp1083
- cp1085
- cp1087
- cp1089
esams hosts
Please upgrade the cache_upload hosts first:
- cp3051.esams.wmnet
- cp3053.esams.wmnet
- cp3055.esams.wmnet
- cp3057.esams.wmnet
- cp3059.esams.wmnet
- cp3061.esams.wmnet
- cp3063.esams.wmnet
- cp3065.esams.wmnet
Then upgrade the cache_text hosts:
- cp3050.esams.wmnet
- cp3052.esams.wmnet
- cp3054.esams.wmnet
- cp3056.esams.wmnet
- cp3058.esams.wmnet
- cp3060.esams.wmnet
- cp3062.esams.wmnet
- cp3064.esams.wmnet
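The esams ordering above (cache_upload first, then cache_text) can be sketched as a small Python helper. This is only an illustration: the names `ESAMS_UPLOAD`, `ESAMS_TEXT`, and `upgrade_order` are hypothetical and not part of any existing tooling, and the host names are generated from the numbering pattern visible in the two lists.

```python
# Hypothetical helper: rebuild the esams upgrade order from the lists above.
# Odd-numbered hosts cp3051..cp3065 are cache_upload,
# even-numbered hosts cp3050..cp3064 are cache_text.
ESAMS_UPLOAD = [f"cp{n}.esams.wmnet" for n in range(3051, 3066, 2)]
ESAMS_TEXT = [f"cp{n}.esams.wmnet" for n in range(3050, 3065, 2)]


def upgrade_order():
    """Return the esams hosts with cache_upload first, then cache_text."""
    return ESAMS_UPLOAD + ESAMS_TEXT


if __name__ == "__main__":
    # Print a numbered work list, one host per line.
    for position, host in enumerate(upgrade_order(), start=1):
        print(f"{position:2d}. {host}")
```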
Please coordinate depooling/pooling of the servers with the #wikimedia-traffic channel.
Update Checklist
CP system BIOS update directions:
- Confirm with Traffic that the host can be taken offline.
- Shut down the host via OS commands; this automatically depools the host from pybal.
- Update the firmware via the mgmt interface.
- Boot the host back into the OS. A puppet run should clear all Icinga checks to green. (You may need to re-run the puppet checks manually to speed things up.)
- Once the host is green in Icinga, run 'pool' from the command line of the host.
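For tracking progress across many hosts, the directions above could be kept as a per-host checklist. The sketch below only prints the steps; it runs no commands and touches no hosts. `STEPS` and `checklist` are hypothetical names introduced for illustration.

```python
# Sketch only: formats the update directions for one host. Nothing is executed.
STEPS = [
    "Confirm with Traffic that the host can be taken offline",
    "Shut down the host via OS commands (this depools it from pybal)",
    "Update the firmware via the mgmt interface",
    "Boot the host back into the OS and wait for the puppet run",
    "Once Icinga is green, run 'pool' on the host",
]


def checklist(host):
    """Return the numbered checklist lines for a single host."""
    return [f"[{host}] {i}. {step}" for i, step in enumerate(STEPS, start=1)]


if __name__ == "__main__":
    print("\n".join(checklist("cp3051.esams.wmnet")))
```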
Checks to run between system updates & changing pool state:
- Check the graphs on https://grafana.wikimedia.org/d/kHk7W6OZz/ats-cluster-view?orgId=1&from=now-6h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-layer=tls&var-cluster=upload
- Check the pool state from a cumin host: confctl select 'name=cp3.*' get|sort
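The confctl check above matches only the esams hosts (cp3...). Since the task also covers eqiad, a minimal sketch of building the same command for another host-name prefix follows. `confctl_check_command` is a hypothetical helper, and using the prefix "cp1" for the eqiad hosts is an assumption based on the cp10xx names listed in this task.

```python
def confctl_check_command(host_prefix):
    """Build the pool-state check from the checklist for a host-name prefix.

    The command shape is copied from the checklist; only the prefix inside
    the name selector changes.
    """
    return f"confctl select 'name={host_prefix}.*' get|sort"


if __name__ == "__main__":
    # esams, as written in the checklist
    print(confctl_check_command("cp3"))
    # eqiad (assumed prefix, from the cp10xx host names in this task)
    print(confctl_check_command("cp1"))
```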