We should upgrade the BIOS and iDRAC firmware on the hosts in esams, because they are crashing frequently (T238305).
This task was expanded on 2020-02-10 to include eqiad cache systems.
BIOS Version: 2.4.8
iDRAC Firmware Version: 4.00.00.00
== eqiad hosts ==
upload:
[x] cp1076
[x] cp1078
[x] cp1080
[x] cp1082
[x] cp1084
[x] cp1086
[x] cp1088
[x] cp1090
text:
[x] cp1075
[x] cp1077
[x] cp1079
[x] cp1081
[x] cp1083
[x] cp1085
[x] cp1087
[x] cp1089
== esams hosts ==
Please upgrade the cache_upload hosts first:
[x] cp3051.esams.wmnet
[x] cp3053.esams.wmnet
[x] cp3055.esams.wmnet
[ ] cp3057.esams.wmnet
[ ] cp3059.esams.wmnet
[ ] cp3061.esams.wmnet
[ ] cp3063.esams.wmnet
[ ] cp3065.esams.wmnet
Then upgrade the cache_text hosts:
[x] cp3050.esams.wmnet
[x] cp3052.esams.wmnet
[x] cp3054.esams.wmnet
[ ] cp3056.esams.wmnet
[ ] cp3058.esams.wmnet
[ ] cp3060.esams.wmnet
[ ] cp3062.esams.wmnet
[ ] cp3064.esams.wmnet
Please coordinate depooling/pooling of the servers with the #wikimedia-traffic channel.
== Update Checklist ==
CP system BIOS update directions:
[] - confirm with #traffic that the host can go offline
[] - shut down the host via OS commands; this automatically depools the host from pybal
[] - update firmware via the mgmt interface
[] - boot the host back into the OS; a puppet run should turn all icinga checks green. (You may need to manually re-run the puppet checks to speed things up.)
[] - once the host is green in icinga, run 'pool' from the command line of the host
Checks to run between system updates & changing pool state:
[] - Check the graphs on https://grafana.wikimedia.org/d/kHk7W6OZz/ats-cluster-view?orgId=1&from=now-6h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-layer=tls&var-cluster=upload
[] - Check pool state from a cumin host: confctl select 'name=cp3.*' get|sort
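
The pool-state check above can be scripted when many hosts are being cycled. The following is a minimal sketch, not an official tool: it assumes that `confctl ... get` prints one JSON object per line, keyed by host name with a `pooled` field and a separate `tags` key. The function name `unpooled_hosts` and the sample lines (including the `ats-be` service tag) are hypothetical and only illustrate the assumed format, so verify against real output before relying on it.

```python
import json


def unpooled_hosts(confctl_output):
    """Return a sorted list of host names whose 'pooled' value is not 'yes'.

    Assumes one JSON object per line, e.g.
    {"<host>": {"weight": 1, "pooled": "yes"}, "tags": "..."}
    (an assumption about the confctl output format).
    """
    pending = set()
    for line in confctl_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            if key == "tags":
                continue  # metadata, not a host entry
            if isinstance(value, dict) and value.get("pooled") != "yes":
                pending.add(key)
    return sorted(pending)


# Hypothetical sample in the assumed format; not real output.
sample = "\n".join([
    '{"cp3050.esams.wmnet": {"weight": 1, "pooled": "yes"}, '
    '"tags": "dc=esams,cluster=cache_text,service=ats-be"}',
    '{"cp3051.esams.wmnet": {"weight": 1, "pooled": "no"}, '
    '"tags": "dc=esams,cluster=cache_upload,service=ats-be"}',
])

if __name__ == "__main__":
    for host in unpooled_hosts(sample):
        print(host)
```

In practice the output of the `confctl select` command would be piped into the function instead of `sample`; any host it lists still needs a manual look before running 'pool'.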