
Upgrade firmware on graphite1004 if upgrade available.
Closed, Resolved, Public

Description

graphite1004 is from 2018 and probably still has its pristine, original firmware. My suggestion would be to get DC ops to upgrade all firmware to recent versions.

In T297265, the host randomly freezes with kernel 5.10.46, to which it was preemptively downgraded from 5.10.70 to mitigate a bug affecting the mx servers.

But in general, given that these hosts ran fine before with 5.10.70, we can also easily revert to that. The downgrade to .46 was done out of caution for the conntrack bug which hit mx2001, but compared with the two crashes of 1004/2003 on .46, the conntrack one is still hypothetical for these hosts, while the other two are real...

We'd like to see if a firmware upgrade helps mitigate the conntrack bug. graphite1004 is currently the backup host.

Event Timeline

@colewhite we can update the f/w, but this will require the server to be out of production for about 30 minutes. We can do this almost anytime; let me know when you would like to proceed.

I think it just needs to be downtimed in icinga for the maintenance window. Being the backup host, I think you can proceed when you're ready.

graphite1004 f/w has been updated, except for the iDRAC. Its version is too old and the system will not allow an update to the current or previous version. Hopefully this helps; if not, please re-open and I can investigate any h/w issues.