cp3051 crashed
Open, Medium, Public

Description

This host also crashed today, together with cp3055. Nothing on the console.

Event Timeline

Volans created this task. Dec 21 2019, 10:44 PM
Restricted Application added a subscriber: Aklapper. Dec 21 2019, 10:44 PM

Mentioned in SAL (#wikimedia-operations) [2019-12-21T22:45:08Z] <volans> powercycle cp3051 - T241306

Nothing in racadm, checked both getsel and lclog view. Nothing in syslog & co.

FYI in dmesg during the end of the boot process it logged a bunch of kvm: disabled by bios.

Volans triaged this task as Medium priority. Dec 21 2019, 11:26 PM
ema added a comment. Dec 22 2019, 10:21 AM

Thanks @Volans for taking care of this.

> Nothing in racadm, checked both getsel and lclog view. Nothing in syslog & co.

Just like all other crashes tracked in T238305 :-/
Now, I know it sounds crazy, but: this is the 6th host to crash out of the 8 cache_upload nodes in esams. So far none of the 8 cache_text nodes there has crashed. I don't think there's much to look at on the software configuration level, considering that a text node has crashed in eqiad (cp1077), but perhaps it's worth checking what is special about upload@esams that differentiates it from text@esams? Something at the hardware level maybe, like parts batches, or anything special related to racking? You can tell upload@esams hosts from text ones because the number in their hostname is odd: cp30(5[13579]|6[135]) vs cp30(5[02468]|6[024]).
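
For anyone who wants to group crash reports by cluster, the two hostname patterns above can be applied roughly like this. This is only a sketch: the regexes are copied from this comment, while the helper name `esams_cache_cluster` and the sample host list are made up for illustration and are not taken from any existing tooling.

```python
import re

# Patterns copied verbatim from the comment; everything else here is hypothetical.
UPLOAD_RE = re.compile(r"cp30(5[13579]|6[135])")  # odd-numbered hosts: cache_upload
TEXT_RE = re.compile(r"cp30(5[02468]|6[024])")    # even-numbered hosts: cache_text


def esams_cache_cluster(hostname):
    """Return 'upload' or 'text' for an esams cache host, or None otherwise.

    Expects a short hostname such as 'cp3051' (no domain suffix).
    """
    if UPLOAD_RE.fullmatch(hostname):
        return "upload"
    if TEXT_RE.fullmatch(hostname):
        return "text"
    return None


# Sample hosts mentioned in this task; cp1077 is in eqiad, so it matches neither.
for host in ("cp3051", "cp3055", "cp3050", "cp1077"):
    print(host, esams_cache_cluster(host))
```

`fullmatch` is used so that a longer string that merely contains a matching name is not classified by accident.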

> FYI in dmesg during the end of the boot process it logged a bunch of kvm: disabled by bios.

Disabled on purpose, we don't use kvm on cache nodes.

CDanis added a subscriber: CDanis. Dec 22 2019, 3:09 PM
ema moved this task from Triage to Hardware on the Traffic board. Dec 23 2019, 9:13 AM