At @faidon's suggestion, we should configure netconsole on at least upload@esams to log kernel messages to a central host, so that we don't miss kernel messages emitted at crash time that the hosts never manage to write to disk.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T238305 Servers freezing across the caching cluster
Resolved | | ema | T242579 Setup netconsole on upload@esams hosts
Event Timeline
In case it is helpful: we can reuse the centrallog hosts in codfw/eqiad. For site-local netconsole, on the other hand, we'll need to set up local syslog collectors anyway (on ganeti VMs) for network device syslog, so we could piggyback on those.
Change 565982 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: configure netconsole on cp3061
Change 565982 merged by Ema:
[operations/puppet@production] cache: configure netconsole on cp3061
Change 566041 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] profile::netconsole: set local_ip to ipaddress by default
Change 566046 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: enable netconsole on all upload@esams
Change 566041 merged by Ema:
[operations/puppet@production] profile::netconsole: use ipaddress and interface_primary
Change 566046 merged by Ema:
[operations/puppet@production] cache: enable netconsole on all upload@esams
Change 566063 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] netconsole:: rename to netconsole::client
Change 566064 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] netconsole: add netconsole::server
Change 566063 merged by Ema:
[operations/puppet@production] netconsole:: rename to netconsole::client
Change 566288 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Add netconsole server to esams ganeti
Change 566064 merged by Ema:
[operations/puppet@production] netconsole: add netconsole::server
Change 566288 merged by Ema:
[operations/puppet@production] netconsole: add server to esams ganeti
This is now done in prod. All upload@esams nodes are sending their kernel messages to a central host. To read them, run journalctl -u netconsole on ganeti3002.
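For readers unfamiliar with how a netconsole client is pointed at a collector, here is a minimal sketch of the kernel's netconsole= module parameter, which takes the form [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. It is not taken from the puppet changes above. The helper name netconsole_param and every address, port and interface name below are hypothetical placeholders, and the parameters the netconsole::client profile actually renders may differ.

```shell
#!/bin/sh
# Sketch only: build a netconsole= module parameter string.
# Kernel format: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# netconsole_param is a hypothetical helper, not part of operations/puppet.
netconsole_param() {
    src_port=$1   # local UDP port on the sending host
    src_ip=$2     # local IP (the timeline mentions defaulting to the host's ipaddress)
    dev=$3        # local interface (the timeline mentions the primary interface)
    tgt_port=$4   # UDP port the collector listens on
    tgt_ip=$5     # collector IP; the only mandatory field
    tgt_mac=$6    # collector or gateway MAC; empty means broadcast

    if [ -z "$tgt_ip" ]; then
        echo "netconsole_param: target IP is required" >&2
        return 1
    fi
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Example with placeholder values (192.0.2.0/24 is a documentation range).
netconsole_param 6665 192.0.2.10 eno1 6666 192.0.2.20 ''

# The resulting string would be passed when loading the module, e.g.:
#   modprobe netconsole "$(netconsole_param 6665 192.0.2.10 eno1 6666 192.0.2.20 '')"
```

On the receiving side, anything that listens on the target UDP port can collect the messages. In this task that role is filled by the netconsole unit on ganeti3002.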