
Setup netconsole on upload@esams hosts
Closed, Resolved (Public)

Description

At @faidon's suggestion, we should configure netconsole on at least upload@esams to log kernel messages to a central host. This ensures that we don't miss kernel messages emitted at crash time, before the host manages to write them to disk.
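For context, the netconsole kernel module takes a single parameter string of the form `[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. The sketch below only illustrates how such a string is assembled. The helper name `netconsole_param` and the addresses are hypothetical, and this is not the code used by the puppet patches on this task.

```python
def netconsole_param(local_ip, interface, remote_ip,
                     remote_mac=None, local_port=6665, remote_port=6666):
    """Build a netconsole module parameter string.

    Hypothetical helper for illustration only. The port defaults match
    the kernel's netconsole defaults (6665 source, 6666 target). When no
    target MAC address is given, the field is left empty and the kernel
    falls back to the broadcast address.
    """
    mac = remote_mac or ""
    return f"{local_port}@{local_ip}/{interface},{remote_port}@{remote_ip}/{mac}"


# Placeholder addresses, not real production values.
param = netconsole_param("10.0.0.1", "eth1", "10.0.0.2",
                         remote_mac="12:34:56:78:9a:bc",
                         local_port=4444, remote_port=9353)
print(f"modprobe netconsole netconsole={param}")
```

The resulting string can be passed as `netconsole=...` either on the kernel command line or as a module option, which is why the source IP address and the primary interface of each client need to be known.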

Details

Related Gerrit Patches:
[operations/puppet@production] netconsole: add server to esams ganeti
[operations/puppet@production] netconsole: add netconsole::server
[operations/puppet@production] netconsole:: rename to netconsole::client
[operations/puppet@production] cache: enable netconsole on all upload@esams
[operations/puppet@production] profile::netconsole: use ipaddress and interface_primary
[operations/puppet@production] cache: configure netconsole on cp3061

Event Timeline

ema triaged this task as Medium priority. Jan 13 2020, 10:33 AM
ema created this task.

In case it is helpful: we can reuse the centrallog hosts in codfw/eqiad. For site-local netconsole, we'll need to set up local syslog collectors anyway (on ganeti VMs) for network device syslog, so we could piggyback on those.

ema moved this task from Triage to Hardware on the Traffic board. Jan 13 2020, 1:12 PM

Change 565982 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: configure netconsole on cp3061

https://gerrit.wikimedia.org/r/565982

Change 565982 merged by Ema:
[operations/puppet@production] cache: configure netconsole on cp3061

https://gerrit.wikimedia.org/r/565982

Change 566041 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] profile::netconsole: set local_ip to ipaddress by default

https://gerrit.wikimedia.org/r/566041

Change 566046 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: enable netconsole on all upload@esams

https://gerrit.wikimedia.org/r/566046

Change 566041 merged by Ema:
[operations/puppet@production] profile::netconsole: use ipaddress and interface_primary

https://gerrit.wikimedia.org/r/566041

Change 566046 merged by Ema:
[operations/puppet@production] cache: enable netconsole on all upload@esams

https://gerrit.wikimedia.org/r/566046

Change 566063 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] netconsole:: rename to netconsole::client

https://gerrit.wikimedia.org/r/566063

Change 566064 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] netconsole: add netconsole::server

https://gerrit.wikimedia.org/r/566064

Change 566063 merged by Ema:
[operations/puppet@production] netconsole:: rename to netconsole::client

https://gerrit.wikimedia.org/r/566063

Change 566288 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Add netconsole server to esams ganeti

https://gerrit.wikimedia.org/r/566288

Change 566064 merged by Ema:
[operations/puppet@production] netconsole: add netconsole::server

https://gerrit.wikimedia.org/r/566064

Change 566288 merged by Ema:
[operations/puppet@production] netconsole: add server to esams ganeti

https://gerrit.wikimedia.org/r/566288

ema closed this task as Resolved. Jan 21 2020, 3:31 PM
ema claimed this task.

This is now done in prod. All upload@esams nodes are sending their kernel messages to a central host. See journalctl -u netconsole on ganeti3002.
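To illustrate what the receiving side does: netconsole clients send each kernel message as a plain UDP datagram, so a collector only needs to listen on the target port and record what arrives. The following is a minimal sketch under that assumption. It is not the netconsole::server implementation from the patches above, and the function names and the default port 6666 (the kernel's default target port) are illustrative choices.

```python
import socket


def format_message(data: bytes, sender: str) -> str:
    """Prefix a received kernel message with the sender's address."""
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    return f"{sender}: {text}"


def serve(bind_addr="0.0.0.0", port=6666, max_messages=None):
    """Print incoming netconsole datagrams, one line per message.

    Hypothetical collector for illustration only. Runs until interrupted
    unless max_messages is set.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        received = 0
        while max_messages is None or received < max_messages:
            data, (host, _port) = sock.recvfrom(65535)
            print(format_message(data, host), flush=True)
            received += 1
```

Calling `serve()` under a systemd unit would send the printed lines to the journal, which is the kind of output that `journalctl -u netconsole` shows on the central host.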