kernel error detector: have a way to ignore certain messages
Closed, Resolved | Public

Description

Some kernel warning and error messages need to be ignored.

Examples:

We could update the detector to read a config file with regexes, or lines, to ignore.
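A minimal sketch of the regex-based ignore idea, assuming a plain-text file with one regular expression per line (blank lines and `#` comments skipped). The function names and the file format are hypothetical and used only for illustration; they are not taken from the actual detector script.

```python
import re
from pathlib import Path


def parse_ignore_patterns(text):
    """Compile one regex per non-empty, non-comment line of text."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def load_ignore_patterns(path):
    """Read patterns from a file; a missing file means nothing is ignored."""
    file = Path(path)
    if not file.exists():
        return []
    return parse_ignore_patterns(file.read_text())


def filter_messages(messages, patterns):
    """Keep only the messages that match none of the ignore patterns."""
    return [m for m in messages if not any(p.search(m) for p in patterns)]
```

With this shape, the detector would count only what `filter_messages` returns, so an empty or missing ignore file leaves the current behaviour unchanged.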

Event Timeline

Another possibility (maybe on top of that) would be to acknowledge errors: for example, the detector could read a timestamp from a file and ignore any errors logged before it (e.g. when an issue might happen again, but the current occurrence is no longer relevant).

That would be nice to have. But I would not know how to implement this off the top of my head. How would you implement it?

Maybe something like a cookbook, cookbook wmcs.ack_kernel_errors --hostname <>, that creates a file with the timestamp on the node (say /etc/prometheus/kernel_errors_exporter.yaml); the script would then read that file.
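The acknowledgement idea could look roughly like this. This is only a sketch under stated assumptions: the file is assumed to hold a single `ack_before: <epoch seconds>` line (the thread does not define a format), events are assumed to arrive as `(timestamp, message)` pairs, and both function names are hypothetical.

```python
def parse_ack_timestamp(text):
    """Return the acknowledged epoch timestamp, or 0.0 if none is set."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Assumed format: a single "ack_before: <epoch seconds>" entry.
        if sep and key.strip() == "ack_before":
            return float(value.strip())
    return 0.0


def unacknowledged(events, ack_before):
    """Keep events that happened strictly after the acknowledgement time."""
    return [(ts, msg) for ts, msg in events if ts > ack_before]
```

Defaulting to 0.0 when no timestamp is present means nothing is ignored until someone explicitly acknowledges the errors.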

Change #1115391 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] prometheus-node-kernel-messages: add logic to ignore messages

https://gerrit.wikimedia.org/r/1115391

Change #1115391 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] prometheus-node-kernel-messages: add logic to ignore messages

https://gerrit.wikimedia.org/r/1115391

Change #1116797 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] prometheus: node_kernel_messages: fix ignore regex file path

https://gerrit.wikimedia.org/r/1116797

Change #1116797 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] prometheus: node_kernel_messages: fix ignore regex file path

https://gerrit.wikimedia.org/r/1116797

Change #1116802 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] promethes: node_kernel_messages: fix another typo in source file name

https://gerrit.wikimedia.org/r/1116802

Change #1116802 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] promethes: node_kernel_messages: fix another typo in source file name

https://gerrit.wikimedia.org/r/1116802

Change #1116822 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] prometheus-node-kernel-messages.sh: don't fail if there are no matches

https://gerrit.wikimedia.org/r/1116822

Change #1116822 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] prometheus-node-kernel-messages.sh: don't fail if there are no matches

https://gerrit.wikimedia.org/r/1116822

aborrero claimed this task.