Handle unknown stats in rsyslog_exporter
Closed, ResolvedPublic

Description

This shows up in the logs on central syslog hosts:

Nov 22 10:49:31 wezen rsyslog_exporter[21742]: 2018/11/22 10:49:31 error handling stats line: unknown pstat type: 0, line was: 2018-11-22T10:49:31.413068+00:00 wezen rsyslogd-pstats: { "name": "imudp(w0)", "origin": "imudp", "called.recvmmsg": 158, "called.recvmsg": 0, "msgs.received": 81 }

See also https://github.com/filippog/rsyslog_exporter/issues/1

Event Timeline

jijiki subscribed.

This is polluting our messages on mw* servers as well:

Oct 20 06:20:19 mw1333 rsyslog_exporter[44310]: 2019/10/20 06:20:19 error handling stats line: unknown pstat type: 0, line was: 2019-10-20T06:20:19.674293+00:00 mw1333 rsyslogd-pstats: { "name": "imudp(w0)", "origin": "imudp", "called.recvmmsg": 5443193, "called.recvmsg": 0, "msgs.received": 2724803 }

Any news on this by any chance? On a random host I was checking today, it is taking up about a third of syslog:

 netbox-dev2001  0 ~$ grep -c rsyslog_exporter /var/log/syslog
6365
 netbox-dev2001  0 ~$ wc -l /var/log/syslog
18953 /var/log/syslog

Mentioned in SAL (#wikimedia-operations) [2020-10-08T17:16:32Z] <shdubsh> install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - T210137

colewhite triaged this task as Low priority.

Found a new upstream and have deployed it to netbox-dev2001 and centrallog1001 to run for a few days. If all checks out, we'll roll it to the rest of the fleet.

Change 634112 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] prometheus: ensure new prometheus-rsyslog-exporter version

https://gerrit.wikimedia.org/r/634112

Change 634112 merged by Cwhite:
[operations/puppet@production] prometheus: ensure new prometheus-rsyslog-exporter version

https://gerrit.wikimedia.org/r/634112

Updated prometheus-rsyslog-exporter deployed to the fleet. If the log message comes up again please let us know.

elukey subscribed.

Hi! While investigating a problem on kubernetes nodes I found out that in all clusters where a kubelet runs we have the following repeated over and over:

Jul 26 13:38:45 kubernetes1001 rsyslog_exporter[23109]: 2021/07/26 13:38:45 error handling stats line: unknown pstat type: 0, line was: 2021-07-26T13:38:45.414590+00:00 kubernetes1001 rsyslogd-pstats: { "name": "mmkubernetes(https:\/\/kubemaster.svc.eqiad.wmnet:6443)", "origin": "mmkubernetes", "recordseen": 940266, "namespacemetadatasuccess": 9, "namespacemetadatanotfound": 0, "namespacemetadatabusy": 0, "namespacemetadataerror": 0, "podmetadatasuccess": 36, "podmetadatanotfound": 0, "podmetadatabusy": 0, "podmetadataerror": 0 }

Is it possible that we have another occurrence of the same problem? On all the nodes that I checked I found:

ii  prometheus-rsyslog-exporter 0.0.0+git20201008-1 amd64        Export rsyslog metrics to Prometheus

Is it possible that we have another occurrence of the same problem?

Yes, this looks like the same problem we ran into with imudp. The exporter cannot determine the type of the stats line, so it will need to be extended to handle it.
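For illustration, a minimal Go sketch of how a pstats line could be classified, and why an unrecognised line ends up as type 0. This is not the exporter's actual code: the `statKind` type, the `payload` and `classify` helpers, and the choice to dispatch on the `origin` field are all assumptions made for this example. Only the field names come from the log lines in this task.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// statKind is a hypothetical classification, not the exporter's real type.
type statKind int

const (
	kindUnknown statKind = iota // zero value, analogous to "unknown pstat type: 0"
	kindInputUDP
	kindForward
	kindKubernetes
)

// payload extracts and decodes the JSON object embedded in a pstats syslog line.
func payload(line string) (map[string]interface{}, error) {
	i := strings.Index(line, "{")
	if i < 0 {
		return nil, fmt.Errorf("no JSON object in line")
	}
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(line[i:]), &m); err != nil {
		return nil, err
	}
	return m, nil
}

// classify maps a line to a kind based on its "origin" field (an assumption).
// Anything it does not recognise falls through to kindUnknown.
func classify(line string) statKind {
	m, err := payload(line)
	if err != nil {
		return kindUnknown
	}
	origin, _ := m["origin"].(string)
	switch origin {
	case "imudp":
		// Only the per-worker imudp lines carry this counter.
		if _, ok := m["called.recvmmsg"]; ok {
			return kindInputUDP
		}
	case "omfwd":
		return kindForward
	case "mmkubernetes":
		return kindKubernetes
	}
	return kindUnknown
}

func main() {
	line := `2021-08-06T07:26:19.224409+00:00 thanos-fe2001 rsyslogd-pstats: { "name": "TCP-centrallog1001.eqiad.wmnet-6514", "origin": "omfwd", "bytes.sent": 22926913011 }`
	fmt.Println("recognised as omfwd:", classify(line) == kindForward)
}
```

With a dispatch like this, supporting a new module means adding one `case` plus a parser for its fields. Without that case the line keeps the zero value and is reported as unknown.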

colewhite subscribed.
fgiunchedi raised the priority of this task from Low to Medium. Aug 6 2021, 7:27 AM

This is quite spammy on Bullseye hosts too (see below); nudging into o11y Q1.

Aug  6 07:26:19 thanos-fe2001 rsyslog_exporter[1744917]: 2021/08/06 07:26:19 error handling stats line: unknown pstat type: 0, line was: 2021-08-06T07:26:19.224409+00:00 thanos-fe2001 rsyslogd-pstats: { "name": "TCP-centrallog1001.eqiad.wmnet-6514", "origin": "omfwd", "bytes.sent": 22926913011 }
Aug  6 07:26:19 thanos-fe2001 rsyslog_exporter[1744917]: 2021/08/06 07:26:19 error handling stats line: unknown pstat type: 0, line was: 2021-08-06T07:26:19.224471+00:00 thanos-fe2001 rsyslogd-pstats: { "name": "TCP-centrallog2001.codfw.wmnet-6514", "origin": "omfwd", "bytes.sent": 22918949585 }

Ok I have a working patch to parse omfwd messages at https://phabricator.wikimedia.org/P17091, pending package and deployment

Ditto for mmkubernetes, code patch at https://phabricator.wikimedia.org/P17094, also pending package + deployment
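As a rough idea of what parsing these two stat types involves, here is a Go sketch that decodes the omfwd and mmkubernetes payloads into typed structs. It does not reproduce the contents of P17091 or P17094: the struct names and the `parseForward` / `parseKubernetes` helpers are hypothetical. The JSON keys are the ones visible in the log lines quoted in this task.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forwardStat mirrors the omfwd fields seen in the log lines (hypothetical type).
type forwardStat struct {
	Name      string `json:"name"`
	Origin    string `json:"origin"`
	BytesSent int64  `json:"bytes.sent"`
}

// kubernetesStat mirrors the mmkubernetes fields seen in the log lines (hypothetical type).
type kubernetesStat struct {
	Name                      string `json:"name"`
	Origin                    string `json:"origin"`
	RecordSeen                int64  `json:"recordseen"`
	NamespaceMetadataSuccess  int64  `json:"namespacemetadatasuccess"`
	NamespaceMetadataNotFound int64  `json:"namespacemetadatanotfound"`
	NamespaceMetadataBusy     int64  `json:"namespacemetadatabusy"`
	NamespaceMetadataError    int64  `json:"namespacemetadataerror"`
	PodMetadataSuccess        int64  `json:"podmetadatasuccess"`
	PodMetadataNotFound       int64  `json:"podmetadatanotfound"`
	PodMetadataBusy           int64  `json:"podmetadatabusy"`
	PodMetadataError          int64  `json:"podmetadataerror"`
}

// parseForward decodes the JSON part of an omfwd stats line.
func parseForward(js []byte) (forwardStat, error) {
	var s forwardStat
	err := json.Unmarshal(js, &s)
	return s, err
}

// parseKubernetes decodes the JSON part of an mmkubernetes stats line.
func parseKubernetes(js []byte) (kubernetesStat, error) {
	var s kubernetesStat
	err := json.Unmarshal(js, &s)
	return s, err
}

func main() {
	fwd, err := parseForward([]byte(`{ "name": "TCP-centrallog1001.eqiad.wmnet-6514", "origin": "omfwd", "bytes.sent": 22926913011 }`))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Each numeric field would become one Prometheus sample labelled with the name.
	fmt.Printf("name=%s bytes_sent=%d\n", fwd.Name, fwd.BytesSent)
}
```

The counters are decoded as `int64` because values such as `bytes.sent` already exceed the 32-bit range in the samples above.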

Out of curiosity, how come these are phabricator patches rather than normal changesets in gerrit using our copy of prometheus-rsyslog-exporter?

These are just pastes on Phabricator containing the actual commit. I'll send out reviews against our package (which will then carry the patch double-encoded as a Debian patch).

Change 715457 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/debs/prometheus-rsyslog-exporter@master] Add patches to handle mmkubernetes and omfwd stats

https://gerrit.wikimedia.org/r/715457

Change 715457 merged by Filippo Giunchedi:

[operations/debs/prometheus-rsyslog-exporter@master] Add patches to handle mmkubernetes and omfwd stats

https://gerrit.wikimedia.org/r/715457

Change 719231 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/debs/prometheus-rsyslog-exporter@master] Try restarting rsyslog on package installation

https://gerrit.wikimedia.org/r/719231

Change 719231 merged by Filippo Giunchedi:

[operations/debs/prometheus-rsyslog-exporter@master] Try restarting rsyslog on package installation

https://gerrit.wikimedia.org/r/719231

Mentioned in SAL (#wikimedia-operations) [2021-09-08T07:45:25Z] <godog> start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqsin/esams/ulsfo - T210137

Mentioned in SAL (#wikimedia-operations) [2021-09-08T09:09:12Z] <godog> start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqiad - T210137

Mentioned in SAL (#wikimedia-operations) [2021-09-08T09:29:39Z] <godog> start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to codfw - T210137

Mentioned in SAL (#wikimedia-operations) [2021-09-08T09:38:21Z] <godog> start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to wikimedia.org - T210137

fgiunchedi claimed this task.

Rollout is completed; please reopen if something is amiss.