Page MenuHomePhabricator

eventlogging syslog message not properly recognized by logstash
Closed, ResolvedPublic

Description

I have spotted on https://logstash-beta.wmflabs.org/ a few events from deployment-eventlogging04 that have the host field set to "%{logsource}". Found them while searching for 'EventBus'.

The raw message seems to be:

<12>Dec 8 20:54:40 deployment-eventlogging04 eventlogging-service-eventbus[10 (MainThread) 400 POST /v1/events (10.68.16.127) 7.18ms

Looking at Logstash issue https://github.com/elastic/logstash/issues/2552 , that suggests the syslog-input plugin requires the program name followed by a column. So the message would need to have

eventlogging-service-eventbus:

Pure speculation.

Raw json in logstash:

{
  "_index": "logstash-2015.12.08",
  "_type": "syslog",
  "_id": "AVGDX-1ha1EjumVdkmeG",
  "_score": null,
  "_source": {
    "message": "<12>Dec  8 20:54:40 deployment-eventlogging04 eventlogging-service-eventbus[10 (MainThread) 400 POST /v1/events (10.68.16.127) 7.18ms",
    "@version": "1",
    "@timestamp": "2015-12-08T20:54:40.050Z",
    "type": "syslog",
    "host": "%{logsource}",
    "tags": [
      "_grokparsefailure_sysloginput",
      "syslog",
      "es",
      "normalized_message_untrimmed"
    ],
    "priority": 0,
    "severity": 0,
    "facility": 0,
    "facility_label": "kernel",
    "severity_label": "Emergency",
    "level": "EMERGENCY",
    "normalized_message": "<12>Dec  8 20:54:40 deployment-eventlogging04 eventlogging-service-eventbus[10 (MainThread) 400 POST /v1/events (10.68.16.127) 7.18ms"
  },
  "sort": [
    1449608080050
  ]
}

Details

Related Gerrit Patches:

Event Timeline

hashar created this task.Dec 8 2015, 9:19 PM
hashar raised the priority of this task from to Needs Triage.
hashar updated the task description. (Show Details)
hashar added a subscriber: hashar.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald TranscriptDec 8 2015, 9:19 PM

Hmm, I was working on syslog this week, and it is possible this is a bad message while I was developing.

When I look now, the log lines look like

Dec 11 14:49:09 deployment-eventlogging04 eventlogging-service-eventbus[10063]: (MainThread) 400 POST /v1/events (10.68.16.127) 8.64ms

Is this sufficient? This has the process name followed by the pid in brackets followed by a colon. Should I remove the bracketed pid?

I have no idea honestly :( Based on the upstream bug ( https://github.com/elastic/logstash/issues/2552 ) seems the column should be sufficient for the Logstash input filter to properly identify the message.

Searching for "eventlogging-service-eventbus on https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/default still shows incorrect host fields. But I guess whatever patch you have isn't deployed on beta cluster yet :-}

Hm, no it is on beta...

Ottomata added a comment.EditedDec 14 2015, 9:41 PM

@hashar, I'm happy to change the log line to whatever format makes sense, but I'm not sure what it should be from your linked issue. In /var/log/eventlogging/eventlogging-service-eventbus.log on deployment-eventlogging04, the log lines look like:

Dec 14 21:33:14 deployment-eventlogging04 eventlogging-service-eventbus[23828]: (MainThread) 400 POST /v1/events (10.68.16.127) 5.90ms

Seems that message is received as:

<12>Dec 14 21:33:14 deployment-eventlogging04 eventlogging-service-eventbus[23 (MainThread) 400 POST /v1/events (10.68.16.127) 5.90ms

Looks like the software name (38 characters with column) is stripped down to 32 characters:

$ echo -n 'eventlogging-service-eventbus[23'|wc -c
      32

Maybe it is a hard limit in the Logstash syslog input plugin or it is stripped when being emitted/relayed somehow.

The LogStash syslog plugin probably do not respect the syslog RFCs :( They are https://tools.ietf.org/html/rfc3164 and https://tools.ietf.org/html/rfc5424

The pattern seems to be "<%{POSINT:priority}>%{SYSLOGLINE}" with SYSLOGLINE being defined via:

https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/linux-syslog

Notably, it expects the program to be followed by a column: %{SYSLOGHOST:logsource}+(?: %{SYSLOGPROG}:|)

SYSLOGPROG being: %{PROG:program}(?:\[%{POSINT:pid}\])?

Not sure how that mess is of any help though :-(

Change 259271 had a related patch set uploaded (by Ottomata):
Increase size of programname field in remote syslog template

https://gerrit.wikimedia.org/r/259271

The above change is live in beta and applied on deployment-eventlogging04

Theerree we go:

{
  "_index": "logstash-2015.12.15",
  "_type": "syslog",
  "_id": "AVGmtnTKa1EjumVdFTBo",
  "_score": null,
  "_source": {
    "message": "(MainThread) 201 POST /v1/events (10.68.23.204) 2.78ms",
    "@version": "1",
    "@timestamp": "2015-12-15T17:35:53.736Z",
    "type": "syslog",
    "host": "deployment-eventlogging04",
    "priority": 14,
    "timestamp8601": "2015-12-15T17:35:53.736268+00:00",
    "logsource": "deployment-eventlogging04",
    "program": "eventlogging-service-eventbus",
    "pid": "32682",
    "severity": 6,
    "facility": 1,
    "timestamp": "2015-12-15T17:35:53.736268+00:00",
    "facility_label": "user-level",
    "severity_label": "Informational",
    "level": "INFO",
    "tags": [
      "syslog",
      "es",
      "normalized_message_untrimmed"
    ],
    "normalized_message": "(MainThread) 201 POST /v1/events (10.68.23.204) 2.78ms"
  },
  "sort": [
    1450200953736
  ]
}

Change 259271 merged by Ottomata:
Increase size of programname field in remote syslog template

https://gerrit.wikimedia.org/r/259271

@Ottomata congratulations and figuring out rsyslog was trimming the program name when emitting the message! Looking at logstash the bulk of host:"%{logsource}" messages are now gone for EventBus.

I noticed a few remaining oddities though not related to EventBus itself:

@timestampMessage
2015-12-15T19:22:41.502Z<13>2015-12-15T19:22:41.502360+00:00 php: last message repeated 9 times
2015-12-15T19:21:19.595Z<13>2015-12-15T19:21:19.595026+00:00 php: last message repeated 15 times
2015-12-15T19:20:19.568Z<13>2015-12-15T19:20:19.567856+00:00 php: last message repeated 3 times

And a few hours later, messages related to puppet-agent with different PID. The look like:

<29>2015-12-15T22:54:16.141771+00:00  puppet-agent[1234]: last message repeated 2 times

Eleven of them sent between 2015-12-15T22:52:55.24 UTC and 2015-12-15T23:14:46 UTC. Maybe rsyslog did not get reload entirely or something goes wrong with the last message repeated n times kind of messages.

hashar closed this task as Resolved.Dec 16 2015, 10:47 AM
hashar assigned this task to Ottomata.

Looking at the rsyslog conf with salt -v '*' cmd.run 'grep syslogtag /etc/rsyslog.d/30-remote-syslog.conf' all hosts have the proper rsyslog config but logstash2 , varnish caches and deployment-mx (probably expected).

I am assuming some of the hosts took a while to have the change propagated / rsyslog to be reloaded.

Kudos @Ottomata !

Restricted Application added a project: Analytics. · View Herald TranscriptAug 19 2019, 2:28 PM