Page MenuHomePhabricator

Enhance Gerrit logs ingested by Logstash
Closed, ResolvedPublic

Description

T141324 got us the Gerrit logs send to Logstash and I reported on that task that the messages were poorly formatted in Logstash but that got missed. Hence a new task to just focus on the formatting aspect.

Gerrit does emit json logs described at https://gerrit.wikimedia.org/r/Documentation/logs.html . They can be looked at on gerrit1001 in

  • /var/log/gerrit/error_log.json
  • /var/log/gerrit/sshd_log.json

That is ingested using rsyslog and in puppet we have:

modules/profile/manifests/gerrit.pp
# Ship Gerrit built-in logs to ELK
rsyslog::input::file { 'gerrit-json':
    path => '/var/log/gerrit/*_log.json',
}

An entry from the json file looks like:

{
  "@timestamp": "2021-02-12T15:50:36.946+0000",
  "source_host": "208.80.154.136",
  "message": "Logging context is not empty: LoggingContext{tags=null, forceLogging=null, performanceLogging=null, performanceLogRecords=MutablePerformanceLogRecords{performanceLogRecords=[]}}",
  "file": "LoggingContextAwareRunnable.java",
  "line_number": "94",
  "class": "com.google.gerrit.server.logging.LoggingContextAwareRunnable",
  "method": "run",
  "logger_name": "com.google.gerrit.server.logging.LoggingContextAwareRunnable",
  "mdc": {},
  "ndc": "",
  "level": "WARN",
  "thread_name": "Index-Batch-1",
  "@version": 2
}

And in logstash https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-syslog-2021.02.12?id=xw7vlncBjr5R1RLC347u :

{
  "_index": "logstash-syslog-2021.02.12",
  "_type": "_doc",
  "_id": "xw7vlncBjr5R1RLC347u",
  "_score": 1,
  "_source": {
    "facility": "local0",
    "file": "LoggingContextAwareRunnable.java",
    "logger_name": "com.google.gerrit.server.logging.LoggingContextAwareRunnable",
    "message": "Logging context is not empty: LoggingContext{tags=null, forceLogging=null, performanceLogging=null, performanceLogRecords=MutablePerformanceLogRecords{performanceLogRecords=[]}}",
    "tags": ["input-kafka-rsyslog-shipper", "rsyslog-shipper", "kafka", "es", "syslog", "es", "normalized_message_untrimmed"],
    "thread_name": "Index-Batch-1",
    "program": "input-file-gerrit-json",
    "@timestamp": "2021-02-12T15:50:36.946Z",
    "timestamp": "2021-02-12T15:50:36.947111+00:00",
    "msg": "{\"@timestamp\":\"2021-02-12T15:50:36.946+0000\",\"source_host\":\"208.80.154.136\",\"message\":\"Logging context is not empty: LoggingContext{tags=null, forceLogging=null, performanceLogging=null, performanceLogRecords=MutablePerformanceLogRecords{performanceLogRecords=[]}}\",\"file\":\"LoggingContextAwareRunnable.java\",\"line_number\":\"94\",\"class\":\"com.google.gerrit.server.logging.LoggingContextAwareRunnable\",\"method\":\"run\",\"logger_name\":\"com.google.gerrit.server.logging.LoggingContextAwareRunnable\",\"mdc\":{},\"ndc\":\"\",\"level\":\"WARN\",\"thread_name\":\"Index-Batch-1\",\"@version\":2}",
    "normalized_message": "Logging context is not empty: LoggingContext{tags=null, forceLogging=null, performanceLogging=null, performanceLogRecords=MutablePerformanceLogRecords{performanceLogRecords=[]}}",
    "mdc": {},
    "logsource": "gerrit1001",
    "line_number": "94",
    "ndc": "",
    "@version": 2,
    "source_host": "208.80.154.136",
    "class": "com.google.gerrit.server.logging.LoggingContextAwareRunnable",
    "method": "run",
    "level": [
      "WARNING", 
      "%{SEVERITY_LABEL}"
    ],
    "type": "syslog",
    "severity": "notice",
    "host": "gerrit1001"
  },
  "fields": {
    "@timestamp": ["2021-02-12T15:50:36.946Z"]
  } 
}

One audity is that there are two levels shown:

"level": [
  "WARNING", 
  "%{SEVERITY_LABEL}"
],

That does not play well in Kibana. Maybe because level: WARN is not recognized?

In logstash we have:

programinput-file-gerrit-json
typesyslog

As I understand it that comes from the rsyslog input system. It would be great to have the program field to use gerrit instead. Or maybe there is another field that can fit that purpose.

Event Timeline

It has been suggested to use ECS. We run Gerrit 3.2 and 3.3 will change the timestamp format. From http://gerrit-documentation.storage.googleapis.com/Documentation/3.3.1/logs.html:

For all timestamps the format [yyyy-MM-dd’T’HH:mm:ss,SSSXXX] is used. This format is both ISO 8601 and RFC3339 compatible.

While currently our Gerrit 3.2 emits:

"@timestamp": "2021-02-12T15:50:36.946+0000",

That might have an impact for ECS adoption when we upgrade.

colewhite moved this task from Inbox to In progress on the observability board.
colewhite subscribed.

ECS migration task: T234565

colewhite triaged this task as Medium priority.Feb 12 2021, 7:52 PM

That is being done via https://gerrit.wikimedia.org/r/c/operations/puppet/+/663876

I guess we can mark this task as a dupe of the ECS migration task so?

The log formatting issue has been a long standing one and we were never using Kibana to dig in because of it. All the issues referred in this task have no been addressed, we also use Gerrit built-in json logging system instead of crafting our own with a 3rd party java class.

Finally, @colewhite when a step ahead and added all the glue to support Elastic Common Schema https://doc.wikimedia.org/ecs/ . I believe our events are no state-of-the art.

I have updated the Kibana dashboard https://logstash.wikimedia.org/app/dashboards#/view/AW1f-0k0ZKA 7RpirlnKV , it now uses ECS aware visualizations. It certainly needs some more tuning and probably a few extra dashboards for specific kind of events (such as all the ssh daemon log spam).

Thanks to @colewhite for jumping in and going the extra mile to support ECS! Much appreciated.