
Logging for GitLab
Open, Medium, Public

Description

What are the centralized logging mechanisms for GitLab and what are the network requirements needed for them?

Event Timeline

greg triaged this task as Medium priority. Feb 24 2021, 5:38 PM
brennen added a subscriber: colewhite.

cc: @colewhite for awareness, per some discussion between RelEng and Observability today.

Latest state that I am aware of:

  • Gerrit currently does not use Logstash. As we use Gerrit as a guide, we currently do not know what should go into Logstash. Will define in a later phase.
  • It would be good to have GitLab logs copied to another host, so that we could troubleshoot if the GitLab host has issues

Latest state that I am aware of:

  • Gerrit currently does not use Logstash. As we use Gerrit as a guide, we currently do not know what should go into Logstash. Will define in a later phase.

Minor clarification: Gerrit does currently use Logstash; logs are shipped there and there is a Gerrit dashboard. It is, as far as I know, a rarely used dashboard. There is one primary Gerrit machine, and when investigating log messages we tend to look on the machine directly; that is sufficient.

  • It would be good to have GitLab logs copied to another host, so that we could troubleshoot if the GitLab host has issues

+1

If by "shipped" you guys mean ingested by Logstash into an Elasticsearch instance, then yes, this was the plan.

Besides system logs (syslog), which I assume are ingested by default, these are the logs that GitLab produces (managed by runit's svlogd):

svlogd /var/log/gitlab/gitaly
svlogd /var/log/gitlab/sidekiq
svlogd /var/log/gitlab/gitlab-workhorse
svlogd /var/log/gitlab/gitlab-rails

svlogd -tt /var/log/gitlab/nginx
svlogd -tt /var/log/gitlab/postgresql

svlogd -tt /var/log/gitlab/prometheus
svlogd -tt /var/log/gitlab/grafana
svlogd -tt /var/log/gitlab/logrotate
svlogd -tt /var/log/gitlab/redis
svlogd -tt /var/log/gitlab/alertmanager
svlogd -tt /var/log/gitlab/puma
svlogd -tt /var/log/gitlab/postgres-exporter
svlogd -tt /var/log/gitlab/gitlab-exporter
svlogd -tt /var/log/gitlab/node-exporter
svlogd -tt /var/log/gitlab/redis-exporter

I grouped them into three categories, from highest value for ingestion into ES down to trace/debug information. By default we suggest ingesting only the first group, but ultimately it's your decision; nginx/postgresql logs can be quite valuable when troubleshooting issues.
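For spot-checking any of these on the host itself, omnibus also ships a tail helper (a quick sketch; service names as in the list above):

# Tail all runit-managed service logs, or just a single service:
sudo gitlab-ctl tail
sudo gitlab-ctl tail nginx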

Tailing and ingesting log files from disk is possible. We do this for Gerrit currently.

It looks like the logs can be emitted in a structured format. This is preferred and will greatly simplify the logging onboarding process.

It looks like the logs can be emitted in a structured format. This is preferred and will greatly simplify the logging onboarding process.

This is going to be the case, yes: "Beginning in Omnibus GitLab 12.0, the JSON format is enabled by default for all services that support it."

Just to confirm that we are on the same page here: Logstash agents are installed and configured by Puppet, and we're only providing the list of logs to be ingested and configuring the output format if needed, right?

Just to confirm that we are on the same page here: Logstash agents are installed and configured by Puppet, and we're only providing the list of logs to be ingested and configuring the output format if needed, right?

Yes, Puppet will configure rsyslog to ship these logs to the logging pipeline. Once these files are available, we can proceed.
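For reference, a minimal sketch of what the rsyslog side of that might look like, using the imfile module to tail one of the gitlab-rails files; the tag, target host, and port here are placeholders, not the actual pipeline configuration:

module(load="imfile")

# Tail a structured GitLab log file and tag it for routing.
input(type="imfile"
      File="/var/log/gitlab/gitlab-rails/production_json.log"
      Tag="gitlab-rails"
      Severity="info"
      Facility="local0")

# Forward the tagged messages to the logging pipeline (placeholder endpoint).
if $syslogtag == 'gitlab-rails' then {
  action(type="omfwd" Target="logging-pipeline.example.org" Port="10514" Protocol="tcp")
}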

Just discussing this with @Jelto; we've now got a GitLab instance running on gitlab1001, and files are being produced in /var/log/gitlab.

I think we'd like to see logs in the following directories in the logging pipeline:

/var/log/gitlab/gitaly
/var/log/gitlab/sidekiq
/var/log/gitlab/gitlab-workhorse
/var/log/gitlab/gitlab-rails
/var/log/gitlab/nginx
/var/log/gitlab/postgresql
/var/log/gitlab/redis

There are some non-log files in these directories, and not all of the logs are JSON-formatted.

cwhite@gitlab1001:/var/log/gitlab$ sudo ls -w 1 gitaly sidekiq gitlab-workhorse gitlab-rails nginx postgresql redis | grep -v \.gz
gitaly:
@4000000060c11dd933624624.s
config
current              # json logs
gitaly_hooks.log     # empty
gitaly_ruby_json.log # json logs
lock
state

gitlab-rails:
api_json.log                                    # json logs
application_json.log                            # json logs
application.log                                 # plain logs
audit_json.log                                  # json logs
auth.log                                        # empty
exceptions_json.log                             # empty
git_json.log                                    # empty
gitlab-rails-db-migrate-2021-06-08-20-00-25.log # empty
graphql_json.log                                # json logs
grpc.log                                        # empty
importer.log                                    # empty
production_json.log                             # json logs
production.log                                  # multiline plain logs
service_measurement.log                         # empty
sidekiq_client.log                              # empty

gitlab-workhorse:
@4000000060c11e2a114c0444.s
config
current # json logs
lock
state

nginx:
access.log        # empty
config
current           # plain logs
error.log         # empty
gitlab_access.log # plain logs
gitlab_error.log  # plain logs
lock

postgresql:
@4000000060c11ddf32a1dc54.s
config
current # empty
lock
state

redis:
@4000000060c11dd32b455abc.s
config
current # plain logs
lock
state

sidekiq:
@4000000060c11e240b36e72c.s
config
current # json logs
lock
state

Some transformations are needed to get the JSON logs ECS-compatible. I can handle building that transformation step.
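As a sketch of the kind of transformation involved, a Logstash filter stage that renames GitLab's JSON fields to their ECS equivalents; the field names here are illustrative, not the final mapping:

filter {
  mutate {
    # Map source fields onto ECS nested fields (illustrative subset).
    rename => {
      "remote_ip" => "[client][ip]"
      "status"    => "[http][response][status_code]"
      "path"      => "[url][path]"
    }
  }
}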

The "plain" logs are easy to ingest, although most appear to be lacking severity info. We'll probably just assign a sane default. For webserver access and error, we can supply some grok filters.

Multi-line logs are difficult to ingest intelligibly.

Some questions:

  1. What is the "current" file? Is this file different from gitaly_hooks or gitaly_ruby_json?
  2. Is it safe to assume the empty files with _json will be JSON-formatted and that we want them?
  3. What should we do with the other empty files?
  4. Is it safe to ignore the "@" files, "config", "lock", and "state"?

Thanks for digging into the layout of this so thoroughly.

What is the "current" file? Is this file different from gitaly_hooks or gitaly_ruby_json?

Per the docs on logs for runit-managed services, current should be the current log file for the service.

gitaly_ruby_json.log is for gitaly-ruby, which I'm guessing isn't managed by runit. In practice, it looks like it's just a health check every 10 seconds or so; I'm not sure how useful that's going to be. I can't find much on gitaly_hooks.log.

Is it safe to ignore the "@" files, "config", "lock", and "state"?

Yeah, it should be. As I understand it, config, lock, and state are just metadata for runit. The @….s files are TAI64N compressed rotated logs. svlogd(8) has some detail on that.
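Handily, the rotation time is encoded in the file name itself and can be decoded with tai64nlocal from daemontools, e.g.:

# Decode the TAI64N timestamp embedded in a rotated-log file name;
# tai64nlocal converts the leading @… stamp and leaves the rest (".s") as-is.
echo '@4000000060c11dd933624624.s' | tai64nlocal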

Is it safe to assume the empty files with _json will be JSON-formatted and that we want them?

I think generally yeah.

What should we do with the other empty files?

For nginx, it looks like gitlab_access.log and gitlab_error.log are the only ones getting updates, so my guess is that access.log and error.log can be ignored. I'm not clear about postgres/current or some of the ones under gitlab-rails/, but we can dig in a bit on those.

@brennen Thank you for the info! I've got a mostly-working set of patches nearing readiness for review.

Is it possible to configure the nginx component to emit ECS-compatible access logs natively? Something like:

log_format ecs_170 escape=json
  '@cee: {'
    '"timestamp":"$time_iso8601",'
    '"client.ip":"$remote_addr",'
    '"user.name":"$remote_user",'
    '"url.path":"$request_uri",'
    '"url.domain":"$host",'
    '"host.name":"$hostname",'
    '"http.request.method":"$request_method",'
    '"http.request.headers.referer":"$http_referer",'
    '"http.response.status_code":"$status",'
    '"http.response.body.bytes":"$body_bytes_sent",'
    '"user_agent.original":"$http_user_agent",'
    '"event.category":["network","web"],'
    '"event.dataset":"nginx.access",'
    '"event.kind":"event",'
    '"event.type":["access","connection"],'
    '"service.type":"nginx",'
    '"ecs.version":"1.7.0"'
  '}';

Is it possible to configure the nginx component to emit ECS-compatible access logs natively?

Good question; I'll check. My guess is this depends more on nginx itself than on the GitLab stuff.

Ok, it looks like we just need to set nginx['log_format'] to that value in /etc/gitlab/gitlab.rb.

Working on a patch; it's a bit fiddly because the omnibus config wants to jam the value of nginx['log_format'] inside a single-quoted string, and I'm not sure it allows for setting the escape parameter, but there's probably some workaround.
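For reference, the setting itself would look roughly like this (a sketch only; the value is abbreviated to the first couple of fields of the ECS format above, and whether the escape parameter survives omnibus's quoting is exactly the open question):

# /etc/gitlab/gitlab.rb -- value abbreviated for illustration
nginx['log_format'] = '{"timestamp":"$time_iso8601","client.ip":"$remote_addr", ... }'

# then apply the change:
sudo gitlab-ctl reconfigure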

Change 705715 had a related patch set uploaded (by Brennen Bearnes; author: Brennen Bearnes):

[operations/gitlab-ansible@master] logging: format nginx access logs as JSON

https://gerrit.wikimedia.org/r/705715

Change 705715 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] logging: format nginx access logs as JSON

https://gerrit.wikimedia.org/r/705715

Mentioned in SAL (#wikimedia-releng) [2021-07-21T19:06:41Z] <brennen> gitlab1001: running ansible to deploy nginx logging and status changes (T274462, T275170)

Change 706036 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/gitlab-ansible@master] logging: fix typo

https://gerrit.wikimedia.org/r/706036

Change 706036 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] logging: fix typo

https://gerrit.wikimedia.org/r/706036

Mentioned in SAL (#wikimedia-releng) [2021-07-21T21:06:16Z] <brennen> gitlab1001: running ansible for logging typo fix (T274462)

Change 705019 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] logstash: add gitlab ECS transformations

https://gerrit.wikimedia.org/r/705019