
ensure Gitlab logs end up in logstash
Closed, Resolved (Public)

Description

This task is just like T321759 for VRTS but for gitlab.


Similar to the logs from the miscweb* VMs in T216090 we should also ensure gitlab webserver logs are being sent into logstash.

The process consisted of 2 steps:

This could be copied for gitlab as well.

(Same for other services owned by serviceops-collab if they are not there yet)

Eventually we should make a dashboard that combines them (T321758).

Event Timeline

LSobanski triaged this task as Medium priority. Nov 23 2022, 7:51 PM

It looks like gitlab logs are already configured to be included in logstash; someone beat us to it in 2021! They're configured here (logs added to rsyslog, and in the lookup_table_output file). The paths appear to be correct.

It doesn't look like the logs ever make it to logstash though. Searching for the hostname, for the "program" name that rsyslog attaches, or even just doing a plaintext search for logstash comes up with nothing.

On the gitlab hosts, we see a few errors relating to rsyslogd, but they all look transient (the action is resumed in the last line), for example:

Feb 1 03:30:15 gitlab1004 rsyslogd: unexpected GnuTLS error -53 - this could be caused by a broken connection. GnuTLS reports: Error in the push function. [v8.2102.0 try https://www.rsyslog.com/e/2078 ]
Feb 1 03:30:15 gitlab1004 rsyslogd: omfwd: TCPSendBuf error -2078, destruct TCP Connection to centrallog1001.eqiad.wmnet:6514 [v8.2102.0 try https://www.rsyslog.com/e/2078 ]
Feb 1 03:30:15 gitlab1004 rsyslogd: action 'fwd_centrallog1001.eqiad.wmnet:6514' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ]
Feb 1 03:30:15 gitlab1004 rsyslogd: cannot connect to centrallog1001.eqiad.wmnet:6514: Connection refused [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Feb 1 03:30:15 gitlab1004 rsyslogd: action 'fwd_centrallog1001.eqiad.wmnet:6514' suspended (module 'builtin:omfwd'), next retry is Wed Feb 1 03:30:45 2023, retry nbr 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ]
Feb 1 03:30:15 gitlab1004 rsyslogd: cannot connect to centrallog1001.eqiad.wmnet:6514: Connection refused [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Feb 1 03:30:15 gitlab1004 rsyslogd: action 'fwd_centrallog1001.eqiad.wmnet:6514' suspended (module 'builtin:omfwd'), retry 1. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ]
Feb 1 03:30:15 gitlab1004 rsyslogd: cannot connect to centrallog1001.eqiad.wmnet:6514: Connection refused [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Feb 1 03:30:54 gitlab1004 rsyslogd: action 'fwd_centrallog1001.eqiad.wmnet:6514' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ]
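
To check the "transient" hunch across a whole syslog file rather than by eye, something like the following rough Python sketch could pair each "suspended" message with the next "resumed" message for the same action and report how long forwarding was down. The function name suspension_windows and the shortened sample lines are made up for illustration, and the year is an assumption because the syslog timestamp prefix does not carry one.

```python
import re
from datetime import datetime

# Sample lines shortened from the rsyslogd output above (trailing hints dropped).
SAMPLE = """\
Feb 1 03:30:15 gitlab1004 rsyslogd: action 'fwd_centrallog1001.eqiad.wmnet:6514' suspended (module 'builtin:omfwd'), retry 0.
Feb 1 03:30:15 gitlab1004 rsyslogd: cannot connect to centrallog1001.eqiad.wmnet:6514: Connection refused
Feb 1 03:30:15 gitlab1004 rsyslogd: action 'fwd_centrallog1001.eqiad.wmnet:6514' suspended (module 'builtin:omfwd'), retry 1.
Feb 1 03:30:54 gitlab1004 rsyslogd: action 'fwd_centrallog1001.eqiad.wmnet:6514' resumed (module 'builtin:omfwd')
"""

# Matches only the rsyslogd "action '...' suspended/resumed" messages.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+rsyslogd:\s+"
    r"action '(?P<action>[^']+)' (?P<state>suspended|resumed)\b"
)


def suspension_windows(text, year=2023):
    """Return ([(action, seconds_suspended), ...], [actions still suspended]).

    The year is an assumption: the syslog prefix does not include it.
    """
    open_since = {}  # action name -> time of the first unresolved suspension
    windows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        action = m["action"]
        if m["state"] == "suspended":
            # Keep the earliest suspension; repeated retries do not reset it.
            open_since.setdefault(action, ts)
        elif action in open_since:
            start = open_since.pop(action)
            windows.append((action, (ts - start).total_seconds()))
    return windows, sorted(open_since)


windows, still_suspended = suspension_windows(SAMPLE)
for action, seconds in windows:
    print(f"{action}: suspended for {seconds:.0f}s, then resumed")
print("still suspended:", still_suspended)
```

An action that shows up in the second list never resumed within the file, which would point at something more than a blip.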

I had a look on the centrallog hosts in eqiad and can't see anything obvious in the logs there. Not sure where else to look really, so I'm going to try to pair with someone later to debug this a bit further.

It seems logstash integration has been implemented and tracked in T274462 already. Somehow it stopped working.

The change https://gerrit.wikimedia.org/r/c/operations/puppet/+/705019 included some filters/mutate rules.

Maybe the logging format/fields changed in one of the recent GitLab versions. I'm currently trying to find something about that in logs and release notes.

eoghan added a subscriber: colewhite.

The logs are in logstash, just hard to find.

https://logstash.wikimedia.org/goto/82d8dd92e36776ce581aeef953ff6891

Thanks to @colewhite for helping me understand this: because these logs are ECS-formatted, they're in a different index. That can be found by switching to ecs-* here:

image.png (486×444 px, 38 KB)
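
For anyone who would rather query than click through the index switcher, here is a rough Python sketch of a search body aimed at the ecs-* index pattern. It only builds and prints the request; nothing is sent. The field names host.name and @timestamp come from the Elastic Common Schema, and whether the gitlab events actually populate host.name with the short hostname is an assumption, as are the helper name build_ecs_search and the 24 hour lookback.

```python
import json

# Index pattern mentioned in the comment above.
INDEX_PATTERN = "ecs-*"


def build_ecs_search(hostname, lookback="now-24h", size=20):
    """Build a query DSL body for recent events from one host.

    Assumes the events carry the ECS fields host.name and @timestamp.
    """
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"host.name": hostname}},
                    {"range": {"@timestamp": {"gte": lookback}}},
                ]
            }
        },
    }


# gitlab1004 is the host seen in the rsyslogd output earlier in this task.
body = build_ecs_search("gitlab1004")
print(f"GET /{INDEX_PATTERN}/_search")
print(json.dumps(body, indent=2))
```

If the term filter returns nothing, the hostname may be stored fully qualified or under a different field, so it is worth looking at one document in the index first.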