
ORES uwsgi logs in logstash are useless
Closed, Resolved, Public

Description

If you go to the ORES dashboard on Logstash, it's basically a list of the requests coming in. It took me a long time to come up with the idea of adding '-200' to filter out 200 responses, and then '-404' and a few others, just to get the basic meaning of the logs. There is still lots of noise about respawning web workers, etc., and you have to filter out many terms before you get any meaning out of it.
This task is done when we have a "Fatal monitor" dashboard for ORES.
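A "Fatal monitor" view becomes much simpler once each entry carries a real severity: instead of subtracting noise terms like '-200' and '-404' one at a time, you select on level. The sketch below illustrates that selection in Python, assuming entries arrive as JSON lines with a hypothetical `level` field (the field name is an assumption, not taken from the ORES configuration).

```python
import json
from typing import Iterable, Iterator

# Severities shown on a "Fatal monitor"-style view. These are Python's
# standard logging level names; using them here is an assumption.
FATAL_LEVELS = {"ERROR", "CRITICAL"}


def fatal_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only entries whose (hypothetical) 'level' field is ERROR or CRITICAL."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Unstructured lines (e.g. uwsgi worker respawn chatter) carry no
            # level to select on, so they are skipped rather than guessed at.
            continue
        if isinstance(entry, dict) and str(entry.get("level", "")).upper() in FATAL_LEVELS:
            yield entry


if __name__ == "__main__":
    sample = [
        '{"level": "INFO", "message": "GET /v3/scores/ 200"}',
        '{"level": "INFO", "message": "GET /nope 404"}',
        "respawned uWSGI worker 3 (new pid: 1234)",
        '{"level": "ERROR", "message": "model failed to load"}',
    ]
    for entry in fatal_entries(sample):
        print(entry["message"])  # prints: model failed to load
```

In Kibana, the equivalent is a single query on the level field, which replaces the growing list of exclusions.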

Event Timeline

The reason that everything is INFO is that our uwsgi encoder is designed to do so: https://github.com/wikimedia/puppet/blob/production/modules/service/manifests/uwsgi.pp#L166

It's not easily fixable as uwsgi doesn't pass around log level: https://github.com/unbit/uwsgi/blob/3149df02ed443131c54ea6afb29fcbb0ed4d1139/core/logging.c#L1860

I'm basically thinking of writing shiny new Logstash support; that'll be fun. One thing before moving forward: does k8s solve this issue? If k8s can send everything in stderr to Logstash, all the work here would be useless. @akosiaris, do you know more about this?

> The reason that everything is INFO is that our uwsgi encoder is designed to do so: https://github.com/wikimedia/puppet/blob/production/modules/service/manifests/uwsgi.pp#L166

Yes, given how we did it, it's expected. We should deprecate this approach and not rely on uwsgi for logging, or at the very least find a way to inform it of the log level (see below for an approach).

> It's not easily fixable as uwsgi doesn't pass around log level: https://github.com/unbit/uwsgi/blob/3149df02ed443131c54ea6afb29fcbb0ed4d1139/core/logging.c#L1860

It's not up to uwsgi to pass around the log level, but per https://uwsgi-docs.readthedocs.io/en/latest/LogFormat.html#user-defined-logvars it might be doable.
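The linked uWSGI page describes per-request logvars that application code sets with uwsgi.set_logvar(key, value) and that log-format then references as %(key). Below is a minimal sketch of a WSGI middleware that records a level for each request that way. The logvar name level and the status-to-level mapping are both hypothetical choices, and the puppet-side encoder would still need to be changed to read that field. The uwsgi module is importable only inside uWSGI, so the setter can also be injected for testing.

```python
# Only importable when running inside uWSGI; fall back to None elsewhere.
try:
    import uwsgi  # type: ignore

    _set_logvar = uwsgi.set_logvar
except ImportError:
    _set_logvar = None


def status_to_level(status: str) -> str:
    """Map a WSGI status line like '503 SERVICE UNAVAILABLE' to a level name.

    The mapping (5xx -> ERROR, 4xx -> WARNING, else INFO) is an assumption.
    """
    code = int(status.split(" ", 1)[0])
    if code >= 500:
        return "ERROR"
    if code >= 400:
        return "WARNING"
    return "INFO"


class LevelLogvarMiddleware:
    """Hypothetical middleware: sets a per-request 'level' logvar from the status.

    The request log line could then include it, e.g. in the uWSGI ini:
        log-format = %(level) %(method) %(uri) %(status)
    """

    def __init__(self, app, set_logvar=None):
        self.app = app
        self.set_logvar = set_logvar if set_logvar is not None else _set_logvar

    def __call__(self, environ, start_response):
        def _start_response(status, headers, exc_info=None):
            # Record the level before handing the status to the real server.
            if self.set_logvar is not None:
                self.set_logvar("level", status_to_level(status))
            return start_response(status, headers, exc_info)

        return self.app(environ, _start_response)
```

Wrapping start_response rather than the response body means the level is known as soon as the status is, without buffering output.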

> I'm basically thinking of writing shiny new Logstash support; that'll be fun. One thing before moving forward: does k8s solve this issue? If k8s can send everything in stderr to Logstash, all the work here would be useless. @akosiaris, do you know more about this?

Partly (the application needs to cooperate). The work we will be doing this quarter is about forwarding logs from applications running on Kubernetes to Logstash. The applications will log to stdout/stderr, so as not to break commonly expected Kubernetes workflows like kubectl logs <pod_name>, and WMF infrastructure will collect those logs and forward them. I expect this to be declared ready for use at the end of the quarter, possibly a bit earlier.

However, note that Kubernetes cannot know the log level of an entry. Only the application knows it, so the application is the only one that should be setting it.
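Since only the application knows the level, one way for it to "cooperate" is to write structured lines to stdout that already contain the level, so whatever collects stdout can forward it without guessing. The sketch below uses Python's standard logging module. The JSON field names and the logger name ores.scoring are hypothetical, not the format WMF's collector actually expects.

```python
import json
import logging
import sys


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object per line, keeping its level."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,  # the level only the application knows
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_stdout_logging(level: int = logging.INFO) -> logging.Logger:
    """Replace the root logger's handlers with a single JSON-to-stdout handler."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_stdout_logging()
    logging.getLogger("ores.scoring").error("model %s failed to load", "enwiki-damaging")
    # prints: {"level": "ERROR", "logger": "ores.scoring", "message": "model enwiki-damaging failed to load"}
```

Because json.dumps escapes embedded newlines, a multi-line traceback still stays on one line, so a line-based collector does not split it into separate entries.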

awight subscribed.

Moving from Review to Active; apologies in advance if this is incorrect.

Oops, I see this is tagged in a patch for review.

Change 466716 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] ores: Add logstash config

https://gerrit.wikimedia.org/r/466716

Change 466716 merged by Alexandros Kosiaris:
[operations/puppet@production] ores: Add logstash config

https://gerrit.wikimedia.org/r/466716

Change 466857 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/services/ores/deploy@master] Start using logstash

https://gerrit.wikimedia.org/r/466857

Change 466857 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Start using logstash

https://gerrit.wikimedia.org/r/466857

Mentioned in SAL (#wikimedia-operations) [2018-10-18T17:33:00Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@4ac4c8b]: Logstash support for ores: T181546 T169586 T168921 T181630 T205256 (duration: 23m 48s)

Change 470827 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] ores: Change logstash port from GELF to json lines

https://gerrit.wikimedia.org/r/470827

Change 470827 merged by Dzahn:
[operations/puppet@production] ores: Change logstash port from GELF to json lines

https://gerrit.wikimedia.org/r/470827
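On the application side, "JSON lines" means one JSON object per newline-terminated line on a TCP stream, which is the framing Logstash's json_lines codec reads. The standard library's logging.handlers.SocketHandler sends pickled records instead, so a small custom handler is needed. This is a minimal sketch, not the handler ORES actually ships. The field names are hypothetical, and so are the host and port in the usage note.

```python
import json
import logging
import socket


class JsonLinesTCPHandler(logging.Handler):
    """Send each log record as one newline-terminated JSON object over TCP."""

    def __init__(self, host: str, port: int, timeout: float = 5.0):
        super().__init__()
        self.address = (host, port)
        self.timeout = timeout
        self.sock = None

    def _connect(self) -> socket.socket:
        # Connect lazily, and again after a failure has dropped the socket.
        if self.sock is None:
            self.sock = socket.create_connection(self.address, timeout=self.timeout)
        return self.sock

    def emit(self, record: logging.LogRecord) -> None:
        try:
            line = json.dumps({
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }) + "\n"
            self._connect().sendall(line.encode("utf-8"))
        except Exception:
            # Forget the broken socket so the next record reconnects, and let
            # logging's standard error hook report this record's failure.
            self._close_socket()
            self.handleError(record)

    def _close_socket(self) -> None:
        if self.sock is not None:
            try:
                self.sock.close()
            finally:
                self.sock = None

    def close(self) -> None:
        self.acquire()
        try:
            self._close_socket()
        finally:
            self.release()
        super().close()
```

For example, logging.getLogger().addHandler(JsonLinesTCPHandler("logstash.example.org", 11514)), where both the host and the port are placeholders. Reconnecting on the next record after a failure keeps a Logstash restart from permanently silencing the service, at the cost of dropping the record that failed.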

Mentioned in SAL (#wikimedia-operations) [2018-10-31T20:06:27Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@70ba14b]: Upgrade to celery4 and flask 0.12.4, logstash fixes: T181546 T181630 T168921 T205256 T169586 T208258 T178441

Mentioned in SAL (#wikimedia-operations) [2018-10-31T20:27:56Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@70ba14b]: Upgrade to celery4 and flask 0.12.4, logstash fixes: T181546 T181630 T168921 T205256 T169586 T208258 T178441 (duration: 21m 29s)