Change log routing to ELK cluster to use rsyslog->kafka rather than talking directly to the ELK cluster
Closed, Resolved (Public)

Description

The json_lines input we are using is slated to be removed.

Event Timeline

I think there are two sets of config that would need to change. One is managed in ::striker::uwsgi, where the uwsgi service itself is configured to send json_lines formatted messages to the logstash server. The other set is in the Striker app itself at https://github.com/wikimedia/labs-striker/blob/master/striker/settings.py#L77-L86.

If all the wiring is in place for the stderr->journald->rsyslog->kafka->elk path (wow, that's a lot of hops) then I think it might just be a matter of adjusting puppet config to change both sets of log events to flow that way.

@colewhite has written preliminary docs on configuring Python's logger to send the appropriate log data to journald which can then be sent on to rsyslog->kafka->ELK: https://wikitech.wikimedia.org/wiki/Logstash/Interface#Python_implementation

Change 497987 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[labs/striker@master] Add CEE log formatter and handler

https://gerrit.wikimedia.org/r/497987

@colewhite I have things figured out on the application side such that structured log events with the "@cee:" cookie stream on stderr. That is caught by the owning uwsgi process. Now I'm wondering what the best config on the uwsgi side is. Default uwsgi logging (no logger= stanzas added) passes the log messages along to the uwsgi process' stderr. When running as a systemd unit these then end up as log events in journald. In my testing environment things end up looking like this:

$ sudo journalctl -u uwsgi-striker --no-pager -n 2
-- Logs begin at Mon 2019-03-18 01:17:02 UTC, end at Thu 2019-03-21 04:56:17 UTC. --
Mar 21 04:55:17 vagrant uwsgi-striker[17742]: @cee: {"stack_info": null, "level": "DEBUG", "@timestamp": "2019-03-21T04:55:17.493Z", "type": "striker", "logger_name": "django.db.backends", "host": "vagrant", "path": "/vagrant/srv/striker/.venv/lib/python3.5/site-packages/django/db/backends/utils.py", "sql": "None", "@version": "1", "params": "(2,)", "message": "(0.000) None; args=(2,)", "tags": [], "duration": 0.00029468536376953125}
Mar 21 04:55:17 vagrant uwsgi-striker[17742]: @cee: {"stack_info": null, "level": "DEBUG", "@timestamp": "2019-03-21T04:55:17.495Z", "type": "striker", "logger_name": "django.db.backends", "host": "vagrant", "path": "/vagrant/srv/striker/.venv/lib/python3.5/site-packages/django/db/backends/utils.py", "sql": "None", "@version": "1", "params": "(2, True)", "message": "(0.000) None; args=(2, True)", "tags": [], "duration": 0.0002396106719970703}

What I'm not sure of at this point is if you would prefer this and then add config as needed to journald to replicate to rsyslog, or if instead it would be better/cleaner/whatever to configure uwsgi to write directly to rsyslog itself? Thoughts?
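
For readers following along, here is a minimal sketch of how a Python application might emit lines like the journald samples above: the "@cee:" cookie followed by one JSON object per line on stderr. This is not the code from change 497987. The class name CeeFormatter and the helper build_logger are hypothetical, the field set is a small subset of what the samples show, and the "type" value is simply copied from those samples.

```python
import json
import logging
from datetime import datetime, timezone


class CeeFormatter(logging.Formatter):
    """Render each record as '@cee: ' followed by a single-line JSON object.

    Hypothetical illustration only; not the formatter used by Striker.
    """

    def format(self, record):
        # UTC timestamp with millisecond precision and a trailing 'Z',
        # matching the shape of "@timestamp" in the samples above.
        timestamp = (
            datetime.fromtimestamp(record.created, tz=timezone.utc)
            .isoformat(timespec="milliseconds")
            .replace("+00:00", "Z")
        )
        event = {
            "@timestamp": timestamp,
            "@version": "1",
            "type": "striker",  # assumption: copied from the sample output
            "level": record.levelname,
            "logger_name": record.name,
            "message": record.getMessage(),
        }
        return "@cee: " + json.dumps(event)


def build_logger(name="example", stream=None):
    """Attach a CEE-formatted stream handler (stderr when stream is None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(CeeFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

Calling build_logger().debug("hello") under a systemd unit with default stderr handling would, under these assumptions, produce a journald entry whose message starts with "@cee: ".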

As I understand it, journald is already wired up to copy to rsyslog. The only change needed to get these logs onto Kafka is to whitelist the application in the lookup_table_output.json.

If the application name is unclear, it can be customized in the systemd unit.

Both writing to journald and writing to /dev/log are supported. As for preference, I don't think there is one.
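
To make the two options concrete, the sketch below shows how a Python process could pick either transport using only the standard library. It is an illustration, not the configuration that was deployed: the function name make_handler, the transport labels, and the default ident "striker" are all hypothetical, and the formatter assumes the log message is already a JSON string.

```python
import logging
import logging.handlers
import sys


def make_handler(transport="stderr", ident="striker"):
    """Return a logging handler for one of the two transports.

    'stderr' relies on systemd capturing the process's stderr into journald.
    'syslog' writes directly to the local /dev/log socket.
    """
    if transport == "stderr":
        handler = logging.StreamHandler(sys.stderr)
    elif transport == "syslog":
        handler = logging.handlers.SysLogHandler(address="/dev/log")
        # SysLogHandler prepends `ident` to every message it sends.
        handler.ident = ident + ": "
    else:
        raise ValueError("unknown transport: %r" % transport)
    # Assumption: the message is already serialized JSON, so the formatter
    # only has to add the "@cee:" cookie in front of it.
    handler.setFormatter(logging.Formatter("@cee: %(message)s"))
    return handler
```

Either handler can be attached with logging.getLogger().addHandler(make_handler("stderr")); the rest of the application's logging calls stay the same regardless of which transport is chosen.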

Change 498214 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] striker: let uwsgi container and app logs flow to stdout/stderr

https://gerrit.wikimedia.org/r/498214

Change 497987 merged by jenkins-bot:
[labs/striker@master] Add CEE log formatter and handler

https://gerrit.wikimedia.org/r/497987

Change 498258 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[labs/striker/deploy@master] Upgrade to Django 2.1 and other small improvments

https://gerrit.wikimedia.org/r/498258

Change 498258 merged by jenkins-bot:
[labs/striker/deploy@master] Upgrade to Django 2.1 and other small improvments

https://gerrit.wikimedia.org/r/498258

Mentioned in SAL (#wikimedia-operations) [2019-03-22T00:35:56Z] <bd808@deploy1001> Finished deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932) (duration: 01m 15s)

Change 498214 merged by Cwhite:
[operations/puppet@production] striker: let uwsgi container and app logs flow to stdout/stderr

https://gerrit.wikimedia.org/r/498214

Change 498516 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] service::uwsgi: Allow instances to disable logging config

https://gerrit.wikimedia.org/r/498516

Change 498516 merged by Cwhite:
[operations/puppet@production] service::uwsgi: Allow instances to disable logging config

https://gerrit.wikimedia.org/r/498516

bd808 claimed this task.

It works!