
Send error logs to logstash
Closed, Resolved, Public

Description

I noticed that a recent server error didn't seem to appear in logstash. I'm not sure if we're failing to send errors, or if the level is not being recorded.

See also T149010#3136402.

Event Timeline

Halfak raised the priority of this task from Low to High.
Halfak moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.

When returning error responses, ores.wsgi.util.format_error summarizes the error as type (the error's class name) and message (the error cast to str). This response error handling code might be a good place to log the complete traceback as well.
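For illustration, a minimal sketch of what logging the traceback at that point could look like. The function name `format_error_sketch`, the logger name, and the shape of the returned dict are assumptions for this example, not the actual ores.wsgi.util code.

```python
import logging

# Hypothetical logger name; the real module may use a different one.
logger = logging.getLogger("ores.wsgi.sketch")


def format_error_sketch(error):
    """Summarize an exception for the response and log its full traceback."""
    # exc_info accepts an exception instance (Python 3.5+), so the traceback
    # is logged even when this is called outside an `except` block.
    logger.error("Error while handling request", exc_info=error)
    # Assumed response shape: type is the class name, message is the str cast.
    return {"error": {"type": type(error).__name__, "message": str(error)}}
```

With something like this, whichever handler is attached to the logger (uwsgi log, logstash, etc.) would receive the traceback alongside the summarized response.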

Today I realized this is very important: we don't report anything outside of the uwsgi logs to logstash, and I don't have access to syslog or daemon.log. Basically, I can't see any ORES errors.

Hm, it seems to me the way forward here would be to include python-logstash and then add it as a logging handler via the deployed [[https://phabricator.wikimedia.org/source/ores-deploy/browse/master/logging_config.yaml|logging_config.yaml]].
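A rough sketch of wiring an extra handler in through a logging config, written as the Python dict that a YAML file like this would load into. The contents of the real logging_config.yaml are not shown in this task, so every key value here (handler name, host, port, logger name) is a placeholder, and the stdlib `DatagramHandler` is only a stand-in for whatever logstash handler class ends up being used.

```python
import logging
import logging.config

# Hypothetical config; not the deployed logging_config.yaml.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "logstash": {
            # Stand-in stdlib handler; a logstash-specific handler class
            # would be named here instead.
            "class": "logging.handlers.DatagramHandler",
            "host": "localhost",  # placeholder host
            "port": 12201,        # placeholder port
            "level": "ERROR",
        },
    },
    "loggers": {
        # Placeholder logger name; only ERROR and above are forwarded.
        "ores": {"handlers": ["logstash"], "level": "ERROR"},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
```

The point is only that the handler is added declaratively in the deployed config, so no application code has to change.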

The library seems unmaintained :/ but besides that it's a good idea to use it. Worst case, we fork and maintain it.

Ladsgroup added a subscriber: hoo.

The library is basically four files, the last commit on it was two years ago, and it doesn't support Python 3. We can basically write it from scratch. I will do it.
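To give an idea of how small a from-scratch version can be, here is a minimal sketch using only the standard library: a formatter that emits one JSON object per record, and a TCP handler that sends it as a newline-terminated line. The class names and the JSON field names are assumptions for this example, not the code that was eventually written.

```python
import json
import logging
import logging.handlers
import socket
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format a log record as a single JSON object (field names assumed)."""

    def format(self, record):
        event = {
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "message": record.getMessage(),
            "level": record.levelname,
            "logger_name": record.name,
            "host": socket.gethostname(),
        }
        if record.exc_info:
            # Include the full traceback text when one is attached.
            event["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(event)


class JsonLineTCPHandler(logging.handlers.SocketHandler):
    """Send each record over TCP as one JSON line instead of a pickle."""

    def __init__(self, host, port):
        super().__init__(host, port)
        self.setFormatter(JsonLineFormatter())

    def makePickle(self, record):
        # SocketHandler sends whatever bytes makePickle returns.
        return (self.format(record) + "\n").encode("utf-8")
```

Usage would be along the lines of `logging.getLogger("ores").addHandler(JsonLineTCPHandler(host, port))`, with host and port pointing at a logstash input that accepts JSON lines.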

Change 466716 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] ores: Add logstash config

https://gerrit.wikimedia.org/r/466716

Change 466716 merged by Alexandros Kosiaris:
[operations/puppet@production] ores: Add logstash config

https://gerrit.wikimedia.org/r/466716

Change 466857 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/services/ores/deploy@master] Start using logstash

https://gerrit.wikimedia.org/r/466857

Change 466857 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Start using logstash

https://gerrit.wikimedia.org/r/466857

Mentioned in SAL (#wikimedia-operations) [2018-10-18T17:33:00Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@4ac4c8b]: Logstash support for ores: T181546 T169586 T168921 T181630 T205256 (duration: 23m 48s)

Change 470827 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] ores: Change logstash port from GELF to json lines

https://gerrit.wikimedia.org/r/470827

Change 470827 merged by Dzahn:
[operations/puppet@production] ores: Change logstash port from GELF to json lines

https://gerrit.wikimedia.org/r/470827

Mentioned in SAL (#wikimedia-operations) [2018-10-31T20:06:27Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@70ba14b]: Upgrade to celery4 and flask 0.12.4, logstash fixes: T181546 T181630 T168921 T205256 T169586 T208258 T178441

Mentioned in SAL (#wikimedia-operations) [2018-10-31T20:27:56Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@70ba14b]: Upgrade to celery4 and flask 0.12.4, logstash fixes: T181546 T181630 T168921 T205256 T169586 T208258 T178441 (duration: 21m 29s)