Improve logstash
Closed, ResolvedPublic

Description

People love Logstash and are using it, which is awesome! However, there are a number of issues with Logstash that need some TLC to bring it up to a production-grade service.

Some will require SRE help.

Event Timeline

demon raised the priority of this task to Needs Triage.
demon updated the task description.
demon moved this task to Backlog on the MediaWiki-Core-Team board.
demon changed Security from none to None.
demon added subscribers: demon, bd808.

@Gage has been thinking about some of these issues. I think he started talking to @mark about getting some nicer hardware for the logstash servers.

The work I've been doing towards T76759 (Deploy Monolog logging configuration for WMF production) will help some with the SPOF problem: for a given MW request, it randomly chooses a logstash instance to send the log events to. It also moves us away from relying on parsing the log2udp packets relayed from fluorine, which should help make things a bit more reliable.
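
To illustrate the per-request random selection described above, here is a rough Python sketch. It is not the actual Monolog configuration from T76759; the host names, port, and the `RequestLogger` class are hypothetical stand-ins, and the network send is replaced by an in-memory list.

```python
import random

# Hypothetical host list; the real logstash instance names are not given here.
LOGSTASH_HOSTS = [
    "logstash-a.example:12201",
    "logstash-b.example:12201",
    "logstash-c.example:12201",
]


def pick_logstash_host(hosts, rng=random):
    """Pick one host at random; meant to be called once per request."""
    if not hosts:
        raise ValueError("no logstash hosts configured")
    return rng.choice(hosts)


class RequestLogger:
    """Sends every log event of one request to a single randomly chosen host."""

    def __init__(self, hosts, rng=random):
        # Chosen once at construction, so all events of the request stay together.
        self.host = pick_logstash_host(hosts, rng)
        self.sent = []  # stands in for the real network send in this sketch

    def log(self, message):
        self.sent.append((self.host, message))
```

Each request builds its own `RequestLogger`, so across many requests the events spread over all instances, and the loss of one instance no longer takes out all logging. Requests that happen to pick a dead instance would still lose their events unless some retry or health check is added, which this sketch does not attempt.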

bd808 claimed this task.

Closing this kind of vague task as resolved. The ELK cluster is doing much better with the new hardware. We have more things to work on, but we can track them in their own tickets.