People love Logstash and are using it, which is awesome! However, there are a number of issues relating to Logstash that need some TLC to bring it up to a production-grade service.
Some will require SRE help.
@Gage has been thinking about some of these issues. I think he started talking to @mark about getting some nicer hardware for the logstash servers.
The work I've been doing towards T76759: Deploy Monolog logging configuration for WMF production will help somewhat with the SPOF problem: for each MW request, it randomly chooses a Logstash instance to send that request's log events to. It also moves us away from relying on parsing the log2udp packets relayed from fluorine, which should help make things a bit more reliable.
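To illustrate the per-request random selection idea, here is a rough sketch in Python. It is not the actual Monolog/MediaWiki configuration from T76759; the host names, `pick_logstash_host`, and `RequestLogger` are all made up for the example. The only point is that each request picks one instance at random and keeps using it, so no single Logstash host receives every event.

```python
import random

# Hypothetical host names; the real Logstash hosts are not listed in this task.
LOGSTASH_HOSTS = [
    "logstash-a.example",
    "logstash-b.example",
    "logstash-c.example",
]


def pick_logstash_host(hosts, rng=random):
    """Return one host chosen uniformly at random from the configured list."""
    if not hosts:
        raise ValueError("no logstash hosts configured")
    return rng.choice(hosts)


class RequestLogger:
    """Illustrative stand-in for a per-request logger.

    The destination is chosen once, when the request starts, so every
    log event from the same request goes to the same instance.
    """

    def __init__(self, hosts, rng=random):
        self.host = pick_logstash_host(hosts, rng)

    def destination(self):
        return self.host
```

With this shape, losing one instance only affects the requests that happened to pick it, rather than all logging. It does not by itself retry against another host on failure; that would need extra handling.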
Closing this somewhat vague task as resolved. The ELK cluster is doing much better with the new hardware. We have more things to work on, but we can track them in their own tickets.