People have started counting on Logstash and our ELK stack to monitor logs and analyse anomalies. Alas, episodes like T158602: RESTBase logs disappeared from logstash show us that having services pointed at a single logstash host can result in the complete loss of logs. To mitigate that, let's set up these hosts behind LVS so that even when a part of the stack malfunctions we can ensure that not all of the logs are lost.
Description
Related Objects
Event Timeline
The elasticsearch part is probably trivial (we already do it for search), but Logstash itself might not be. There are a couple of ingestion routes and they aren't all HTTP; the log4j appender comes to mind immediately.
Not a bad idea, but needs some thinking :)
One potential problem (or maybe not, I don't entirely know how LVS works) is that GELF can be sent as compressed, chunked UDP packets. If there is any opportunity for the chunks of a single message to show up at different machines, then no machine can reassemble it and the message will be lost.
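To make the chunking concern concrete, here is a minimal sketch of chunked GELF, assuming the published wire format (magic bytes 0x1e 0x0f, an 8-byte message ID, a sequence number, a sequence count, then the payload slice). The function names, the sample host name, and the tiny chunk size are made up for illustration; this is not Logstash's or any client's actual code.

```python
import json
import os
import struct
import zlib

GELF_MAGIC = b"\x1e\x0f"  # chunked-GELF magic bytes


def gelf_chunks(payload: bytes, chunk_size: int) -> list[bytes]:
    """Split an (already compressed) GELF payload into chunk datagrams."""
    parts = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    if len(parts) > 128:
        raise ValueError("GELF allows at most 128 chunks per message")
    message_id = os.urandom(8)  # the same ID is stamped on every chunk
    return [
        GELF_MAGIC + message_id + struct.pack("BB", seq, len(parts)) + part
        for seq, part in enumerate(parts)
    ]


def reassemble(datagrams: list[bytes]):
    """Rebuild the payload from one message's chunks, or None if any is missing.

    Simplification: assumes every datagram belongs to the same message.
    """
    parts = {}
    total = None
    for datagram in datagrams:
        seq, total = datagram[10], datagram[11]
        parts[seq] = datagram[12:]
    if total is None or len(parts) != total:
        return None
    return b"".join(parts[i] for i in range(total))


# Hypothetical log event; chunk_size is tiny only to force several chunks.
event = {"version": "1.1", "host": "restbase.example",
         "short_message": "request failed", "level": 3}
payload = zlib.compress(json.dumps(event).encode())
chunks = gelf_chunks(payload, chunk_size=32)

# One receiver sees every chunk: the message survives.
complete = reassemble(chunks)

# Chunks split across two receivers: neither can rebuild the message.
host_a = reassemble(chunks[0::2])
host_b = reassemble(chunks[1::2])
```

In this sketch `complete` equals the original payload while `host_a` and `host_b` are both `None`, which is the failure mode described above.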
FYI I did have a patch to add logstash to LVS (with source address hashing, to address @EBernhardson's concern) at https://gerrit.wikimedia.org/r/#/c/324371 if someone wants to take over.
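For anyone picking this up, a rough illustration of why source address hashing helps with the chunked-UDP concern: the backend is chosen from the sender's address alone, so every packet from one sender lands on the same host. This is only a sketch of the property. The backend names and client address are hypothetical, and the hash below is not the function IPVS actually uses.

```python
import hashlib
import itertools

# Hypothetical backend pool; real host names would come from the LVS config.
BACKENDS = ["logstash-a", "logstash-b", "logstash-c"]


def pick_by_source_hash(src_ip: str, backends=BACKENDS) -> str:
    """Choose a backend from the sender's address only (illustrative hash)."""
    digest = hashlib.sha256(src_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


def make_round_robin(backends=BACKENDS):
    """Return a picker that ignores the source and rotates through backends."""
    rotation = itertools.cycle(backends)
    return lambda src_ip: next(rotation)


# Four chunks of one GELF message, all sent from the same (made-up) client.
client = "10.0.0.5"
hashed_targets = {pick_by_source_hash(client) for _ in range(4)}

round_robin = make_round_robin()
rotated_targets = {round_robin(client) for _ in range(4)}
```

Here `hashed_targets` contains a single backend, while `rotated_targets` contains all three, so only the hashed scheduler keeps a chunked message together. The trade-off is that load is balanced per sender rather than per packet, so one very chatty service still sends everything to a single host.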