Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | herron | T281266 Decommission old ELK5 Logstash cluster
Resolved | | herron | T297239 Move logstash api-feature-usage output away from v5 cluster
In Progress | | colewhite | T288621 Logs and events produced by the WMF are consumed using the Elastic Common Schema by OpenSearch
Resolved | | herron | T288620 Document path forward and Retire remaining non-Kafka Logstash inputs
Event Timeline
According to the logstash input type distributions graph, we're down to elasticsearch via gelf as the only remaining non-kafka input.
There is some background and discussion about this migration in T225125. The TLDR from my understanding is that the implementation of json formatted elasticsearch logs over syslog has been prepared, but is currently switched off as a dependency is blocked on upgrading to ES7.
I'm tempted to try shimming these using an rsyslog listener that emulates gelf and routes these logs to the kafka logging pipeline until the longer-term/upgraded elastic config is in place.
Spent a chunk of time experimenting with this yesterday in deployment-prep, and unfortunately I don't think rsyslog specifically will do the trick.
The current logging configuration in elasticsearch is using logstash-gelf[1], and ships gelf formatted logs over udp directly to the logstash lvs. Part of the gelf protocol is compression and chunking of logs, and while I was able to ingest the gelf udp via rsyslog I have not had any success decompressing/parsing them. In theory logstash-gelf supports tcp transport which disables compression (and newer logstash-gelf versions even support kafka) but in my testing switching to tcp resulted in no logs arriving at all.
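For context on why raw ingestion alone is not enough, here is a minimal Python sketch of what a receiver has to do with GELF over UDP: detect chunked datagrams by their magic bytes, reassemble them by message ID, then decompress (gzip or zlib) before parsing the JSON. The class and function names are hypothetical and this is not what rsyslog or logstash-gelf does internally; it only illustrates the protocol steps described above and omits things a real receiver needs, such as expiring incomplete messages.

```python
import gzip
import json
import zlib

CHUNK_MAGIC = b"\x1e\x0f"  # first two bytes of a chunked GELF datagram
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of a gzip stream
CHUNK_HEADER_LEN = 12      # 2 magic + 8 message id + 1 seq number + 1 seq count


def decompress(payload: bytes) -> bytes:
    """Return the uncompressed payload, sniffing gzip/zlib by leading bytes."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    if payload[:1] == b"\x78":  # usual first byte of a zlib stream
        return zlib.decompress(payload)
    return payload  # assume uncompressed JSON


class GelfAssembler:
    """Hypothetical helper: collects chunks until a message is complete."""

    def __init__(self):
        self._pending = {}  # message id -> {sequence number: chunk bytes}

    def feed(self, datagram: bytes):
        """Return the parsed message as a dict, or None if chunks are missing."""
        if datagram[:2] != CHUNK_MAGIC:
            # Unchunked: the whole datagram is one (possibly compressed) message.
            return json.loads(decompress(datagram))
        msg_id = datagram[2:10]
        seq, total = datagram[10], datagram[11]
        chunks = self._pending.setdefault(msg_id, {})
        chunks[seq] = datagram[CHUNK_HEADER_LEN:]
        if len(chunks) < total:
            return None
        del self._pending[msg_id]
        # Chunks may arrive out of order, so join them by sequence number.
        payload = b"".join(chunks[i] for i in range(total))
        return json.loads(decompress(payload))
```

Note that compression applies to the whole message before it is split, so individual chunks cannot be decompressed on their own.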
So I think it's time to look for alternatives. One alternative that looks promising at first glance is logagent. This is an Apache 2 licensed log shipping agent from sematext, which supports GELF input[2] and output to kafka[3] (among others). In theory we could run this as a daemon on the elastic hosts, and configure elasticsearch to output udp gelf to the logagent on localhost, which would relay directly to kafka or the local syslog. This still needs testing and validation, but it appears to be an option so far.
[1] https://github.com/mp911de/logstash-gelf
[2] https://sematext.com/docs/logagent/input-plugin-gelf/
[3] https://sematext.com/docs/logagent/output-plugin-kafka/
Change 720110 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] wip: logagent: puppet module sketch
So far so good testing logagent. Confirmed that it can indeed ingest/parse udp GELF logs from our elasticsearch logstash-gelf config and output them json formatted to stdout. By wrapping this config in a systemd unit we should be able to pick up these logs with rsyslog and send them onward to kafka logging.
I've put together an initial sketch of the puppetization, part of which raises the question: what is the appropriate way to install npm packages like logagent and graygelf on production hosts?
After exploring the NPM approach a bit on https://gerrit.wikimedia.org/r/c/operations/puppet/+/720110/ it's clear that we would be better off looking for an alternate tool written in another language, with less convoluted dependencies, that is easier to audit and maintain in the long term.
An alternate approach that comes to mind is deploying logstash instances to GELF shipping hosts locally in a minimal configuration as a GELF to syslog agent. It sounds odd, because at face value transitioning from a central logstash cluster supporting GELF input to local logstash instances with GELF input doesn't seem like much of an improvement. But architecturally it would benefit us significantly in that we could retire the non-kafka logstash cluster, retire the associated LVS balancers, retire the elk5 configs and stop sending udp logs across the network. I'll try prototyping a minimal logstash agent config and see what that could look like.
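To make the "GELF to syslog agent" idea concrete, below is a rough Python sketch of the relay step: take an already-decoded GELF message, flatten it into a single JSON line, and send it over UDP to a listener on localhost. Everything here is an assumption for illustration, including the function names, the field renaming, and the destination port; the actual prototype would be a logstash config, not Python.

```python
import json

# Hypothetical local listener address; the real port is not given in this task.
RELAY_DEST = ("127.0.0.1", 11514)


def gelf_to_json_line(gelf: dict) -> bytes:
    """Flatten a decoded GELF message into one newline-terminated JSON line.

    Assumed mapping (illustrative only): short_message becomes message, and
    the leading underscore is dropped from GELF additional fields. If two
    keys collide after renaming, the later one wins.
    """
    event = {}
    for key, value in gelf.items():
        if key == "short_message":
            event["message"] = value
        elif key.startswith("_"):
            event[key[1:]] = value
        else:
            event[key] = value
    return json.dumps(event, sort_keys=True).encode("utf-8") + b"\n"


def relay(gelf: dict, sock, dest=RELAY_DEST) -> int:
    """Send the JSON line to dest; sock is any object with a sendto() method."""
    return sock.sendto(gelf_to_json_line(gelf), dest)
```

In use, `sock` would be a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`. Because the destination is on localhost, UDP logs no longer have to cross the network.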
Change 721345 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] profile::logstash::gelf_relay: ingest GELF logs and output as JSON over UDP
Change 721345 merged by Herron:
[operations/puppet@production] profile::logstash::gelf_relay: ingest GELF logs and output as JSON over UDP
Change 721364 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] add logstash gelf relay to elastic1049
Change 721364 merged by Herron:
[operations/puppet@production] add logstash gelf relay to elastic1049
Mentioned in SAL (#wikimedia-operations) [2021-11-04T17:47:29Z] <ryankemper> T288620 [Elastic] Rebooting elastic1049.eqiad.wmnet to uptake new gelf settings change
Change 736859 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] role::elasticsearch::cirrus: ship ES logs via gelf_relay
Change 736865 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] logstash: switch monitoring API to port 9675
Change 736865 merged by Herron:
[operations/puppet@production] logstash: switch monitoring API to port 9675
Change 736872 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] logstash_exporter: add service notify to defaults file
Change 736872 merged by Herron:
[operations/puppet@production] logstash_exporter: add service notify to defaults file
Change 736859 merged by Herron:
[operations/puppet@production] role::elasticsearch::cirrus: ship ES logs via gelf_relay
Change 739324 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] role::elasticsearch::cloudelastic: ship ES logs via gelf_relay
Change 739325 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] role::elasticsearch::relforge: ship ES logs via gelf_relay
Change 740191 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] logstash::gelf::input: remove hardcoded tags
Change 739324 merged by Herron:
[operations/puppet@production] role::elasticsearch::cloudelastic: ship ES logs via gelf_relay
Change 739325 merged by Herron:
[operations/puppet@production] role::elasticsearch::relforge: ship ES logs via gelf_relay
Change 740191 merged by Herron:
[operations/puppet@production] logstash::input::gelf: remove hardcoded tags
Change 743257 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] striker: send logs to logstash pipeline via local rsyslog
Change 743257 abandoned by Herron:
[operations/puppet@production] striker: send logs to logstash pipeline via local rsyslog
Reason:
not necessary
Change 743261 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] striker: switch cloudweb dev to cee logging handler
Change 743261 merged by Herron:
[operations/puppet@production] striker: switch cloudweb dev to cee logging handler
No logs have arrived over deprecated logstash inputs in the past 4 days. Boldly resolving this!
Change 720110 abandoned by Herron:
[operations/puppet@production] wip: logagent: puppet module sketch
Reason:
with another route, see bug