ELK: uniquely identify network syslog
Closed, Resolved, Public

Description

Since I started, I've been sending network logs as local3.
Those syslog messages are sent from appliances, so the configuration options are very limited; the messages are also of varying quality.

Unfortunately, other systems (cloudcontrol) started sending logs to local3 as well today, which makes the network dashboard much more difficult to use.

I was wondering if there is a way to identify network logs uniquely, e.g. at the ingestion point, so that this issue stops and doesn't happen again.

The other option is to declare local3 as network-only and prevent anyone else from using it for other purposes.

Event Timeline

A regex in logstash that detects type based on the source hostname could work. Another possibility is to correctly tag the logs as they are ingested by the syslog receiver. Neither option is great, but we lack configuration options at the source. I would opt for a regex in logstash because we have a testing framework in place.
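To make the regex idea concrete, here is a minimal sketch in Python rather than Logstash config. It is not the filter that was deployed. The prefix list contains only the prefixes mentioned on this task, and the hostname shape (`<prefix><number>-...-<site>`), the `classify` helper, and the example hostnames are all assumptions for illustration; the real pattern would have to follow the naming convention.

```python
import re

# Hypothetical prefix list: only the prefixes mentioned on this task.
# A real list would be derived from the naming convention.
NETDEV_PREFIXES = ("pfw", "psw", "ps", "scs", "cloudsw")

# Assumed hostname shape: <prefix><number>-<part>[-<part>...], optionally
# followed by a domain, e.g. "pfw3-sitea" or "cloudsw1-c8-sitea.example.org".
NETDEV_HOST_RE = re.compile(
    r"^(?:%s)\d+(?:-[a-z0-9]+)+(?:\.|$)" % "|".join(NETDEV_PREFIXES)
)


def classify(event):
    """Return a copy of the event with type "netdev" when the host matches."""
    host = event.get("host", "").lower()
    tagged = dict(event)
    if NETDEV_HOST_RE.match(host):
        tagged["type"] = "netdev"
    return tagged
```

With this pattern a host such as `cloudcontrol1003` is left untagged even though it logs to local3, because `cloudcontrol` is not in the prefix list.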

@ayounsi if we choose this route, we'll need to develop a regex based on the network appliances naming convention.
Are there details or docs about this convention somewhere that we could reference? An overview of the convention here on this task is acceptable as well.

https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Networking

Note that I don't think they send the hostname with all the logs (yeah, they're not good at it).
Would tagging it by IPs be an option? For example using something like: https://gerrit.wikimedia.org/r/c/operations/puppet/+/643703 to re-use existing IP definitions. We would need to make sure that logs come from matching IPs though.
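As a rough illustration of the IP-based idea, the check could be sketched in Python as below. The ranges are documentation prefixes standing in for the real IP definitions, and `NETDEV_NETWORKS` and `is_network_device` are hypothetical names, not anything from the linked change.

```python
import ipaddress

# Hypothetical ranges (documentation prefixes) standing in for the
# existing IP definitions that would be re-used.
NETDEV_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:10::/48"),
]


def is_network_device(source_ip):
    """True when the sender address falls inside a known network-device range."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # Not a parseable address, so it cannot be matched to a range.
        return False
    return any(addr in net for net in NETDEV_NETWORKS)
```

This only works if the address seen at the receiver is the one listed in the definitions, which is the caveat raised above.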

> Note that I don't think they send the hostname with all the logs (yeah, they're not good at it).

I was unable to find an example of log entries getting into Logstash without the host field set. Could you point me to where I might find or reproduce an example of this behavior?

Interestingly, I could not find logs from the following prefixes in logstash (assuming facility:local3). Is it intentional that logs from these hosts not be captured?

  • pfw
  • ps (found on centrallog)
  • psw
  • scs (found on centrallog)
  • cloudsw

> Would tagging it by IPs be an option? For example using something like: https://gerrit.wikimedia.org/r/c/operations/puppet/+/643703 to re-use existing IP definitions. We would need to make sure that logs come from matching IPs though.

It might be a possibility. It's probably a more complicated setup and would need some discussion to flesh out what it would look like.

> I was unable to find an example of log entries getting into Logstash without the host field set. Could you point me to where I might find or reproduce an example of this behavior?

I guess there are none then. I still feel like relying on a regex is brittle though, but if you think it's fine, then let's do it.

> Interestingly, I could not find logs from the following prefixes in logstash (assuming facility:local3). Is it intentional that logs from these hosts not be captured?

  • pfw: Fixed
  • ps (found on centrallog): Expected
  • psw: Not in use anymore
  • scs (found on centrallog): Expected
  • cloudsw: Fixed

Slightly related: is there a way to be notified if a device hasn't sent logs in a while?

> I guess there are none then. I still feel like relying on a regex is brittle though, but if you think it's fine, then let's do it.

I was informed earlier today that we gave network devices a special input (udp/10514 on centrallog). This single input may make it easier to tag at ingest so we will investigate this option first.
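The appeal of tagging at ingest is that the type comes from where the message arrived, not from its content. A minimal Python sketch of that idea follows; only the port number comes from this task, while the `tag_at_ingest` helper, the event field names, and the fallback type `"syslog"` are assumptions for illustration.

```python
# Port taken from this task; everything else here is assumed for illustration.
NETDEV_INPUT_PORT = 10514


def tag_at_ingest(message, input_port):
    """Build an event whose type depends only on the input it arrived on."""
    event = {"message": message}
    if input_port == NETDEV_INPUT_PORT:
        event["type"] = "netdev"
    else:
        # Hypothetical default type for every other input.
        event["type"] = "syslog"
    return event
```

Because the hostname and facility are never inspected, a non-network host that logs to local3 through another input would not end up typed as a network device.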

> Slightly related: is there a way to be notified if a device hasn't sent logs in a while?

We have an exporter that runs queries against Logstash and exports metrics. We could populate a graph and alert off that reasonably easily.
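The alerting condition itself is simple once a "last seen" timestamp per device is available. A sketch in Python, assuming the timestamps have already been fetched (the `silent_devices` helper, the 24-hour threshold, and the hostnames in the usage example are hypothetical and do not describe the existing exporter):

```python
from datetime import datetime, timedelta, timezone


def silent_devices(last_seen, now, max_silence=timedelta(hours=24)):
    """Return the sorted hosts whose most recent log is older than max_silence."""
    return sorted(
        host for host, seen in last_seen.items() if now - seen > max_silence
    )


# Usage with made-up data: one device logged an hour ago, one three days ago.
now = datetime(2020, 12, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "pfw3-sitea": now - timedelta(hours=1),
    "scs1-sitea": now - timedelta(days=3),
}
stale = silent_devices(last_seen, now)
```

A device that has never logged would not appear in `last_seen` at all, so the list of expected devices would need to come from somewhere else.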

Change 645181 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: identify network devices input

https://gerrit.wikimedia.org/r/645181

Change 645181 merged by Cwhite:
[operations/puppet@production] profile: identify network devices logging input

https://gerrit.wikimedia.org/r/645181

colewhite claimed this task.

Network syslog now has type => "netdev" in Logstash.