Page MenuHomePhabricator

ferm: Log dropped packets
Closed, ResolvedPublic

Description

I've already used manually appended iptables rules to log dropped packets during debugging/setting up new services.

We should do that in general as part of the ferm base rules to ease debugging and trouble shooting. On most systems nothing is logged (except the occasional PXE/TFTP broadcasts from installations).

Details

Related Gerrit Patches:

Event Timeline

MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff raised the priority of this task from to Needs Triage.
MoritzMuehlenhoff updated the task description. (Show Details)
Restricted Application added subscribers: Matanya, Aklapper. · View Herald TranscriptOct 20 2015, 2:38 PM
MoritzMuehlenhoff triaged this task as Normal priority.Oct 20 2015, 2:39 PM
MoritzMuehlenhoff set Security to None.
faidon added a subscriber: faidon.Oct 20 2015, 2:40 PM

Yeah that's a good idea. If we do, should we consider userspace logging with ulogd instead of spamming dmesg? That way we could potentially collect it in the future as well and e.g. detect anomalies.

Yes, my initial debug rules simply used syslog, but for a more complete fleet-wide solution I was thinking of ulogd(2).

Change 486513 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] Create module for managing ulogd

https://gerrit.wikimedia.org/r/486513

jbond added a comment.Feb 5 2019, 1:20 PM

I have created a simple module for configuereing ulogd2 avalible in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/486513/. The questions is what do we want to log and how do we want to log it. With the default configuration the module will produce the following:

syslogemu.log:
This is a syslog style log file with entries for each packet logged with NGLOG, sample output is below

Feb  5 13:03:43 jbond-puppet-client  IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:fa:16:3e:75:d9:63:08:00 SRC=172.16.6.122 DST=255.255.255.255 LEN=390 TOS=00 PREC=0xC0 TTL=64 ID=61737 PROTO=UDP SPT=67 DPT=68 LEN=370 UID=0 GID=0 MARK=0

ulogd.json:
This is a json log file with entries for each dropped packet sample output is below. This could be useful for putting into logstash/kibana/grafana etc

{"timestamp": "2019-02-05T13:03:43.569314", "dvc": "Netfilter", "raw.pktlen": 390, "raw.pktcount": 1, "oob.prefix": "", "oob.time.sec": 1549371823, "oob.time.usec": 569314, "oob.mark": 0, "oob.ifindex_in": 2, "oob.hook": 1, "raw.mac_len": 14, "oob.family": 2, "oob.protocol": 2048, "oob.uid": 0, "oob.gid": 0, "raw.label": 0, "raw.type": 1, "raw.mac.addrlen": 6, "ip.protocol": 17, "ip.tos": 192, "ip.ttl": 64, "ip.totlen": 390, "ip.ihl": 5, "ip.csum": 54515, "ip.id": 61737, "ip.fragoff": 0, "src_port": 67, "dest_port": 68, "udp.len": 370, "udp.csum": 32564, "oob.in": "eth0", "oob.out": "", "src_ip": "172.16.6.122", "dest_ip": "255.255.255.255", "mac.saddr.str": "fa:16:3e:75:d9:63", "mac.daddr.str": "ff:ff:ff:ff:ff:ff", "mac.str": "ff:ff:ff:ff:ff:ff:fa:16:3e:75:d9:63:08:00"}

syslogemu_nfct.log
Flow based logging in syslog format, configured to only log destroy events

Feb  5 12:55:11 jbond-puppet-client [DESTROY] ORIG: SRC=172.16.1.14 DST=169.254.169.254 PROTO=TCP SPT=40664 DPT=80 PKTS=0 BYTES=0 , REPLY: SRC=169.254.169.254 DST=172.16.1.14 PROTO=TCP SPT=80 DPT=40664 PKTS=0 BYTES

ulog_nfct.json
Flow based logging in json format, configured to only log all events

{"timestamp": "2019-02-05T13:08:42.321313", "dvc": "Netfilter", "raw.pktlen": 390, "raw.pktcount": 1, "oob.prefix": "", "oob.time.sec": 1549372122, "oob.time.usec": 321313, "oob.mark": 0, "oob.ifindex_in": 2, "oob.hook": 1, "raw.mac_len": 14, "oob.family": 2, "oob.protocol": 2048, "oob.uid": 0, "oob.gid": 0, "raw.label": 0, "raw.type": 1, "raw.mac.addrlen": 6, "ip.protocol": 17, "ip.tos": 192, "ip.ttl": 64, "ip.totlen": 390, "ip.ihl": 5, "ip.csum": 49622, "ip.id": 2467, "ip.fragoff": 0, "src_port": 67, "dest_port": 68, "udp.len": 370, "udp.csum": 3881, "oob.in": "eth0", "oob.out": "", "src_ip": "172.16.1.30", "dest_ip": "255.255.255.255", "mac.saddr.str": "fa:16:3e:20:88:14", "mac.daddr.str": "ff:ff:ff:ff:ff:ff", "mac.str": "ff:ff:ff:ff:ff:ff:fa:16:3e:20:88:14:08:00"}

The tool also supports other output plugins including GRAPHITE which would allow sending events directly to a garphite cluster however i thought i heard this may be getting phased out. If we want to do accounting based logging there are also a few more filters we can use to only capture whats useful

 ulogd -v -i  /usr/lib/x86_64-linux-gnu/ulogd/ulogd_inpflow_NFCT.so
Name: NFCT
Config options:
        Var: pollinterval (Integer, Default: 0)
        Var: hash_enable (Integer, Default: 1)
        Var: hash_buckets (Integer, Default: 8192)
        Var: hash_max_entries (Integer, Default: 32768)
        Var: event_mask (Integer, Default: 5)
        Var: netlink_socket_buffer_size (Integer, Default: 0)
        Var: netlink_socket_buffer_maxsize (Integer, Default: 0)
        Var: netlink_resync_timeout (Integer, Default: 60)
        Var: reliable (Integer, Default: 0)
        Var: accept_src_filter (String, Default: )
        Var: accept_dst_filter (String, Default: )
        Var: accept_proto_filter (String, Default: )

I have created a simple module for configuereing ulogd2 avalible in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/486513/. The questions is what do we want to log and how do we want to log it. With the default configuration the module will produce the following:

In the first step we should log dropped packets, that's useful to e.g. debug problems or when we finetune a new service. But in some cases it's certainly also useful to be able to switch to the complete log, e.g. when we work on making an existing ferm rule more restrictive. Maybe we can make the mode configurable via Hiera?

On the matter of where to log, I'd prefer Logstash, but @herron or @fgiunchedi are in a better position to comment how to best integrate this.

The tool also supports other output plugins including GRAPHITE which would allow sending events directly to a garphite cluster however i thought i heard this may be getting phased out. If we want to do accounting based logging there are also a few more filters we can use to only capture whats useful

Graphite is in fact deprecated, see e.g. https://phabricator.wikimedia.org/T205862

I have created a simple module for configuereing ulogd2 avalible in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/486513/. The questions is what do we want to log and how do we want to log it. With the default configuration the module will produce the following:

In the first step we should log dropped packets, that's useful to e.g. debug problems or when we finetune a new service. But in some cases it's certainly also useful to be able to switch to the complete log, e.g. when we work on making an existing ferm rule more restrictive. Maybe we can make the mode configurable via Hiera?
On the matter of where to log, I'd prefer Logstash, but @herron or @fgiunchedi are in a better position to comment how to best integrate this.

Do we have a sense of volume of logs? My suggestion would be to start with syslog so logs end up in central syslog hosts (json-formatted even, with @cee cookie), with that we can get a better idea of volume. Once logs are in syslog then it is easy to opt-in ulogd to send to logstash/kibana too.

jbond added a comment.Feb 6 2019, 10:58 AM

Do we have a sense of volume of logs?

ulog has been running on jbond-puppet-client.puppet.eqiad.wmflabs since 2019-02-05 12:58:43, so roughly 20 hours.

-rw-r--r-- 1 root root 116K Feb 6 10:48 syslogemu.log
-rw-r--r-- 1 root root 402K Feb 6 10:48 ulogd.json

As moritz suggested this is all PXE/TFTP broadcasts which we could potentially filter out

My suggestion would be to start with syslog so logs end up in central syslog hosts (json-formatted even, with @cee cookie), with that we can get a better idea of volume. Once logs are in syslog then it is easy to opt-in ulogd to send to logstash/kibana too.

With ulog its possible to log to multiple places. however the json output produced above can only be sent to a file. If we where to log to syslog and rely on that for delivery into logstash we would have to preform the json tokenization our-self , there is also a lot more information in the json output which we would loose (although it may not be useful)

herron added a comment.Feb 6 2019, 4:46 PM

While not necessarily optimal, it is possible to ingest a file with rsyslog. So, if left with no other option we may be able to ingest json this way for forwarding on. At the same time I think having a "human readable" and greppable log file on the host would be useful for quick troubleshooting. Maybe we could settle on more that one output? For example a plain syslog output to populate a greppable local /var/log/firewall.log (or similar) and the central log hosts, and a json file for structured logs that is passed onwards to logstash and friends.

By the way, would the ferm/iptables rule be deployed with a --limit or similar? I think it would be worthwhile to put a ceiling on the amount of log volume that dropped packets may generate per-host.

jbond added a comment.Feb 8 2019, 11:18 AM

discussion from meeting https://etherpad.wikimedia.org/p/SRE-Foundations-ulogd_discussion

The conclusion was to have iptable drop standard entries sent to syslog, as syslog is already in the logging pipeline. for now metrics will be put on pause

Change 490836 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] Enable ulog module.

https://gerrit.wikimedia.org/r/490836

Change 486513 merged by Jbond:
[operations/puppet@production] Create module for managing ulogd

https://gerrit.wikimedia.org/r/486513

Change 490836 merged by Jbond:
[operations/puppet@production] ulogd puppet profile.

https://gerrit.wikimedia.org/r/490836

Change 496415 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] Add logging to cumin nodes

https://gerrit.wikimedia.org/r/496415

Change 496415 merged by Jbond:
[operations/puppet@production] Add logging to cumin nodes

https://gerrit.wikimedia.org/r/496415

Looking at cumin1001 I noticed that the log prefix at the end of the input chan is "fw-out-drop" and the output chain is empty with an accept policy. Is "out" indeed the direction in this case? Or would dropped packets logged by the input chain be considered "in"?

Change 506634 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] ulogd: rename nflog comment

https://gerrit.wikimedia.org/r/506634

Looking at cumin1001 I noticed that the log prefix at the end of the input chan is "fw-out-drop" and the output chain is empty with an accept policy. Is "out" indeed the direction in this case? Or would dropped packets logged by the input chain be considered "in"?

Yes i think you are right i have just sent a patch to update this

Change 506634 merged by Jbond:
[operations/puppet@production] ulogd: rename nflog comment

https://gerrit.wikimedia.org/r/506634

Change 511700 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: enable loggin on internal servers

https://gerrit.wikimedia.org/r/511700

Change 511701 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall loggin: enable firewall logging on wmcs servers

https://gerrit.wikimedia.org/r/511701

Change 511703 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: enable firewall logging on external servers

https://gerrit.wikimedia.org/r/511703

Change 511704 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: Enable logging on external servers

https://gerrit.wikimedia.org/r/511704

Change 511705 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: add firewall logging to kafak servers

https://gerrit.wikimedia.org/r/511705

Change 511706 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: Enable logging on misc services

https://gerrit.wikimedia.org/r/511706

Change 511707 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: enable logging on ores

https://gerrit.wikimedia.org/r/511707

Change 511708 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: Enable firewall logging on mediawiki

https://gerrit.wikimedia.org/r/511708

Change 511709 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: enable firewall logging on remaining roles

https://gerrit.wikimedia.org/r/511709

Change 511700 merged by Jbond:
[operations/puppet@production] firewall logging: enable loggin on internal servers

https://gerrit.wikimedia.org/r/511700

Change 511701 merged by Jbond:
[operations/puppet@production] firewall loggin: enable firewall logging on wmcs servers

https://gerrit.wikimedia.org/r/511701

Change 511705 merged by Jbond:
[operations/puppet@production] firewall logging: add firewall logging to kafak servers

https://gerrit.wikimedia.org/r/511705

Change 511706 merged by Jbond:
[operations/puppet@production] firewall logging: Enable logging on misc services

https://gerrit.wikimedia.org/r/511706

Change 511709 merged by Jbond:
[operations/puppet@production] firewall logging: enable firewall logging on remaining roles

https://gerrit.wikimedia.org/r/511709

Change 511707 merged by Jbond:
[operations/puppet@production] firewall logging: enable logging on ores

https://gerrit.wikimedia.org/r/511707

Change 511708 merged by Jbond:
[operations/puppet@production] firewall logging: Enable firewall logging on mediawiki

https://gerrit.wikimedia.org/r/511708

Change 511703 merged by Jbond:
[operations/puppet@production] firewall logging: enable firewall logging on external servers

https://gerrit.wikimedia.org/r/511703

Change 511704 merged by Jbond:
[operations/puppet@production] firewall logging: Enable logging on external servers

https://gerrit.wikimedia.org/r/511704

Change 517832 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: enable ulog by default

https://gerrit.wikimedia.org/r/517832

Change 517833 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: clean up old roll-out classes

https://gerrit.wikimedia.org/r/517833

Change 517834 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: clean up old role out class

https://gerrit.wikimedia.org/r/517834

Change 517835 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] firewall logging: make profile::base::firewall::log a private class

https://gerrit.wikimedia.org/r/517835

Change 517832 merged by Jbond:
[operations/puppet@production] firewall logging: enable ulog by default

https://gerrit.wikimedia.org/r/517832

Change 517833 merged by Jbond:
[operations/puppet@production] firewall logging: clean up old roll-out classes

https://gerrit.wikimedia.org/r/517833

Change 517834 merged by Jbond:
[operations/puppet@production] firewall logging: clean up old roll-out class

https://gerrit.wikimedia.org/r/517834

Change 517835 merged by Jbond:
[operations/puppet@production] firewall logging: make profile::base::firewall::log a private class

https://gerrit.wikimedia.org/r/517835

jbond added a comment.Jun 19 2019, 1:00 PM

Closing as logging is now enabled by default for any role with the profile::base::firewall class. Please reopen if more work is required

jbond closed this task as Resolved.Jun 19 2019, 2:32 PM