Setup flow monitoring of internal network traffic
Closed, ResolvedPublic

Assigned To

Authored By

	mark
	Aug 13 2011, 10:19 AM

Description

The TSO/GRO problems we had over the last few weeks could have been spotted much
earlier if we had sflow/netflow monitoring of our *internal* network for things
like an unusually high number of ICMP errors, TCP retransmits, etc.
--
Mark Bergsma <mark at wikimedia>
Operations Engineering Program Manager
Wikimedia Foundation

Details

Reference: rt1308

Related Objects

		Status	Subtype	Assigned	Task
					Restricted Task
		Resolved		ayounsi	T79755 Setup flow monitoring of internal network traffic

Event Timeline

• rtimport raised the priority of this task from to Medium. Dec 18 2014, 12:55 AM

• rtimport added a project: netops.

• rtimport set Reference to rt1308.

mark created this task. Aug 13 2011, 10:19 AM

Issue taken by lcarr

Dependency by ticket #6775 added by gage

• Gage added a project: observability. Dec 18 2014, 6:40 PM

• Gage set Security to None.

fgiunchedi changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)". Dec 2 2015, 3:19 PM

fgiunchedi changed the edit policy from "WMF-NDA (Project)" to "All Users".

Prometheus (which didn't exist in 2011) with netstat provides better visibility into problematic frames/segments/datagrams/packets entering and leaving the servers.
I created two dashboards (both still drafts):
https://grafana.wikimedia.org/dashboard/db/network-performances-global
and
https://grafana.wikimedia.org/dashboard/db/network-performances

After investigating the out-of-the-ordinary patterns, we will be able to add alerting on those graphs so that we are notified when something needs our attention.
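
For illustration only, here is a minimal Python sketch of the kind of check such alerting could rest on: it parses text in the Linux `/proc/net/snmp` layout (a header line of counter names followed by a line of values) and computes the TCP retransmit ratio between two samples. The function names and the 1% threshold are assumptions made for this example; this is not the query used by the dashboards above.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: int}}."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    stats = {}
    # Lines come in pairs: a row of counter names, then a row of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto = header[0].rstrip(":")
        stats[proto] = dict(zip(header[1:], map(int, values[1:])))
    return stats


def retransmit_ratio(before, after):
    """Fraction of TCP segments sent between two samples that were retransmits."""
    sent = after["Tcp"]["OutSegs"] - before["Tcp"]["OutSegs"]
    retrans = after["Tcp"]["RetransSegs"] - before["Tcp"]["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


def should_alert(before, after, threshold=0.01):
    """Hypothetical rule: alert when more than `threshold` of segments are retransmits."""
    return retransmit_ratio(before, after) > threshold
```

On a Linux host, two samples could be taken by reading `/proc/net/snmp` a minute apart and passing both parsed results to `should_alert`. A ratio over an interval is used rather than the raw counter because the counters only ever increase.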

Nemo_bis subscribed. Jun 12 2017, 3:54 PM

Alerts have been added to the dashboard (not tied to Nagios, but they show up in the "single pane of glass" dashboard in LibreNMS).
I think that satisfies the initial request.
More graphs/alerts will be added when needed.
