Unclear LVS bandwidth graph in "load balancers" dashboard
Open, Medium, Public

Description

During an incident investigation involving LVS, it was pointed out that the "load balancers" dashboard shows connections, bytes, packets and ICMP graphs, but it isn't immediately obvious that the first three refer to LVS only, whereas the last one (ICMP) is for the host interface.
This task is mostly a follow-up and FYI for the dashboard "owners", i.e. Traffic.

I've put a text panel explaining this fact on top of the dashboard and changed the graph titles to differentiate host/lvs. https://grafana.wikimedia.org/dashboard/db/load-balancers

Event Timeline

Restricted Application added a project: Operations. · Aug 29 2017, 12:34 PM
Restricted Application added a subscriber: Aklapper.
BBlack added a subscriber: BBlack. · Aug 29 2017, 5:20 PM

Are the non-icmp graphs somehow LVS-specific? My past impression of such graphs is that they aren't, and it just happens to be the case that the bulk of the LVS hosts' interface traffic is LVS-forwarded traffic. Re: ICMP, our LVSes forward/balance ICMP as well...

ema triaged this task as Medium priority. · Aug 30 2017, 9:20 AM
ema moved this task from Triage to LoadBalancer on the Traffic board.

Yes, they are LVS-specific in the sense that the metrics backing the graphs come from /proc/net/ip_vs* and thus cover only ipvs-managed services; indeed, on LVS boxes the two overlap in most cases. Re: ICMP, are all types forwarded? Echo requests didn't seem to be forwarded on to the ipvs backends; I'd imagine that any type that can't be associated with an existing or new connection won't be forwarded.
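To illustrate where such counters come from, here is a minimal Python sketch that parses the totals row of an ip_vs_stats-style dump. It assumes the usual layout of /proc/net/ip_vs_stats (two header rows followed by a row of five hexadecimal counters); the sample text, the field names and the function name `parse_ipvs_stats` are made up for illustration and are not taken from the dashboard or its exporter.

```python
from typing import Dict

# Hypothetical field names for the five columns of the totals row.
FIELDS = (
    "connections",
    "incoming_packets",
    "outgoing_packets",
    "incoming_bytes",
    "outgoing_bytes",
)

# Made-up sample in the assumed ip_vs_stats layout (counters are hexadecimal).
SAMPLE = """\
   Total Incoming Outgoing         Incoming         Outgoing
   Conns  Packets  Packets            Bytes            Bytes
      1A       FF        0             1000                0
"""


def parse_ipvs_stats(text: str) -> Dict[str, int]:
    """Return the first row of five hex counters as a name -> value mapping."""
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != len(FIELDS):
            continue
        try:
            values = [int(token, 16) for token in tokens]
        except ValueError:
            continue  # header rows contain words, not hex numbers
        return dict(zip(FIELDS, values))
    raise ValueError("no totals row found")


if __name__ == "__main__":
    print(parse_ipvs_stats(SAMPLE))
```

On a real LVS host the text would be read from /proc/net/ip_vs_stats instead of the sample string. Since these counters only track ipvs-managed services, host-level traffic such as ICMP handled by the box itself would not show up in them.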

ema added a subscriber: ema. · Aug 30 2017, 9:49 AM

Are the non-icmp graphs somehow LVS-specific?

Yes, the metrics are: node_ipvs_backend_connections_active, node_ipvs_incoming_packets_total, node_ipvs_incoming_bytes_total. The icmp graph instead plots node_netstat_Icmp_InMsgs.
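As a rough sketch of the distinction, the four metric names above can be grouped by scope using their prefix. The rule that a `node_ipvs_` prefix means "LVS only" and anything else means "host" is an assumption drawn from the names quoted here, and the helper names `metric_scope` and `group_by_scope` are hypothetical.

```python
from typing import Dict, List

# Metric names as quoted in this task.
DASHBOARD_METRICS = [
    "node_ipvs_backend_connections_active",
    "node_ipvs_incoming_packets_total",
    "node_ipvs_incoming_bytes_total",
    "node_netstat_Icmp_InMsgs",
]

# Assumption: ipvs-backed metrics share this prefix.
LVS_PREFIX = "node_ipvs_"


def metric_scope(name: str) -> str:
    """Classify a metric as 'lvs' (ipvs-managed services) or 'host'."""
    return "lvs" if name.startswith(LVS_PREFIX) else "host"


def group_by_scope(names: List[str]) -> Dict[str, List[str]]:
    """Group metric names by scope, preserving input order."""
    groups: Dict[str, List[str]] = {"lvs": [], "host": []}
    for name in names:
        groups[metric_scope(name)].append(name)
    return groups


if __name__ == "__main__":
    for scope, names in group_by_scope(DASHBOARD_METRICS).items():
        print(scope, names)
```

A grouping like this could be used to label graph titles consistently (host vs. LVS), which is what the renamed titles on the dashboard do by hand.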

The text panel @fgiunchedi added is correct, so I guess that should be enough to clarify the ambiguity? Alternatively, we could move the ICMP graphs to a new dashboard with host-specific metrics only.

Yes, IMHO it's good enough as it is now, with the legend/explanation in the text panel.