
Add VCP stats monitoring
Closed, Resolved · Public · 0 Story Points

Description

To get better visibility on issues like T228823.

On the CLI this info can be seen with:
asw2-a-eqiad> show virtual-chassis vc-port statistics extensive

On SNMP this is exposed via a dedicated MIB:
https://apps.juniper.net/mib-explorer/search.jsp#object=jnxVirtualChassisPortEntry&product=Junos%20OS&release=17.4R2

The ideal would be to implement that in LibreNMS to have graphing and alerting, on both link usage and errors.

A quick workaround would be to extend the check_vcp.py Icinga check to also look at the error counters and alert if any of them is increasing.
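As a rough illustration of the "alert if any is increasing" idea, here is a minimal Python sketch that compares two samples of error counters and maps the result to an Icinga/Nagios exit code. It is not taken from check_vcp.py: the function names, the `(port, counter)` dictionary layout, and the `crc_errors` counter name are all assumptions made for the example, and how the samples are collected and stored between runs is left out.

```python
# Hypothetical sketch, not the actual check_vcp.py logic.
# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def increasing_counters(previous, current):
    """Return {(port, counter): delta} for every counter that grew."""
    deltas = {}
    for key, value in current.items():
        old = previous.get(key)
        # Ignore counters without an earlier sample; a lower value is
        # treated as a counter reset rather than as new errors.
        if old is not None and value > old:
            deltas[key] = value - old
    return deltas


def evaluate(previous, current):
    """Turn two counter samples into an (exit_code, message) pair."""
    deltas = increasing_counters(previous, current)
    if not deltas:
        return OK, "OK: no VCP error counter is increasing"
    details = ", ".join(
        f"{port} {name} +{delta}"
        for (port, name), delta in sorted(deltas.items())
    )
    return CRITICAL, f"CRITICAL: VCP errors increasing: {details}"


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    before = {("vcp-255/0/48", "crc_errors"): 10,
              ("vcp-255/0/49", "crc_errors"): 0}
    after = {("vcp-255/0/48", "crc_errors"): 12,
             ("vcp-255/0/49", "crc_errors"): 0}
    code, message = evaluate(before, after)
    print(message)
    raise SystemExit(code)
```

Comparing against a stored previous sample, rather than alerting on any non-zero value, avoids paging forever on errors that happened once in the past.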

Event Timeline

ayounsi triaged this task as Normal priority. · Wed, Jul 24, 2:12 AM
ayounsi created this task.
Restricted Application added a project: Operations. · Wed, Jul 24, 2:12 AM
Restricted Application added a subscriber: Aklapper.
CDanis added a subscriber: CDanis. · Wed, Jul 24, 3:07 AM

Good news, this is already implemented with: https://github.com/librenms/librenms/pull/9879

Bad news: for reasons still unknown, the switches don't expose the proper interface data.
For example, from netmon1002:
`/usr/bin/snmpbulkwalk -v2c -c <community> -OQUs -m JUNIPER-VIRTUALCHASSIS-MIB -M /srv/deployment/librenms/librenms/mibs:/srv/deployment/librenms/librenms/mibs/junos udp:asw2-b-eqiad.mgmt.eqiad.wmnet:161 jnxVirtualChassisPortTable`
Returns only OIDs like:
jnxVirtualChassisPortInPkts.2."vcp-255/0/48.32768" = 0
The 32768 is an internal sub-interface, with no value. There is no jnxVirtualChassisPortInPkts.2."vcp-255/0/48"
Running show snmp mib walk jnxVirtualChassisPortTable doesn't show clear interface names, but all counters are at 0 or 1.
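To make the index format above easier to inspect, here is a small Python sketch that splits a walk output line into its parts and flags entries that refer to a sub-interface such as `.32768`. The function name and returned field names are made up for the example, and reading the first index component as the member (FPC) ID is an assumption about the MIB's indexing, not something confirmed in this task.

```python
import re

# Matches snmpbulkwalk -OQUs lines such as:
#   jnxVirtualChassisPortInPkts.2."vcp-255/0/48.32768" = 0
LINE_RE = re.compile(
    r'^(?P<oid>\w+)\.(?P<fpc>\d+)\."(?P<port>[^"]+)"\s*=\s*(?P<value>.*)$'
)


def parse_walk_line(line):
    """Split one walk line into a dict, or return None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    port = match.group("port")
    # A trailing ".<digits>" is treated as a sub-interface (unit) number.
    base, dot, unit = port.rpartition(".")
    is_sub = bool(dot) and unit.isdigit()
    return {
        "oid": match.group("oid"),
        # Assumption: the first index component is the member (FPC) ID.
        "fpc": int(match.group("fpc")),
        "port": base if is_sub else port,
        "unit": int(unit) if is_sub else None,
        "value": match.group("value"),
    }


if __name__ == "__main__":
    sample = 'jnxVirtualChassisPortInPkts.2."vcp-255/0/48.32768" = 0'
    print(parse_walk_line(sample))
```

Filtering on `unit is None` would then show at a glance whether a device returns any entries for the physical VCP itself.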

I tried asw2-a-eqiad and asw2-ulsfo.
A trace is available with asw2-ulsfo# run file show /var/log/snmptrace.log if needed, but I couldn't find anything in it.
I'll follow up with JTAC...

Juniper also has a hack for older Junos releases, from before jnxVirtualChassisPortTable was implemented: https://kb.juniper.net/InfoCenter/index?page=content&id=KB27711 (probably not something we want to do).

Service Request ID 2019-0801-0611 has been created.

Mentioned in SAL (#wikimedia-operations) [2019-08-07T23:03:26Z] <XioNoX> set virtual-chassis vcp-snmp-statistics on asw-a-codfw - T228824

Mentioned in SAL (#wikimedia-operations) [2019-08-07T23:08:29Z] <XioNoX> set virtual-chassis vcp-snmp-statistics on asw2-ulsfo - T228824

This is working!
Why is that behind a configuration option and not enabled by default? I have no idea.
I'll let those two sit overnight and roll it out to the whole fleet if all is good.

Mentioned in SAL (#wikimedia-operations) [2019-08-08T15:49:09Z] <XioNoX> set virtual-chassis vcp-snmp-statistics to all VC - T228824

ayounsi closed this task as Resolved. · Thu, Aug 8, 4:12 PM
ayounsi claimed this task.

We now have visibility on all VCPs:
https://librenms.wikimedia.org/ports/ifType=vcp/format=list_basic/
They also benefit from the same alerting as regular ports for saturation and errors.