
Port network checks in netops module to Prometheus/Alertmanager
Open, Needs Triage | Public | Goal

Description

The netops::monitoring class deploys the network-related checks to Icinga and is built on the netops::check define. The define is called with different parameters depending on the desired check:

  1. IPv4/IPv6 ICMP check via monitoring::host. Note that this is the only paging check (critical parameter).
  2. Juniper alarms: check_jnx_alarms
  3. Router interfaces: check_ifstatus_nomon
  4. BGP status: check_bgp (see also T384731)
  5. VCP status: check_vcp
  6. VRRP status: check_vrrp
  7. BFD status: check_bfd
  8. OSPF status: check_ospf

For the ICMP checks, note that blackbox ICMP probes are already deployed (in a full mesh), for example probe_success{job="smoke/icmp",role="cr"} (https://w.wiki/CraL). For the rest we will likely need the NRPE compat layer described in T350360; ideally, all the information we need is already available in Prometheus/gnmic.
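As a rough illustration of how the existing blackbox data could stand in for the Icinga ICMP check, the sketch below evaluates the probe_success selector quoted above through the standard Prometheus HTTP API (`/api/v1/query`) and lists the series whose latest sample is 0. The base URL and the function names are hypothetical, chosen only for this example; this is not the tooling used in production, and a real replacement would more likely be an alerting rule than a script.

```python
import json
import urllib.parse
import urllib.request

# Selector quoted in the task description.
ICMP_QUERY = 'probe_success{job="smoke/icmp",role="cr"}'


def build_query_url(base_url: str, query: str = ICMP_QUERY) -> str:
    """Return the instant-query URL for a Prometheus server (base_url is an assumption)."""
    params = urllib.parse.urlencode({"query": query})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def failing_targets(payload: dict) -> list[dict]:
    """Return the label sets of series whose latest sample is 0 (probe failed)."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    failed = []
    for series in payload["data"]["result"]:
        # An instant-vector sample is a [timestamp, "value"] pair; the value is a string.
        _timestamp, value = series["value"]
        if float(value) == 0.0:
            failed.append(series["metric"])
    return failed


def fetch_failing_targets(base_url: str) -> list[dict]:
    """Query a live Prometheus server and return the failing ICMP targets."""
    with urllib.request.urlopen(build_query_url(base_url), timeout=10) as resp:
        return failing_targets(json.load(resp))
```

Keeping the response parsing in `failing_targets` separate from the HTTP call means the logic can be exercised with a canned payload, without access to a Prometheus server.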

! MIGRATION TABLE !

| Migrated? (Y/N) | Title | Resource Type | Command | File | Profiles |
| --- | --- | --- | --- | --- | --- |
| N | X.mgmt BFD status | Monitoring::Service | check_bfd | modules/netops/manifests/check.pp:153 | profile::icinga |
| N | X Juniper alarms | Monitoring::Service | check_jnx_alarms | modules/netops/manifests/check.pp:109 | profile::icinga |
| N | X OSPF status | Monitoring::Service | check_ospf | modules/netops/manifests/check.pp:164 | profile::icinga |
| N | mr1-X interfaces | Monitoring::Service | check_ifstatus_nomon | modules/netops/manifests/check.pp:120 | profile::icinga |
| N | X VCP status | Monitoring::Service | check_vcp | modules/netops/manifests/check.pp:131 | profile::icinga |
| N | X VRRP status | Monitoring::Service | check_vrrp | modules/netops/manifests/check.pp:142 | profile::icinga |

Event Timeline

Change #1155240 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] monitoring services: add migration task T384830 to instances

https://gerrit.wikimedia.org/r/1155240

Change #1155240 merged by Tiziano Fogli:

[operations/puppet@production] monitoring services: add migration task T384830 to instances

https://gerrit.wikimedia.org/r/1155240

tappof changed the subtype of this task from "Task" to "Goal". (Sep 2 2025, 1:34 PM)