Yesterday during row D maintenance we ran into an issue similar to T133387, where hosts lacked IPv6 connectivity while IPv4 was working correctly. Icinga didn't alert on the missing IPv6 connectivity, but IMO it should have, at least for hosts that have both A and AAAA records.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
netops: prometheus::hosts: also probe ipv6 if available | operations/puppet | production | +25 -16
Event Timeline
I noticed I had this comment that I started typing but never saved:
I'm currently not going to work on this, and we should probably wait a bit and closely watch performance metrics before we add so many extra checks.
Meanwhile, there have also been efforts to add a lot more AAAA records to DNS and to add the mapped v6 address.
Raising the priority to bring attention to this task, feel free to re-triage accordingly.
Yesterday's short outage could probably have been avoided if we had IPv6 checks on hosts.
Note for later, reworked for an Alertmanager/Prometheus world: we should extend netops::prometheus::hosts to also probe IPv6, so that the smoke probes test v6 connectivity as well.
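To illustrate the idea only (this is not the Puppet change itself), here is a rough Python sketch of "probe IPv6 if available": every host gets an IPv4 target when it has an A record and an additional IPv6 target only when it also has an AAAA record. The names `system_resolver` and `probe_targets`, and the `ip4`/`ip6` labels, are made up for this example and are not taken from netops::prometheus::hosts.

```python
import socket
from typing import Callable, Dict, Iterable, List

# Hypothetical resolver signature: (hostname, address family) -> list of addresses.
Resolver = Callable[[str, int], List[str]]


def system_resolver(hostname: str, family: int) -> List[str]:
    """Resolve a hostname for one address family; return [] if nothing resolves."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=family, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def probe_targets(
    hostnames: Iterable[str], resolve: Resolver = system_resolver
) -> List[Dict[str, str]]:
    """Build one probe target per host and address family that actually resolves."""
    targets = []
    for host in hostnames:
        for label, family in (("ip4", socket.AF_INET), ("ip6", socket.AF_INET6)):
            if resolve(host, family):
                targets.append({"host": host, "family": label})
    return targets
```

Passing the resolver in as a parameter keeps the selection logic testable without touching real DNS; a host with only an A record simply gets no v6 target, so it does not produce a permanently failing probe.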
Change 981358 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] netops: prometheus::hosts: also probe ipv6 if available
Change 981358 merged by Majavah:
[operations/puppet@production] netops: prometheus::hosts: also probe ipv6 if available
Thank you @taavi! The check is working as expected now, and it uncovered T353254: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin!
I'm resolving, though feel free to reopen.