After some internal discussions with @cmooney and @Vgutierrez, we are looking into why there were no alerts for the recent transport link saturation between magru and eqiad.
The magru and eqiad transport link (Telxius) is a 10G link and we were clearly saturating it during a recent incident. (Grafana link, provided by Valentin)
In discussion with Cathal, it seems like we are alerting for transit and peering but not for the transport links themselves, as per https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-netops/interfaces.yaml#49.
expr: |
(
irate(gnmi_interfaces_interface_state_counters_out_octets{instance=~"cr.*", interface_description=~"(Transit|Peering).*"}[5m])
/
(gnmi_interfaces_interface_state_high_speed{instance=~"cr.*", interface_description=~"(Transit|Peering).*"}/8*1000000)
) > 0.9

We should expand this alerting to cover the transport links as well, and it should be a paging alert, like the rest of the rule. In the absence of such alerting, we only find out about saturation indirectly, through purged lag alerts or other traffic patterns, which is not ideal.
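
As a starting point for discussion, the existing expression could be widened by adding one more alternative to the `interface_description` matcher. This is only a sketch: it assumes the transport links carry a description starting with `Transport`, which needs to be checked against the actual interface descriptions on the routers before anything is merged.

```yaml
# Sketch only. The "Transport" description prefix is an assumption and must be
# verified against the real interface descriptions on the cr* routers.
expr: |
  (
    irate(gnmi_interfaces_interface_state_counters_out_octets{instance=~"cr.*", interface_description=~"(Transit|Peering|Transport).*"}[5m])
    /
    (gnmi_interfaces_interface_state_high_speed{instance=~"cr.*", interface_description=~"(Transit|Peering|Transport).*"} / 8 * 1000000)
  ) > 0.9
```

For the 10G magru and eqiad link, and assuming `high_speed` is reported in Mbps (10000 for this link), the denominator works out to 10000 / 8 * 1000000 = 1.25e9 bytes/s, so the 0.9 threshold would fire at roughly 1.125e9 bytes/s, or about 9 Gbps outbound.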

