It seems the new alerts I added last week for OSPF status on our Nokia switches are not working as expected.
I tried to trigger a failure, but the alert did not behave as expected. The current metric values for ssw1-d8-eqiad are:
gnmi_nokia_ospf_oper_state{area_area_id="0.0.0.0", instance="ssw1-d8-eqiad:9804", instance_name="ospfv2", interface_interface_name="ethernet-1/11.0", job="gnmi", network_instance_name="default", prometheus="ops", site="eqiad"} 4
gnmi_nokia_ospf_neighbor_count{area_area_id="0.0.0.0", instance="ssw1-d8-eqiad:9804", instance_name="ospfv2", interface_interface_name="ethernet-1/11.0", job="gnmi", network_instance_name="default", prometheus="ops", site="eqiad"} 0
The state is 4, but neighbor count is 0, so it should alert, yet I don't think it fired.
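For reference, the condition I have in mind can be written roughly as the query below, which can be run in the Prometheus expression browser to check whether it returns the ethernet-1/11.0 series. This is only a sketch: it is not the deployed alerting rule, and it assumes the alert is meant to key on state 4 combined with a neighbor count of 0, as described above.

```
# Sketch only, not the deployed rule: interfaces whose OSPF oper state is 4
# while the neighbor count on the same interface is 0.
# `and` matches on all labels except the metric name, and the two samples
# above carry identical label sets, so no `on (...)` clause is needed.
gnmi_nokia_ospf_oper_state == 4
  and
gnmi_nokia_ospf_neighbor_count == 0
```

If this returns the series while the alert stays silent, that would point to the rule's own expression or its `for` duration rather than to the data.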
There are also AlertLintProblem alerts firing from the Prometheus instances at our POPs. We have no Nokia switches there, so it makes sense that there are no matching series. I'm unsure how best to tackle that.
I'll probably need some help from the observability team on these.