Investigate internal rejected prefixes
Closed, Resolved, Public

Description

Thanks to T356877 (Increase visibility of kubernetes network status), I realized that we should probably keep an eye on the prefixes that we reject internally.

On a case-by-case basis, we should then edit the outbound filters so that we never need to reject a prefix on the inbound side.

That way we could have a generic Netops alert that triggers when we start rejecting prefixes.
If that alert triggers too often because of services on the various hosts, we could then hand it off to the service owners.

For example, with https://grafana.wikimedia.org/goto/dfj5c1kceij9cc?orgId=1:
gnmi_bgp_neighbor_prefixes_rejected{peer_group!~"(IX|Private-Peer|Transit)[46]"} > 0
or
gnmi_bgp_neighbor_prefixes_received_pre_policy{peer_group!~"(IX|Private-Peer|Transit)[46]"} - gnmi_bgp_neighbor_prefixes_received{peer_group!~"(IX|Private-Peer|Transit)[46]"} > 0
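
To make the selector concrete, here is a minimal Python sketch of how the two queries pick out internal peers. It mimics the `!~` matcher with `re.fullmatch`, since Prometheus anchors label regexes at both ends, and it assumes that `gnmi_bgp_neighbor_prefixes_received` counts prefixes after import policy. Apart from `ganeti4`, the peer-group names and all counter values below are made up for illustration.

```python
import re

# External peer groups excluded by the selector; the class [46] matches the
# trailing address-family digit (4 or 6).
EXTERNAL = re.compile(r"(IX|Private-Peer|Transit)[46]")


def is_internal(peer_group: str) -> bool:
    """Mimic peer_group!~"..." (the regex must match the whole label value)."""
    return EXTERNAL.fullmatch(peer_group) is None


def rejected(pre_policy: int, received: int) -> int:
    """Second form of the query: prefixes dropped between pre- and post-policy."""
    return pre_policy - received


# Hypothetical samples: (peer_group, received_pre_policy, received).
samples = [
    ("ganeti4", 3, 2),             # internal peer, one prefix rejected
    ("Transit4", 900000, 850000),  # external peer, ignored by the selector
    ("IX6", 10, 10),               # external peer, nothing rejected anyway
]

firing = [
    group
    for group, pre, post in samples
    if is_internal(group) and rejected(pre, post) > 0
]
print(firing)  # ['ganeti4']
```

Only the internal peer with a non-zero difference would trip the expression; rejections on transit, IX and private-peering sessions are expected and stay out of scope.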

Event Timeline

ayounsi triaged this task as Low priority.
Restricted Application added a subscriber: Aklapper.

Change #1294931 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Nokia: also allow anycast prefixes from Ganeti peers

https://gerrit.wikimedia.org/r/1294931

{
address="10.128.0.20",
afi_safi="IPV4_UNICAST",
instance="asw1-22-ulsfo:9804",
job="gnmi",
network_instance_name="default",
peer_as="64612",
peer_descr="ganeti4005",
peer_group="ganeti4",
peer_type="EXTERNAL",
prometheus="ops",
protocol_identifier="BGP",
protocol_name="BGP",
site="ulsfo"
}

Confirmed on the switches:

asw1-22-ulsfo# info from state / network-instance default bgp-rib afi-safi ipv4-unicast ipv4-unicast rib-in-out rib-in-post route 10.3.0.10/32 neighbor 10.128.0.20 path-id 0
[...]
rejected-route true

Will be fixed with: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1294931

Mentioned in SAL (#wikimedia-operations) [2026-05-28T08:50:09Z] <XioNoX> cr1-codfw# delete protocols bgp group fundraising family inet6 - T423384

Another one was:

pfw1-codfw> show route receive-protocol bgp 208.80.153.202 hidden extensive    

inet.0: 26 destinations, 29 routes (26 active, 0 holddown, 0 hidden)
Restart Complete

inet6.0: 1 destinations, 1 routes (0 active, 0 holddown, 1 hidden)
Restart Complete
  ::/0 (1 entry, 0 announced)
     Nexthop: ::ffff:208.80.153.202
     MED: 100
     AS path: 14907 I 
     Hidden reason: Protocol nexthop is not on the interface

Cleared with pfw1-codfw# delete protocols bgp group Production family inet6, as the pfw doesn't have IPv6. Did the same on the cr side to keep it clean.
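
As a side note on the hidden route above: the next hop `::ffff:208.80.153.202` is an IPv4-mapped IPv6 address, which fits the explanation that there is no IPv6 on the pfw side. A small illustrative check with Python's standard `ipaddress` module (not part of the original investigation):

```python
import ipaddress

# Next hop taken from the hidden ::/0 route shown above.
nexthop = ipaddress.IPv6Address("::ffff:208.80.153.202")

# ipv4_mapped returns the embedded IPv4 address for ::ffff:a.b.c.d addresses,
# or None for any other IPv6 address.
embedded = nexthop.ipv4_mapped
print(embedded)  # 208.80.153.202
```

The embedded address is the IPv4 address of the BGP neighbor used in the `show route receive-protocol` command.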

Another one: pfw1 re-advertises its uplink subnets to cr1/2-codfw:

cr2-codfw# run show route receive-protocol bgp 208.80.153.203 hidden extensive 

inet.0: 1042192 destinations, 3688335 routes (1041891 active, 0 holddown, 912 hidden)
Restart Complete
  208.80.153.200/31 (2 entries, 1 announced)
     Nexthop: 208.80.153.203
     MED: 100
     AS path: 64701 I 
     Hidden reason: Rejected by import policy

  208.80.153.202/31 (2 entries, 1 announced)
     Nexthop: 208.80.153.203
     MED: 100
     AS path: 64701 I 
     Hidden reason: Rejected by import policy

I'm suggesting we add this term on the pfw side:

[edit policy-options policy-statement BGP_fundraising_export]
     term no_management { ... }
+    term no_uplinks {
+        from interface [ xe-0/2/0.0 xe-7/2/0.0 ];
+        then reject;
+    }
     term connected { ... }

Full policy:

term no_management {
    from interface fxp0.0;
    then reject;
}
term no_uplinks {
    from interface [ xe-0/2/0.0 xe-7/2/0.0 ];
    then reject;
}
term connected {
    from protocol direct;
    then accept;
}
term nat {
    from {
        protocol aggregate;
        prefix-list fundraising-codfw-external4;
    }
    then accept;
}
then reject;

An alternative that avoids hardcoding interface names would be to reject directly connected /31s. That would also block the internal IPsec tunnel range, but I don't think that's an issue.

+    term no_p2p {
+        from {
+            protocol direct;
+            route-filter 0.0.0.0/0 prefix-length-range /31-/31;
+        }
+        then reject;
+    }
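
To illustrate the effect of the `no_p2p` term, here is a rough Python model of how the export policy would treat a few routes. It is a sketch under assumptions: the term is assumed to sit before `connected`, the `nat` term is left out, and the non-/31 sample prefixes (taken from documentation ranges) are hypothetical. Only the two /31s come from the output above.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    protocol: str
    interface: Optional[str] = None


def export_action(route: Route) -> str:
    """Walk the terms in order; the first matching term decides."""
    net = ipaddress.ip_network(route.prefix)
    # term no_management
    if route.interface == "fxp0.0":
        return "reject"
    # term no_p2p: route-filter 0.0.0.0/0 prefix-length-range /31-/31,
    # i.e. any IPv4 prefix whose length is exactly 31.
    if route.protocol == "direct" and net.version == 4 and net.prefixlen == 31:
        return "reject"
    # term connected
    if route.protocol == "direct":
        return "accept"
    # term nat is omitted in this sketch; final "then reject".
    return "reject"


routes = [
    Route("208.80.153.200/31", "direct"),         # uplink subnet from the output above
    Route("208.80.153.202/31", "direct"),         # uplink subnet from the output above
    Route("203.0.113.0/28", "direct"),            # hypothetical server subnet
    Route("192.0.2.0/24", "direct", "fxp0.0"),    # hypothetical management subnet
]

for r in routes:
    print(r.prefix, export_action(r))
```

With this ordering, both uplink /31s are dropped before the `connected` term can accept them, while other directly connected subnets are still exported.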

Change #1294931 merged by jenkins-bot:

[operations/homer/public@master] Nokia: also allow anycast prefixes from Ganeti peers

https://gerrit.wikimedia.org/r/1294931

Change #1295011 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Nokia: add missing Wikidough prefix

https://gerrit.wikimedia.org/r/1295011

Change #1295011 merged by jenkins-bot:

[operations/homer/public@master] Nokia: add missing Wikidough prefix

https://gerrit.wikimedia.org/r/1295011

Mentioned in SAL (#wikimedia-operations) [2026-06-01T07:56:13Z] <XioNoX> add no_p2p term to pfw1-codfw BGP_fundraising_export - T423384

Change #1295805 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/alerts@master] Add RejectingBGPPrefixes alert

https://gerrit.wikimedia.org/r/1295805

Mentioned in SAL (#wikimedia-operations) [2026-06-02T07:32:37Z] <XioNoX> pfw1-eqiad# delete protocols bgp group Production family inet6 - T423384

Change #1295805 merged by jenkins-bot:

[operations/alerts@master] Add RejectingBGPPrefixes alert

https://gerrit.wikimedia.org/r/1295805

ayounsi claimed this task.

Closing this task; keeping the subtask as a follow-up.