
Atlas no longer reachable from monitoring on routed ganeti
Closed, Resolved (Public)

Description

One unexpected outcome of the removal of conntracks on routed ganeti is that we were relying on them for the blackbox probes of the RIPE Atlas VMs. After the change we got this alert:

FIRING: [16x] ProbeDown: Ripe Atlas anchor atlas3001:80 is not returning HTTP 200 OK on port 80  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown

The reason is that previously a conntrack entry was created when monitoring connected to the Atlas VM, and the response from the VM was allowed back because it matched that entry. Now the SYN from monitoring is still allowed through to the VM, but the VM's SYN/ACK is blocked because its destination is in WMF IP space.
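To illustrate the difference, here is a toy model of the filter with and without connection tracking. It is only a sketch, not the real routed ganeti ruleset: the `Firewall` class, the `INTERNAL_NET` prefix and the addresses (taken from documentation ranges) are all hypothetical, and the policy is simplified to "the VM may not send to internal space unless the packet matches a tracked flow".

```python
import ipaddress

# Hypothetical prefix standing in for internal (WMF) address space.
INTERNAL_NET = ipaddress.ip_network("198.51.100.0/24")


class Firewall:
    """Toy filter for traffic to and from a sandboxed VM (illustration only)."""

    def __init__(self, use_conntrack):
        self.use_conntrack = use_conntrack
        self.flows = set()  # tracked (client, client_port, server, server_port)

    def inbound_to_vm(self, src, sport, dst, dport):
        # Connections towards the VM are allowed; record them if tracking is on.
        if self.use_conntrack:
            self.flows.add((src, sport, dst, dport))
        return True

    def outbound_from_vm(self, src, sport, dst, dport):
        # A reply matches a tracked flow with source and destination swapped.
        if self.use_conntrack and (dst, dport, src, sport) in self.flows:
            return True
        # Otherwise the VM may not send to internal address space.
        return ipaddress.ip_address(dst) not in INTERNAL_NET


# Hypothetical monitoring host and Atlas VM addresses.
MONITOR, ATLAS = "198.51.100.10", "192.0.2.50"

for tracked in (True, False):
    fw = Firewall(use_conntrack=tracked)
    fw.inbound_to_vm(MONITOR, 40000, ATLAS, 80)                 # SYN
    reply_ok = fw.outbound_from_vm(ATLAS, 80, MONITOR, 40000)   # SYN/ACK
    print(f"conntrack={tracked}: SYN/ACK allowed={reply_ok}")
```

With tracking enabled the SYN/ACK matches the recorded flow and passes; without it the same packet is judged only on its destination and is dropped.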

We'll need to decide how to handle this. I guess a specific rule allowing port 80 replies back to monitoring is required.
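As a rough sketch of what such a rule amounts to, the stateless check below adds one explicit exception ahead of the default "no internal destinations" policy. Everything here is an assumption for illustration: `MONITORING_HOSTS`, `INTERNAL_NET`, the function name and the addresses are hypothetical, and the real rule would live in the firewall configuration rather than in Python.

```python
import ipaddress

# Hypothetical values: stand-ins for internal space and the monitoring hosts.
INTERNAL_NET = ipaddress.ip_network("198.51.100.0/24")
MONITORING_HOSTS = {ipaddress.ip_address("198.51.100.10")}


def vm_egress_allowed(proto, sport, dst):
    """Stateless check on a packet leaving the VM (illustration only)."""
    dst_ip = ipaddress.ip_address(dst)
    # Explicit exception: HTTP replies (TCP source port 80) to monitoring.
    if proto == "tcp" and sport == 80 and dst_ip in MONITORING_HOSTS:
        return True
    # Default policy: nothing else may reach internal address space.
    return dst_ip not in INTERNAL_NET


print(vm_egress_allowed("tcp", 80, "198.51.100.10"))  # True: probe reply
print(vm_egress_allowed("tcp", 80, "198.51.100.99"))  # False: other internal host
print(vm_egress_allowed("tcp", 22, "198.51.100.10"))  # False: not source port 80
print(vm_egress_allowed("tcp", 443, "203.0.113.7"))   # True: external destination
```

Note that matching on source port alone is looser than matching a tracked connection: any packet the VM sends from port 80 to those hosts would pass, not only replies to probes.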

Related Objects

Status | Subtype | Assigned | Task
Resolved | | cmooney |

Event Timeline

cmooney triaged this task as Medium priority.
cmooney added a parent task: Restricted Task. (Mar 23 2026, 5:40 PM)

Change #1259935 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/puppet@production] routed-ganeti: allow sandbox replies to HTTP from install hosts

https://gerrit.wikimedia.org/r/1259935

Change #1259935 merged by Cathal Mooney:

[operations/puppet@production] routed-ganeti: allow sandbox replies to HTTP from install hosts

https://gerrit.wikimedia.org/r/1259935

cmooney claimed this task.

This should now be working again. Big thanks to @ayounsi for the heavy lifting with all the Puppet patches to add the $INSTALL_HOSTS set.