
wiki replicas: drop NAT exceptions
Closed, Resolved (Public)

Description

We currently have NAT exceptions for wiki replicas (dbproxy1018 & dbproxy1019) but I don't think we need them, because they now use an intermediate proxy in the form of a cloud VM with a floating IP.

If this is true, the following need updating:

  • hieradata/eqiad/profile/openstack/eqiad1/cloudgw.yaml
  • hieradata/codfw/profile/openstack/codfw1dev/cloudgw.yaml
  • homer

Event Timeline

According to the diagram at https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas#Multi-instance_system_diagram, I think this is true.

We need to confirm this by other technical means:

  • confirm that there are no open connections
  • confirm that nothing will break if we drop the ACLs

Confirmed there are no open connections.

The following commands dump the conntrack table on cloudgw, restricted to flows whose destination is one of the wiki replica proxies, keep only established connections (ASSURED), and then filter for 172.16. There are no results, meaning no connections are established from 172.16.x.x to the wiki replica proxies.

aborrero@cloudgw1001:~ $ sudo conntrack -L --dst 208.80.154.243 | grep ASSURED | grep 172.16
conntrack v1.4.5 (conntrack-tools): 2046 flow entries have been shown.
aborrero@cloudgw1001:~ 1 $ sudo conntrack -L --dst 208.80.154.242 | grep ASSURED | grep 172.16
conntrack v1.4.5 (conntrack-tools): 1089 flow entries have been shown.
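The two checks above can be folded into one small helper. This is only a sketch, assuming the `conntrack -L` line format shown in this task; the function name `flows_from_prefix` is hypothetical. It compares the prefix against the first `src=` field only (the original direction of the flow), which is stricter than a plain `grep 172.16` that would also match the address anywhere else on the line.

```shell
# Hypothetical helper: read `conntrack -L` output on stdin and print the
# entries that contain the status word $1 (e.g. ASSURED or UNREPLIED) and
# whose original-direction source address starts with prefix $2 (e.g. 172.16.).
flows_from_prefix() {
    awk -v flag="$1" -v prefix="src=$2" '
        index($0, flag) {
            # The first src= field on a line belongs to the original direction.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^src=/) {
                    if (index($i, prefix) == 1) print
                    break
                }
            }
        }'
}
```

Possible usage on cloudgw, mirroring the commands above: `sudo conntrack -L --dst 208.80.154.243 | flows_from_prefix ASSURED 172.16.`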

There are, however, a bunch of attempted connections from the cloud private range:

aborrero@cloudgw1001:~ $ sudo conntrack -L --dst 208.80.154.242 | grep UNREPLIED | grep 172.16
tcp      6 36 SYN_SENT src=172.16.2.185 dst=208.80.154.242 sport=41672 dport=62358 [UNREPLIED] src=208.80.154.242 dst=185.15.56.1 sport=62358 dport=41672 mark=0 use=1
tcp      6 93 SYN_SENT src=172.16.2.185 dst=208.80.154.242 sport=41672 dport=3399 [UNREPLIED] src=208.80.154.242 dst=185.15.56.1 sport=3399 dport=41672 mark=0 use=1
[..]

Those originate from diffscan.traffic.eqiad1.wikimedia.cloud and are UNREPLIED because the traffic is not allowed by several firewall policies.
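To see at a glance which hosts are behind such attempts, the UNREPLIED entries can be tallied per source address. A rough sketch, again assuming the conntrack line format shown above; `unreplied_by_source` is a hypothetical name, and mapping an address back to an instance name is a separate step not covered here.

```shell
# Hypothetical helper: read `conntrack -L` output on stdin and print
# "<count> <source address>" for UNREPLIED entries, busiest source first.
unreplied_by_source() {
    awk '
        /UNREPLIED/ {
            # Use the first src= field (original direction) as the key.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^src=/) {
                    count[substr($i, 5)]++
                    break
                }
            }
        }
        END { for (ip in count) print count[ip], ip }' | sort -rn
}
```

Possible usage: `sudo conntrack -L --dst 208.80.154.242 | unreplied_by_source`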

Summary: it should be safe to drop the ACLs.

Change 732628 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloud: drop NAT exceptions (dmz_cidr) for wiki-replicas

https://gerrit.wikimedia.org/r/732628

Change 732633 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/homer/public@master] cr-cloud/ drop firewall exception for wiki-replicas

https://gerrit.wikimedia.org/r/732633

Change 732628 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloud: drop NAT exceptions (dmz_cidr) for wiki-replicas

https://gerrit.wikimedia.org/r/732628

Mentioned in SAL (#wikimedia-cloud) [2021-10-21T10:12:24Z] <arturo> drop NAT exception for wiki replicas legacy setup (T293897)

Change 732633 merged by Arturo Borrero Gonzalez:

[operations/homer/public@master] cr-cloud/ drop firewall exception for wiki-replicas

https://gerrit.wikimedia.org/r/732633

Mentioned in SAL (#wikimedia-cloud) [2021-10-21T10:19:24Z] <arturo> drop firewall exception on core routers for wiki replicas legacy setup (T293897)

aborrero claimed this task.
aborrero added a subscriber: Lucas_Werkmeister_WMDE.

Done! Confirmed with @Lucas_Werkmeister_WMDE via IRC that wiki replicas still work from Toolforge, so this change was a NOOP.