While auditing codfw/eqiad traffic during the switchover (T286038), I came across plaintext rsync traffic for the releases hosts. Please consider switching to encrypted rsync.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | fgiunchedi | T286038 Record traffic flows in and out of eqiad during switchover |
| Resolved | | Dzahn | T289858 Use encrypted rsync for releases |
| Resolved | | Dzahn | T412456 SystemdUnitFailed - rsync-srv-org-wikimedia-releases-releases1003.eqiad.wmnet.service on releases1003:9100 |
Event Timeline
T289857: Use encrypted rsync for deployment::rsync has some notes on how to enable stunnel for this. However, the MW-on-K8s image build process also runs an rsync against the releases host, so it might need an update to use stunnel as well.
All files sent to releases are meant to be available to the world though. Does it still matter to encrypt traffic internally for something like this?
IMHO yes, we should encrypt traffic unless we have a reason not to (e.g. the system is going to be retired, or the change is too hard or complex to implement relative to its advantages).
I don't think this should be considered a blocker for T327920: March 2023 Datacenter Switchover.
However, we should address it for mw-on-k8s and releases.
Edit: I may have misunderstood. Was this unencrypted cross-datacenter traffic?
Tagging collab, since we are probably the ones who need to get back to this nowadays. Sorry for the delay; this slipped off the radar.
Change #1217572 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] releases: add stunnel to rsync data copy
Change #1217572 merged by Dzahn:
[operations/puppet@production] releases: add stunnel to rsync data copy
Change #1217594 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] releases: use stunnel with rsync from deployment server
There are several different "rsyncs" involved here.
This is now resolved, using stunnel, for the data transfers from one releases server to another. There are two different ones of those.
Then there is another one where the releases hosts pull from the deployment host. The second patch, which covers that, is up for review but not deployed yet.
I need to double-check whether the deployment host works with this. scap::master defines a couple of rsync::server::module resources, but it does not use the rsync::quickdatacopy abstraction, which offers the servers_uses_stunnel parameter.
Change #1217594 merged by Dzahn:
[operations/puppet@production] releases: use stunnel with rsync from deployment server
Mentioned in SAL (#wikimedia-operations) [2025-12-15T18:10:56Z] <dzahn@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on releases1003.eqiad.wmnet with reason: T289858
Mentioned in SAL (#wikimedia-operations) [2025-12-15T18:12:21Z] <dzahn@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on releases2003.codfw.wmnet with reason: T289858
Change #1218348 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] deployment::server: allow releases hosts encrypted rsync
Change #1218348 merged by Dzahn:
[operations/puppet@production] deployment::server: allow releases hosts encrypted rsync
Mentioned in SAL (#wikimedia-operations) [2025-12-15T21:44:29Z] <dzahn@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on releases2003.codfw.wmnet with reason: T289858
Mentioned in SAL (#wikimedia-operations) [2025-12-15T21:44:52Z] <dzahn@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on releases1003.eqiad.wmnet with reason: T289858
Mentioned in SAL (#wikimedia-operations) [2025-12-15T22:23:19Z] <dzahn@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases1003.eqiad.wmnet with reason: T289858
Mentioned in SAL (#wikimedia-operations) [2025-12-15T22:23:33Z] <dzahn@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases2003.codfw.wmnet with reason: T289858