stunnel-wrap all rsync::server usage
Open, Medium, Public

Description

Somewhat contrary to its name, rsync::quickdatacopy is used not just for server migrations or quick one-offs, but also for long-standing data replication workloads (used by everything from smokeping to phabricator). rsync::server is also used directly in many spots (analytics, eventlogging, icinga, ...).

We should TLS-wrap all rsync communications, as there's no good reason not to do so.

TODO:

  • refactor the current implementation to unconditionally enable the stunnel listener on the server side, and optionally (default true) enable vanilla rsync. This makes client migrations much easier, as there's no longer a need to synchronize changes on the client and server side. Once all the clients for a given service have been migrated, we could then disable vanilla rsync for that service.
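The dual-listener arrangement in the TODO could look roughly like the following on the server. This is a minimal sketch, not the generated config: port 1873 for the TLS listener matches the port seen in the logs later in this task, 873 is rsync's default port and is assumed to be where vanilla rsync listens here, and the certificate paths are hypothetical placeholders.

```ini
; /etc/stunnel/rsync.conf (sketch; certificate paths are hypothetical)
[rsync]
; TLS listener that clients migrate to
accept = 1873
; hand decrypted traffic to the local rsync daemon (default rsync port)
connect = 127.0.0.1:873
cert = /etc/ssl/localcerts/HOSTNAME.crt
key = /etc/ssl/private/HOSTNAME.key
CAfile = /etc/ssl/certs/ca.pem
; require a client certificate that chains to CAfile
verify = 2
```

Vanilla rsync would keep listening on its own port for clients that have not migrated yet. Because stunnel connects to rsyncd from 127.0.0.1, the rsync module has to accept connections from localhost in addition to its usual allowed hosts.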

Event Timeline

Restricted Application added a subscriber: Aklapper. Nov 5 2019, 3:25 PM

Change 547527 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync: add option to TLS-wrap communications

https://gerrit.wikimedia.org/r/547527

Change 547527 merged by CDanis:
[operations/puppet@production] rsync: add option to TLS-wrap communications

https://gerrit.wikimedia.org/r/547527

Change 548873 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync stunnel: allow both cleartext and encrypted service

https://gerrit.wikimedia.org/r/548873

Change 548873 merged by CDanis:
[operations/puppet@production] rsync stunnel: allow both cleartext and encrypted service

https://gerrit.wikimedia.org/r/548873

Change 548877 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync stunnel: enable for netmon1002 & /var/lib/smokeping

https://gerrit.wikimedia.org/r/548877

Change 548884 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync stunnel: ferm support for both clear & TLS

https://gerrit.wikimedia.org/r/548884

Change 548889 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync stunnel: the client needs stunnel4 installed too

https://gerrit.wikimedia.org/r/548889

Change 548884 merged by CDanis:
[operations/puppet@production] rsync stunnel: ferm support for both clear & TLS

https://gerrit.wikimedia.org/r/548884

Change 548889 merged by CDanis:
[operations/puppet@production] rsync stunnel: the client needs stunnel4 installed too

https://gerrit.wikimedia.org/r/548889

Change 548877 merged by CDanis:
[operations/puppet@production] rsync stunnel: enable for netmon1002 & /var/lib/smokeping

https://gerrit.wikimedia.org/r/548877

Change 548898 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync stunnel: need to enable in /etc/defaults, apparently

https://gerrit.wikimedia.org/r/548898

Change 548898 merged by CDanis:
[operations/puppet@production] rsync stunnel: need to enable in /etc/defaults, apparently

https://gerrit.wikimedia.org/r/548898

Change 548908 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync stunnel: always accept from localhost

https://gerrit.wikimedia.org/r/548908

Change 548908 merged by CDanis:
[operations/puppet@production] rsync stunnel: always accept from localhost

https://gerrit.wikimedia.org/r/548908

After fixing far too many things that were subtly wrong, it works!

✔️ cdanis@netmon2001.wikimedia.org ~ 🕔🍺 sudo /usr/local/sbin/sync-var-lib-smokeping
2019.11.05 22:12:16 LOG5[ui]: stunnel 5.39 on x86_64-pc-linux-gnu platform
2019.11.05 22:12:16 LOG5[ui]: Compiled with OpenSSL 1.1.0c 10 Nov 2016
2019.11.05 22:12:16 LOG5[ui]: Running with OpenSSL 1.1.0l 10 Sep 2019
2019.11.05 22:12:16 LOG5[ui]: Update OpenSSL shared libraries or rebuild stunnel
2019.11.05 22:12:16 LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6,SYSTEMD TLS:ENGINE,FIPS,OCSP,PSK,SNI Auth:LIBWRAP
2019.11.05 22:12:16 LOG5[ui]: Reading configuration from file /tmp/sync-ssl-wrapper.stunnel.conf.6NZf661Q
2019.11.05 22:12:16 LOG5[ui]: UTF-8 byte order mark not detected
2019.11.05 22:12:16 LOG5[ui]: FIPS mode disabled
2019.11.05 22:12:16 LOG4[ui]: Service [stunnel] uses "verify = 2" without subject checks
2019.11.05 22:12:16 LOG4[ui]: Use "checkHost" or "checkIP" to restrict trusted certificates
2019.11.05 22:12:16 LOG5[ui]: Configuration successful
2019.11.05 22:12:16 LOG5[0]: Service [stunnel] accepted connection from unnamed socket
2019.11.05 22:12:16 LOG5[0]: s_connect: connected 208.80.154.5:1873
2019.11.05 22:12:16 LOG5[0]: Service [stunnel] connected remote server from 208.80.153.110:50364
2019.11.05 22:12:16 LOG5[0]: Certificate accepted at depth=0: CN=netmon1002.wikimedia.org
2019.11.05 22:12:17 LOG5[0]: Connection closed: 486490 byte(s) sent to TLS, 1472301 byte(s) sent to socket

Nov 05 22:11:59 netmon1002 systemd[1]: Stopped LSB: Start or stop stunnel 4.x (TLS tunnel for network daemons).
Nov 05 22:11:59 netmon1002 systemd[1]: Starting LSB: Start or stop stunnel 4.x (TLS tunnel for network daemons)...
Nov 05 22:11:59 netmon1002 stunnel4[17063]: Starting TLS tunnels: /etc/stunnel/rsync.conf: started (no pid=pidfile specified!)
Nov 05 22:11:59 netmon1002 systemd[1]: Started LSB: Start or stop stunnel 4.x (TLS tunnel for network daemons).
Nov 05 22:12:16 netmon1002 stunnel[17078]: LOG5[0]: Service [rsync] accepted connection from 208.80.153.110:50364
Nov 05 22:12:16 netmon1002 stunnel[17078]: LOG5[0]: Certificate accepted at depth=0: CN=netmon2001.wikimedia.org
Nov 05 22:12:16 netmon1002 rsyncd[18399]: connect from localhost (127.0.0.1)
Nov 05 22:12:16 netmon1002 rsyncd[18399]: rsync on var-lib-smokeping/ from localhost (127.0.0.1)
Nov 05 22:12:16 netmon1002 rsyncd[18399]: building file list
Nov 05 22:12:17 netmon1002 stunnel[17078]: LOG5[0]: Connection closed: 1472301 byte(s) sent to TLS, 486490 byte(s) sent to socket

Followup work: at least in the case of rsync::quickdatacopy, we should write some checkHost lines into the generated stunnel configs, so that only certificates from the allowed hosts are accepted.
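A sketch of what the added subject checks might look like, using the two smokeping hosts from the logs above as the example. stunnel allows several checkHost lines in one service section and accepts a certificate whose host name matches any of them. These are two separate fragments, with the rest of each config omitted; where exactly the generated configs would place them is an assumption.

```ini
; server side (netmon1002), inside the [rsync] service section:
; only accept client certificates issued to the allowed hosts
verify = 2
checkHost = netmon2001.wikimedia.org

; client side (netmon2001), in the generated wrapper config:
; only accept the certificate of the server being synced from
verify = 2
checkHost = netmon1002.wikimedia.org
```

On the server, one checkHost line per allowed host would be needed, so this works most naturally when the allowed hosts are given as a list of FQDNs.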

There's also some added logspam to be fixed:

Nov 05 22:30:01 netmon2001 CRON[6788]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 05 22:30:01 netmon2001 CRON[6790]: (root) CMD (/usr/local/sbin/sync-var-lib-smokeping >/dev/null 2>&1)
Nov 05 22:30:01 netmon2001 CRON[6789]: pam_unix(cron:session): session opened for user prometheus by (uid=0)
Nov 05 22:30:01 netmon2001 CRON[6792]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: stunnel 5.39 on x86_64-pc-linux-gnu platform
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: Compiled with OpenSSL 1.1.0c  10 Nov 2016
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: Running  with OpenSSL 1.1.0l  10 Sep 2019
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: Update OpenSSL shared libraries or rebuild stunnel
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6,SYSTEMD TLS:ENGINE,FIPS,OCSP,PSK,SNI Auth:LIBWRAP
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: Reading configuration from file /tmp/sync-ssl-wrapper.stunnel.conf.j3tqw7M1
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: UTF-8 byte order mark not detected
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: FIPS mode disabled
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG4[ui]: Service [stunnel] uses "verify = 2" without subject checks
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG4[ui]: Use "checkHost" or "checkIP" to restrict trusted certificates
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[ui]: Configuration successful
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[0]: Service [stunnel] accepted connection from unnamed socket
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG3[0]: s_connect: connect 2620:0:861:1:208:80:154:5:1873: Connection refused (111)
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[0]: s_connect: connected 208.80.154.5:1873
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[0]: Service [stunnel] connected remote server from 208.80.153.110:50384
Nov 05 22:30:01 netmon2001 stunnel[6798]: LOG5[0]: Certificate accepted at depth=0: CN=netmon1002.wikimedia.org
Nov 05 22:30:01 netmon2001 CRON[6789]: pam_unix(cron:session): session closed for user prometheus
Nov 05 22:30:02 netmon2001 stunnel[6798]: LOG5[0]: Connection closed: 486494 byte(s) sent to TLS, 1140157 byte(s) sent to socket
Nov 05 22:30:02 netmon2001 CRON[6788]: pam_unix(cron:session): session closed for user root
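One possible way to quiet this, sketched under the assumption that the LOG5 notice lines are not needed for cron runs. stunnel's debug option sets the log threshold (the default is notice, level 5), so lowering it to warning keeps only LOG4 and more severe messages. The IPv6 "Connection refused" line suggests the server listener is bound to IPv4 only; binding it to the IPv6 wildcard is one option, though whether that suits this deployment is an assumption.

```ini
; client wrapper config, at the top of the file before any service section:
; log warnings and errors only (default is notice = 5)
debug = 4

; server side, inside the [rsync] service section:
; listen on the IPv6 wildcard address (on Linux this normally accepts
; IPv4 as well, depending on net.ipv6.bindv6only)
accept = :::1873
```

The LOG4 checkHost warnings would still appear until subject checks are configured.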

Another thing that just came up: not all users of rsync::server::module are actually passing an array to the $hosts_allow argument: https://gerrit.wikimedia.org/r/c/operations/puppet/+/549142
Need to go through PuppetDB and look for this.

Change 549164 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync::server::module: type annotations for hosts_allow/deny

https://gerrit.wikimedia.org/r/549164

Change 549164 merged by CDanis:
[operations/puppet@production] rsync::server::module: ensure hosts_allow is an array

https://gerrit.wikimedia.org/r/549164

Generated a list of all $hosts_allow arguments from rsync::server::module invocations across all of Puppet: P9544

They're mostly arrays of FQDNs, which is nice.

Change 549949 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync: provide a hiera default to unbreak cloud

https://gerrit.wikimedia.org/r/549949

Change 549949 merged by CDanis:
[operations/puppet@production] rsync: provide a hiera default to unbreak cloud

https://gerrit.wikimedia.org/r/549949

Change 551667 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] rsync: fix multiple usages of quickdatacopy

https://gerrit.wikimedia.org/r/551667

Change 551667 merged by CDanis:
[operations/puppet@production] rsync: fix multiple usages of quickdatacopy

https://gerrit.wikimedia.org/r/551667

Removing task assignee due to inactivity, as this open task has been assigned for more than two years. See the email sent to the task assignee on February 06th 2022 (and T295729).

Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.

If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".

Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.

jbond subscribed.

@CDanis is there anything outstanding still to do for this task?

I think so, yeah. The overall goal of "make all rsyncs happen via stunnel" is still a good one, and AFAICT still incomplete.

I think some of the details of how this would be accomplished have changed as of https://gerrit.wikimedia.org/r/c/operations/puppet/+/888065 which I hadn't seen go by, but the overall plan of "enable dual cleartext + stunnel rsync servers by default, then migrate clients, then shut off cleartext servers" seems like a fine one.

Ack sgtm will leave it open