
[dumps] Configure rsyncd authentication - or a suitable alternative
Closed, Resolved, Public

Description

Currently, the seven dumpsdata and clouddumps hosts involved in dumps publication have an rsync configuration fragment that allows all of these hosts to sync the contents of /srv/dumps to and from each other.

They use a hosts allow configuration option in /etc/rsyncd.d/10-rsync-datasets_to_peers.conf and do not use authentication.

This will not be sufficient when our sync sources are pods running on the dse-k8s-eqiad cluster, because we would need to allowlist the whole of the $DSE_KUBEPODS network.

So we will need to find a solution to this.

We could:

  • Use rsyncd password authentication
  • Use rsync over ssh and use public key authentication
  • Use another synchronization mechanism, other than rsync
  • Use a pull-based synchronization mechanism, by allowing clouddumps100[1-2] to mount the cephfs volume
  • Or something else

Event Timeline

BTullis triaged this task as High priority.

I think that the best course of action here is probably to use SSH based public key authentication.

We can do certain things to harden this configuration against misuse:

  • Allow the $DSE_KUBEPODS network range to access TCP port 22.
  • Limit access from this range to the dumpsgen user.
  • Exclude interactive login for the dumpsgen user and only permit the internal-sftp command.
  • Chroot the user to the destination directory.

We would need to configure a new Match fragment of the sshd_config in order for this to work, so I will check whether that is easy to do with our puppet setup.

The SSH private key will need to be stored in the puppet private repository, but then it will need to be realised on the deployment servers in order that it can be deployed to Kubernetes as a Secret in the mediawiki-dumps-legacy namespace.
From there, any pods that carry out the sync tasks will be able to access the secret.

So this means that members of the deployment group would be able to access the private key, as well as anyone in the airflow-test-k8s.

This access list differs from the list of users who could currently log into the dumpsdata* and clouddumps* hosts, which is:

  • dumps-roots
  • dumpsdata-admins
  • analytics-admins
  • wmcs-roots

These are the people who could currently gain access to the rsyncd service for /srv/dumps on clouddumps100[1-2].
So broadening the access to the deployers group is a change, but not particularly drastic.

It looks like our profile::ssh::server has provision for adding custom Match blocks to a server's /etc/ssh/sshd_config as long as they conform to the allowed types of configuration.

I'll contact Infrastructure-Foundations to check that they would be happy with this setup from a security perspective.

BTullis renamed this task from Configure rsyncd authentication - or a suitable alternative to [dumps] Configure rsyncd authentication - or a suitable alternative. Apr 3 2025, 12:30 PM

I spoke to @MoritzMuehlenhoff who said that this was feasible and reasonable, but suggested that I might also like to look at the existing support for rsyncd password authentication within puppet.
We are already using this with the profile::kerberos::kadminserver module.

I also notice that we have an rsync::server::stunnel class that could be used to wrap around rsync, providing an encryption capability.
I'll do a little more research.

Change #1135416 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add an rsync fragment to permit dse-k8s pods to sync mediawiki dumps

https://gerrit.wikimedia.org/r/1135416

Change #1135425 had a related patch set uploaded (by Btullis; author: Btullis):

[labs/private@master] Add a dummy password for rsyncing mediawiki-dumps-legacy

https://gerrit.wikimedia.org/r/1135425

Change #1135425 merged by Btullis:

[labs/private@master] Add a dummy password for rsyncing mediawiki-dumps-legacy

https://gerrit.wikimedia.org/r/1135425

Change #1135442 had a related patch set uploaded (by Btullis; author: Btullis):

[labs/private@master] Rename mediawiki-dumps-legacy rsync password

https://gerrit.wikimedia.org/r/1135442

Change #1135442 merged by Btullis:

[labs/private@master] Rename mediawiki-dumps-legacy rsync password

https://gerrit.wikimedia.org/r/1135442

Change #1135416 merged by Btullis:

[operations/puppet@production] Add an rsync fragment to permit dse-k8s pods to sync mediawiki dumps

https://gerrit.wikimedia.org/r/1135416

I went with the option to use password authenticated rsync and an rsync::server::module resource in https://gerrit.wikimedia.org/r/1135416

However, I had to revert the change because it introduced some other changes that would likely have adversely affected the dumps mirror sites.

Specifically, the rsync::server module recursively removes the /etc/rsync.d/ directory.

I can see two options:

  1. Update the rsync configuration on all dumpsdata and clouddumps hosts to use the new rsync::server::module configuration.
  2. Switch to using rsync over ssh, so that I don't have to touch the rsyncd configuration.

Of these, I think that option two is probably the simpler, so I'll have a look at that next.

Change #1139835 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Enable the dumpsgen user to use an rsync server over ssh from dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1139835

Change #1139840 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Create an SSH private key in the mediawiki-dumps-legacy namespace

https://gerrit.wikimedia.org/r/1139840

I have got a configuration in https://gerrit.wikimedia.org/r/1139835 that I believe will work for this.

It will create a file /etc/ssh/userkeys/dumpsgen with the content of:

from=10.67.24.0/21,from=2620:0:861:302::/64,command="/usr/bin/rsync --server -vlogDtprz" . /srv/mediawiki-dumps-legacy ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINP7YBWrx/crCj9oOq/rBCmT//PXufLEUU2W6iEud5wL dumpsgen-rsync@dse-k8s-eqiad

This authorized_keys format works like the ForceCommand option of the sshd_config file. It will permit only rsync with the specific parameters, from the specific hosts.
The forced command will only be called when the private key matching this public key is presented, and because the file is named dumpsgen, the key will only be accepted for the dumpsgen user.
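
To illustrate what the from= restriction is meant to express, here is a minimal Python sketch that checks whether a source address falls inside the two ranges quoted above. It only illustrates the matching logic and is not how sshd implements it; the function name is_allowed_source is hypothetical.

```python
import ipaddress

# Ranges copied from the authorized_keys line quoted in this comment.
ALLOWED_RANGES = [
    ipaddress.ip_network("10.67.24.0/21"),
    ipaddress.ip_network("2620:0:861:302::/64"),
]


def is_allowed_source(address: str) -> bool:
    """Return True if the address lies inside any of the allowed ranges."""
    ip = ipaddress.ip_address(address)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in ALLOWED_RANGES)
```

For example, the /21 covers 10.67.24.0 through 10.67.31.255, so an address such as 10.67.32.1 falls outside it.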

I did think about using the profile::ssh::server::config_match options to force a ChrootDirectory to /srv/mediawiki-dumps-legacy but then I decided that it would be fine just to use the normal root and a full path in the rsync command, for additional clarity.

I have also created https://gerrit.wikimedia.org/r/1139840 which renders the associated private key as a secret in the mediawiki-dumps-legacy namespace, and I have added the real private key to the private repo.

Change #1139835 merged by Btullis:

[operations/puppet@production] Enable the dumpsgen user to use an rsync server over ssh from dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1139835

Change #1139840 merged by jenkins-bot:

[operations/deployment-charts@master] Create an SSH private key in the mediawiki-dumps-legacy namespace

https://gerrit.wikimedia.org/r/1139840

Change #1139903 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] mediawiki-dumps-legacy: Fix helmfile secrets path

https://gerrit.wikimedia.org/r/1139903

Change #1139903 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki-dumps-legacy: Fix helmfile secrets path

https://gerrit.wikimedia.org/r/1139903

Change #1139906 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] mediawiki-dumps-legacy: Add private values files to resources deployment

https://gerrit.wikimedia.org/r/1139906

Change #1139906 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki-dumps-legacy: Add private values files to resources deployment

https://gerrit.wikimedia.org/r/1139906

Change #1140178 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] mediawiki-dumps-legacy: Rename the ssh private key secret

https://gerrit.wikimedia.org/r/1140178

Change #1140178 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki-dumps-legacy: Rename the ssh private key secret

https://gerrit.wikimedia.org/r/1140178

Change #1140481 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] mediawiki-dumps-legacy: Add a networkpolicy to allow publishing dumps

https://gerrit.wikimedia.org/r/1140481

Change #1140481 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki-dumps-legacy: Add a networkpolicy to allow publishing dumps

https://gerrit.wikimedia.org/r/1140481

Change #1140508 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add an ssh known_hosts configmap to the mediawiki-dumps-legacy namespace

https://gerrit.wikimedia.org/r/1140508

Change #1140508 merged by jenkins-bot:

[operations/deployment-charts@master] Add an ssh known_hosts configmap to the mediawiki-dumps-legacy namespace

https://gerrit.wikimedia.org/r/1140508

I have had some success now with getting the ssh connection to work from a pod to a clouddumps server.
The pod spec that I am using is as follows:

btullis@deploy1003:~$ cat rsync-pod.yaml 
---
apiVersion: v1
kind: Pod
metadata:
  name: sync-pod-with-cephfs-volume
  namespace: mediawiki-dumps-legacy
  labels:
    component: sync-pod
    app: mediawiki-dumps-legacy
spec:
  containers:
    - name: sync-utils
      image: docker-registry.discovery.wmnet/repos/data-engineering/sync-utils:2025-04-30-180247-bcd800cfb637eba972605d66f935edb1075a1e2e
      command: ["/bin/sh", "-c"]
      args: ["tail -f /dev/null"]
      volumeMounts:
      - mountPath: /mnt/dumpsdata
        name: mediawiki-production-dumps
        readOnly: true
      - mountPath: /home/runuser/.ssh/id_rsa
        name: ssh-private-key
        subPath: dumpsgen.key
        readOnly: true
      - mountPath: /home/runuser/.ssh/known_hosts
        name: ssh-known-hosts
        subPath: known_hosts
        readOnly: true
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - ALL
        runAsNonRoot: true
  securityContext:
    fsGroup: 900
  volumes:
  - name: mediawiki-production-dumps
    persistentVolumeClaim:
      claimName: mediawiki-dumps-legacy-fs
  - name: ssh-private-key
    secret:
      secretName: ssh-private-key
  - name: ssh-known-hosts
    configMap:
      name: mediwiki-dumps-legacy-ssh-known-hosts

I now have a networkpolicy that allows access from this pod to the SSH port on clouddumps100[1-2].
There is also a configmap that is mounted as /home/runuser/.ssh/known_hosts and a secret that provides the private key: its dumpsgen.key entry is mounted at /home/runuser/.ssh/id_rsa.
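
As a rough sketch of how a sync task in such a pod might assemble its rsync invocation, the following Python helper builds the argument list using the key and known_hosts paths from the pod spec. The function name build_rsync_command, the host name clouddumps1001.example and the xmldatadumps subdirectory are placeholders for illustration and do not come from this task.

```python
import shlex


def build_rsync_command(source_dir, remote_host, remote_dir,
                        key_path="/home/runuser/.ssh/id_rsa",
                        known_hosts="/home/runuser/.ssh/known_hosts",
                        remote_user="dumpsgen"):
    """Return an argv list for pushing source_dir to remote_dir over ssh."""
    # rsync's -e option takes the remote shell command as a single string.
    ssh_command = shlex.join([
        "ssh", "-i", key_path,
        "-o", f"UserKnownHostsFile={known_hosts}",
        "-o", "StrictHostKeyChecking=yes",
    ])
    return [
        "rsync", "-a", "-e", ssh_command,
        # A trailing slash copies the directory contents, not the directory itself.
        f"{source_dir.rstrip('/')}/",
        f"{remote_user}@{remote_host}:{remote_dir}",
    ]


command = build_rsync_command(
    "/mnt/dumpsdata/xmldatadumps",                # hypothetical subdirectory
    "clouddumps1001.example",                     # placeholder host name
    "/srv/mediawiki-dumps-legacy/xmldatadumps",   # hypothetical destination
)
```

The resulting list could be passed to subprocess.run(command, check=True). Returning a list rather than a shell string avoids a second round of shell quoting.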

I have discovered a problem with the from="10.67.24.0/21",from="2620:0:861:302::/64" parts of the authorized_keys file, because it wouldn't let me in with those in place.
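
The task does not record the root cause. One possibility, stated here as an assumption, is that the option appears twice: sshd(8) describes from= as taking a single comma-separated pattern list, so both ranges can be expressed in one option. A minimal Python sketch of composing such a line follows. The helper name authorized_keys_line and the key blob AAAAEXAMPLE are placeholders.

```python
def authorized_keys_line(sources, command, key_type, key_blob, comment):
    """Compose one authorized_keys entry with a single from= and command= option."""

    def quote(value):
        # sshd(8): a double quote inside an option value is escaped with a backslash.
        return '"' + value.replace('"', '\\"') + '"'

    options = [
        f"from={quote(','.join(sources))}",  # one option holding every range
        f"command={quote(command)}",
    ]
    return f"{','.join(options)} {key_type} {key_blob} {comment}"


line = authorized_keys_line(
    ["10.67.24.0/21", "2620:0:861:302::/64"],
    "/usr/bin/rsync --server -vlogDtpr . /srv/mediawiki-dumps-legacy",
    "ssh-ed25519",
    "AAAAEXAMPLE",  # placeholder, not a real key
    "dumpsgen-rsync@dse-k8s-eqiad",
)
```

Whether this matches the fix that was eventually merged is not stated in this task.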

Also, there is an issue with the command="/usr/bin/rsync --server -vloDtpr . /srv/mediawiki-dumps-legacy" section, so this will also need resolving.
In our configuration, we wish to be able to specify exactly which directories are included for each sync task, whereas the format above always runs this fixed command, irrespective of which command the client sent.
I think that what we will have to do is to make use of the $SSH_ORIGINAL_COMMAND variable to do this.
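
One way this could work, sketched below under assumptions, is a small wrapper configured as the forced command: it reads $SSH_ORIGINAL_COMMAND, accepts only an rsync --server invocation whose destination lies under the allowed root, and then executes it. The names validate, main and ALLOWED_ROOT are hypothetical, and the check is purely lexical (it does not resolve symlinks). rsync also distributes a helper script named rrsync for this kind of restriction, which may be a better fit than a hand-written wrapper.

```python
import os
import posixpath
import shlex
import sys

ALLOWED_ROOT = "/srv/mediawiki-dumps-legacy"  # destination root named in this task


def validate(original_command, allowed_root=ALLOWED_ROOT):
    """Return a safe argv for the requested command, or raise ValueError."""
    argv = shlex.split(original_command)
    if len(argv) < 4 or argv[0] != "rsync" or argv[1] != "--server":
        raise ValueError("only 'rsync --server' is permitted")
    if "--sender" in argv:
        raise ValueError("reading from this host is not permitted")
    # A receiving rsync server gets "." followed by the destination path last.
    if argv[-2] != ".":
        raise ValueError("expected exactly one destination path")
    dest = posixpath.normpath(argv[-1])
    root = posixpath.normpath(allowed_root)
    if dest != root and not dest.startswith(root + "/"):
        raise ValueError("destination is outside the allowed root")
    # Execute the normalised path, so what was checked is what runs.
    return argv[:-1] + [dest]


def main():
    """Entry point for use as the forced command (not called here)."""
    try:
        argv = validate(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    except ValueError as exc:
        sys.exit(f"rejected: {exc}")
    os.execv("/usr/bin/rsync", argv)
```

With a wrapper like this, the command= option would point at the wrapper script instead of a fixed rsync command line, and each sync task could choose its own subdirectory under the allowed root.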

Change #1140678 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix dumpsgen authorized_keys and remove chrootdirectory

https://gerrit.wikimedia.org/r/1140678

Change #1140678 merged by Btullis:

[operations/puppet@production] Fix dumpsgen authorized_keys and remove chrootdirectory

https://gerrit.wikimedia.org/r/1140678

I think that we can probably call this ticket done, since rsync over SSH is now working.
There will be some performance tuning that we can do, but I think that we are ready to start creating the sync tasks in the DAG.