As of today, all the hosts/VMs of the DE infrastructure should be on Debian Buster. A big piece of work done while reimaging was the move to a fixed uid/gid scheme for most of our users (hdfs, yarn, analytics, etc.). See https://gerrit.wikimedia.org/r/c/operations/puppet/+/666657 for an example of how we allocated fixed uids/gids.
In this task we should:
- verify that all the users we allocated in data.yaml are actually deployed across all nodes (namely, that we have the same uid/gid everywhere for, say, druid, hdfs, analytics-privatedata, etc.)
- deprecate profile::analytics::cluster::users in favor of data.yaml (basically remove the profile and uncomment what is written in data.yaml)
Note: allocating users via data.yaml means that they will be deployed across all the hosts managed by puppet (even non analytics ones). This should be fine but it would be good to verify this with John and Moritz beforehand, just to be sure.
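The verification step could be scripted roughly as below. This is only a sketch under stated assumptions: it assumes the `getent passwd` lines for each host have already been collected by some other means, and the host names, uid/gid values, and function names are hypothetical placeholders, not the values allocated in data.yaml. It compares the uid and primary gid from the passwd entry only; group definitions would need a similar check against `getent group`.

```python
from typing import Dict, List, Tuple

# Expected allocation: user name -> (uid, gid).
Expected = Dict[str, Tuple[int, int]]


def parse_passwd_line(line: str) -> Tuple[str, int, int]:
    """Parse one passwd(5) entry: name:password:uid:gid:gecos:home:shell."""
    fields = line.strip().split(":")
    return fields[0], int(fields[2]), int(fields[3])


def find_mismatches(expected: Expected,
                    passwd_by_host: Dict[str, List[str]]) -> List[str]:
    """Return one message per user that is missing or has an unexpected uid/gid."""
    problems = []
    for host in sorted(passwd_by_host):
        seen = {}
        for line in passwd_by_host[host]:
            if not line.strip():
                continue  # skip blank lines in the collected output
            name, uid, gid = parse_passwd_line(line)
            seen[name] = (uid, gid)
        for user in sorted(expected):
            want = expected[user]
            if user not in seen:
                problems.append(f"{host}: {user} is missing")
            elif seen[user] != want:
                got = seen[user]
                problems.append(
                    f"{host}: {user} has uid/gid {got[0]}/{got[1]}, "
                    f"expected {want[0]}/{want[1]}"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    expected = {"hdfs": (9001, 9001), "druid": (9002, 9002)}
    passwd_by_host = {
        "host-a": ["hdfs:x:9001:9001::/nonexistent:/bin/false",
                   "druid:x:9002:9002::/nonexistent:/bin/false"],
        "host-b": ["hdfs:x:9001:9001::/nonexistent:/bin/false",
                   "druid:x:120:125::/nonexistent:/bin/false"],
    }
    for problem in find_mismatches(expected, passwd_by_host):
        print(problem)
```

An empty result would mean every listed user has the expected uid and primary gid on every host that was checked.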
I have not fully 'deprecated' profile::analytics::cluster::users. Instead, it should now be used only to sync system users declared in puppet classes (not in the admin module) to all nodes where they are needed. Puppet classes should keep control over the daemon system users they need.
'Human' system users exist for human users to sudo to and schedule productionized jobs as a shared user. These are controlled by the admin module and have been removed from profile::analytics::cluster::users.