
l10nupdate user uid mismatch between tin and mira
Closed, Resolved, Public

Description

Apparently the l10nupdate user was assigned an inconsistent uid across tin and mira. This causes problems with scap, e.g. P2335.

From IRC:

<@bd808> drwxr-xr-x 3 trebuchet l10nupdate 4096 Nov 17 20:43 /srv/mediawiki-staging/php-1.27.0-wmf.7/cache/l10n/
the uids don't match across the hosts:

on tin : uid=997  (l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)
on mira: uid=12162(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)

so the rsync worked perfectly but the hosts have lame configuration because we let puppet randomly pick uids
it shouldn't be too hard for a root to fix puppet and renumber the uids on mira
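A quick way to spot this kind of drift is to collect the output of "id l10nupdate" from each host and compare the numbers. The Python sketch below assumes that output has already been gathered into a dict keyed by hostname (how it is collected, e.g. over ssh, is left out); parse_id and find_mismatches are hypothetical helper names.

```python
import re

# Matches coreutils `id USER` output such as
# "uid=997(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)";
# \s* also tolerates the padded "uid=997  (l10nupdate)" form pasted above.
ID_RE = re.compile(r"uid=(\d+)\s*\((\S+?)\)\s+gid=(\d+)\s*\((\S+?)\)")


def parse_id(line: str) -> tuple[int, int]:
    """Return (uid, gid) parsed from one line of `id` output."""
    match = ID_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised id output: {line!r}")
    return int(match.group(1)), int(match.group(3))


def find_mismatches(outputs: dict[str, str]) -> dict[str, set[int]]:
    """Return the distinct values of each id field that differs between hosts."""
    parsed = {host: parse_id(line) for host, line in outputs.items()}
    uids = {uid for uid, _ in parsed.values()}
    gids = {gid for _, gid in parsed.values()}
    mismatches = {}
    if len(uids) > 1:
        mismatches["uid"] = uids
    if len(gids) > 1:
        mismatches["gid"] = gids
    return mismatches
```

Fed the two lines from the IRC paste, this reports a uid mismatch (997 vs 12162) and no gid mismatch, since puppet already pins the gid.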

Event Timeline

mmodell raised the priority of this task to Needs Triage.
mmodell updated the task description.
mmodell added projects: Deployments, SRE.
mmodell added subscribers: mmodell, bd808.

Puppet says that l10nupdate should have gid=10002 which the hosts both agree on, but the uid is left unspecified.

This problem is only showing up because we are now trying to exactly mirror the contents of /srv/mediawiki-staging between tin and mira to establish multi-master redundancy for scap (T104826).

see also: T79786: mwdeploy does not have the same user ID on all Apaches

bd808 renamed this task from "uid mismatch between tin and mira" to "l10nupdate user uid mismatch between tin and mira". (Nov 20 2015, 4:46 PM)
bd808 set Security to None.

These mismatches are only really a problem when root gets involved. It doesn't matter what the uids/gids are until you start using rsync/tar/whatever as root to make exact copies of a directory from one system to another. Rsync as a non-privileged user is fine because it won't try to preserve the uid/gid from the origin server. Unfortunately, if we are going to do master-master sync then we need to have them match on both sides.

@bd808: so, reading the manpage for rsync, it seems that you have to specify --numeric-ids for this to even matter? Otherwise rsync should be smart enough to remap the ids automatically?

I can assert that we are currently seeing uid preservation when rsyncing from tin to mira. Checking the ownership of /srv/mediawiki-staging/php-1.27.0-wmf.7/cache/l10n on the two hosts will show that.

The rsync(1) man page does say that name-based mappings are done by default, but there is also this little side note for both the --owner and --group flags (which are implied by the --archive flag we specify):

The preservation of ownership will associate matching names by default, but may fall back to using the ID number in some circumstances (see also the --numeric-ids option for a full discussion).

Under --numeric-ids it says:

If a user or group has no name on the source system or it has no match on the destination system, then the numeric ID from the source system is used instead. See also the comments on the "use chroot" setting in the rsyncd.conf manpage for information on how the chroot setting affects rsync’s ability to look up the names of the users and groups and what you can do about it.

We have use chroot = yes in /etc/rsyncd.conf so let's follow that thread over to rsyncd.conf(5) where we find:

When this parameter is enabled, rsync will not attempt to map users and groups by name (by default), but instead copy IDs as though --numeric-ids had been specified. In order to enable name-mapping, rsync needs to be able to use the standard library functions for looking up names and IDs (i.e. getpwuid(), getgrgid(), getpwnam(), and getgrnam()). This means the rsync process in the chroot hierarchy will need to have access to the resources used by these library functions (traditionally /etc/passwd and /etc/group, but perhaps additional dynamic libraries as well).

If you copy the necessary resources into the module’s chroot area, you should protect them through your OS’s normal user/group or ACL settings (to prevent the rsync module’s user from being able to change them), and then hide them from the user’s view via "exclude" (see how in the discussion of that parameter). At that point it will be safe to enable the mapping of users and groups by name using the "numeric ids" daemon parameter (see below).

Under "numeric ids":

By default, this parameter is enabled for chroot modules and disabled for non-chroot modules.

A chroot-enabled module should not have this parameter enabled unless you’ve taken steps to ensure that the module has the necessary resources it needs to translate names, and that it is not possible for a user to change those resources.
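The chain of defaults quoted above is easy to lose track of, so here is a simplified Python model of how the uid of a newly synced file ends up being chosen. It is a sketch based only on the man page excerpts in this task, not rsync's actual logic: it ignores groups and treats the daemon's "numeric ids" setting as if it applied to the whole transfer. SyncSetup and resulting_uid are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncSetup:
    """Simplified description of one rsync transfer (a model, not rsync's code)."""
    receiver_is_root: bool  # --owner only takes effect when the receiver runs as root
    numeric_ids_flag: bool = False  # client passed --numeric-ids
    daemon_use_chroot: bool = False  # rsyncd module has "use chroot = yes"
    daemon_numeric_ids: Optional[bool] = None  # explicit "numeric ids"; None = default


def ids_copied_numerically(setup: SyncSetup) -> bool:
    """True when ids are copied as numbers instead of being mapped by name."""
    if setup.numeric_ids_flag:
        return True
    if setup.daemon_numeric_ids is not None:
        return setup.daemon_numeric_ids
    # rsyncd.conf(5): "numeric ids" defaults to enabled for chroot modules
    return setup.daemon_use_chroot


def resulting_uid(setup: SyncSetup, src_uid: int, src_name: Optional[str],
                  dest_users: dict[str, int], receiver_uid: int) -> int:
    """Predict the uid a newly created file gets on the receiving host."""
    if not setup.receiver_is_root:
        # Ownership is not preserved: the file belongs to the receiving user.
        return receiver_uid
    if ids_copied_numerically(setup):
        return src_uid
    # Name mapping falls back to the source's numeric id when the name is
    # unknown on the source or has no match on the destination.
    if src_name is None or src_name not in dest_users:
        return src_uid
    return dest_users[src_name]
```

With a root receiver, a chroot module and "numeric ids" left at its default, the model gives 997 for a file owned by l10nupdate on tin, which matches the uid preservation observed on mira. Setting daemon_numeric_ids=False would map it to mira's local l10nupdate uid instead.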

So it sounds like if we want to get the traditional name based mapping from rsync we will need to:

  1. Copy /etc/passwd and /etc/group into the chroot used by rsyncd (a symlink to /etc/passwd would not resolve from inside the chroot).
  2. Ensure that those copies are owned by root and chmod 0444 or similar.
  3. Exclude the passwd and group files from syncing in the rsyncd config.
  4. Set numeric ids = no in the rsyncd module config to override the forced --numeric-ids behavior that use chroot = yes turns on by default.
  5. Test and tweak until we are sure we got it all fixed correctly.
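Steps 3 and 4 above might look roughly like the following module section, rendered from Python here so it can sit next to a sanity check for steps 1 and 2. The module name and path are hypothetical; use chroot, numeric ids and exclude are real rsyncd.conf(5) parameters, and the exclude patterns assume they are anchored at the module root. Per the rsyncd.conf excerpt, passwd and group must be copies inside the module's chroot area: a symlink pointing at /etc/passwd is resolved relative to the chroot and never reaches the host file, so check_chroot_copies flags symlinks as well as missing files, non-root ownership and any write bit.

```python
import os
import stat


def render_module(name: str, path: str, name_mapping: bool) -> str:
    """Render one rsyncd.conf module section (module name/path are placeholders)."""
    lines = [
        f"[{name}]",
        f"    path = {path}",
        "    use chroot = yes",
    ]
    if name_mapping:
        # Opt back into name-based uid/gid mapping (off by default for chroot modules).
        lines.append("    numeric ids = no")
        # Keep the chroot's private passwd/group copies out of the transfer.
        lines.append("    exclude = /etc/passwd /etc/group")
    return "\n".join(lines) + "\n"


def check_chroot_copies(module_path: str) -> list[str]:
    """Report problems with the passwd/group copies rsyncd needs inside its chroot."""
    problems = []
    for name in ("passwd", "group"):
        path = os.path.join(module_path, "etc", name)
        try:
            st = os.lstat(path)  # lstat so a symlink is inspected, not followed
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        if stat.S_ISLNK(st.st_mode):
            problems.append(f"{path}: is a symlink (will not resolve inside the chroot)")
            continue
        if st.st_uid != 0:
            problems.append(f"{path}: not owned by root")
        mode = stat.S_IMODE(st.st_mode)
        if mode & 0o222:
            problems.append(f"{path}: writable (mode {mode:o})")
    return problems
```

The rendered section is only a starting point for step 5; in particular, whether rsync in the chroot also needs extra NSS libraries (as the man page warns) has to be tested on the actual hosts.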

Or we could pin the uid of the l10nupdate user in the puppet config to a self-chosen well known value and fix the inode ownership on tin & mira to match.

@bd808: So it sounds like having consistent UIDs is the better / easier solution, based on the complexity of getting name mapping to work?

Thanks @bd808 for the investigation! Indeed, as @mmodell points out, given the above it seems simpler to reserve a uid and have it match on tin/mira.

Let's pick the UID that we have on tin and make it "reserved" by adding it here:

https://wikitech.wikimedia.org/wiki/UID

Then I'd change the UID on mira and use find / -uid .. -exec chown .. foo to fix all the file ownerships.

ok, so we have defined that it is supposed to be:

10002/10002 (uid/gid)

the actual situation is:

tin: 997/10002

mira: 12162/10002

so we have to fix both servers
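The per-host work can be derived mechanically from the reserved value and each host's actual ids. The Python sketch below is hypothetical (plan_fixes is not an existing tool) and only prints commands, it does not run them. One deliberate difference from the commands used in this task: it does a single pass with chown -h, which re-owns regular files and symlinks alike, whereas a first pass with plain chown follows each symlink and re-owns its target, even when that target belongs to another user.

```python
import shlex


def plan_fixes(user: str, expected: tuple[int, int],
               actual: dict[str, tuple[int, int]], root: str = "/") -> dict[str, list[str]]:
    """Return {host: [shell lines]} for every host whose (uid, gid) differs from expected."""
    exp_uid, exp_gid = expected
    top = shlex.quote(root)
    plan = {}
    for host in sorted(actual):
        uid, gid = actual[host]
        lines = []
        if uid != exp_uid:
            lines.append(f"# edit /etc/passwd: {user} uid {uid} -> {exp_uid}")
            # chown -h re-owns symlinks themselves and never follows them
            lines.append(f"find {top} -uid {uid} -exec chown -h {exp_uid} {{}} +")
        if gid != exp_gid:
            lines.append(f"# edit /etc/passwd and /etc/group: {user} gid {gid} -> {exp_gid}")
            lines.append(f"find {top} -gid {gid} -exec chgrp -h {exp_gid} {{}} +")
        if lines:
            lines.append("# restart cron so it picks up the new ids")
            plan[host] = lines
    return plan
```

For tin (997/10002) and mira (12162/10002) this yields one find/chown line per host and nothing for the group, since the gid already matches.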

I fixed it on mira by editing /etc/passwd and then running:

root@mira:~# find / -uid 12162 -exec chown 10002:10002 {} \;

then ran it a second time with chown -h to also fix the symlink ownership.

root@mira:~# id l10nupdate
uid=10002(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)

and it does not find any files owned by 12162 anymore (besides /proc remnants)

running the fix on tin .. in a screen because it was still ongoing ...

done on tin.

now:

root@tin:~# id l10nupdate
uid=10002(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)

and:

root@tin:~# find / -uid 997 -exec chown 10002:10002 {} \;
root@tin:~# find / -uid 997 -exec chown -h 10002:10002 {} \;

(-h also fixes the symlinks themselves; without it, chown follows a link and changes its target instead)

chown() changes the ownership of the file specified by path, which is dereferenced if it is a symbolic link.

-h, --no-dereference
       affect each symbolic link instead of any referenced file (useful only on systems that can change the ownership of a symlink)
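The same renumbering can be expressed without shelling out. This is a hypothetical Python sketch (renumber_owner is not an existing tool): os.lstat and os.lchown act on the link itself, which is the chown -h behavior described in the man page excerpt above, so symlinks need no second pass. Like chown 10002 in the later commands, it changes only the owner, and it defaults to a dry run.

```python
import os


def renumber_owner(root: str, old_uid: int, new_uid: int, dry_run: bool = True) -> list[str]:
    """Re-own every path under root whose owner is old_uid; return the paths found.

    Uses lstat/lchown, so a symlink is checked and re-owned itself and its
    target is never touched (the equivalent of `chown -h`). The group is
    left unchanged. With dry_run=True nothing is modified.
    """
    changed: list[str] = []

    def visit(path: str) -> None:
        try:
            st = os.lstat(path)
        except OSError:
            return  # vanished or unreadable during the walk (common under /proc)
        if st.st_uid != old_uid:
            return
        changed.append(path)
        if not dry_run:
            os.lchown(path, new_uid, -1)  # -1 keeps the current gid

    visit(root)
    # os.walk does not descend into symlinked directories by default, but it
    # lists them in dirnames, so the links themselves are still visited.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            visit(os.path.join(dirpath, name))
    return changed
```

Calling renumber_owner("/srv/mediawiki-staging", 1001, 10002) with the default dry_run=True lists what would change; passing dry_run=False actually changes ownership and requires root.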
Dzahn claimed this task.

root@tin:~# id l10nupdate
uid=10002(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)

root@mira:~# id l10nupdate
uid=10002(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)

tin:/srv/mediawiki-staging  (git master $)
bd808$ sync-file README "Testing l10nupdate uid fix for T119165"
           ___ ____
         ⎛   ⎛ ,----
          \  //==--'
     _//|,.·//==--'    ____________________________
    _OO≣=-  ︶ ᴹw ⎞_§ ______  ___\ ___\ ,\__ \/ __ \
   (∞)_, )  (     |  ______/__  \/ /__ / /_/ / /_/ /
     ¨--¨|| |- (  / ______\____/ \___/ \__^_/  .__/
         ««_/  «_/ jgs/bd808                /_/

18:18:54 Started sync-masters
sync-masters: 100% (ok: 1; fail: 0; left: 0)
18:19:03 Finished sync-masters (duration: 00m 09s)
18:19:03 Started sync-proxies
sync-proxies: 100% (ok: 12; fail: 0; left: 0)
18:19:06 Finished sync-proxies (duration: 00m 02s)
18:19:06 Started sync-apaches
sync-common: 100% (ok: 467; fail: 0; left: 0)
18:19:22 Finished sync-apaches (duration: 00m 16s)
18:19:22 Synchronized README: Testing l10nupdate uid fix for T119165 (duration: 00m 28s)

Change 255421 had a related patch set uploaded (by Dzahn):
mediawiki: specify uid 10002 for l10nupdate user

https://gerrit.wikimedia.org/r/255421

Change 255421 abandoned by Dzahn:
mediawiki: specify uid 10002 for l10nupdate user

https://gerrit.wikimedia.org/r/255421

greg subscribed.

This is back:
21:36 < hashar> mira has l10nupdate uid == 10002 tin has l10nupdate uid = 1001

Following reinstallation of tin on Feb 2nd:

mira: uid=10002(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)
tin:  uid=1001(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)

That prevents scap sync-masters from copying the l10n cache files from mira to tin :(

So @ori had a patch https://gerrit.wikimedia.org/r/#/c/256026/4/modules/scap/manifests/l10nupdate.pp,cm but it is no longer reflected in puppet:

modules/scap/manifests/l10nupdate.pp has:

user { 'l10nupdate':
    ensure     => present,
    gid        => 10002,
    shell      => '/bin/bash',
    home       => '/home/l10nupdate',
    managehome => true,
}

So the UID is left for the system to pick :-(
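Until the uid is pinned in the user resource (which is what the abandoned change 255421 proposed, judging by its title), a drift check against the reserved values could catch a reinstall like this one early. A hypothetical Python sketch, assuming the reservation for l10nupdate is 10002/10002 as recorded on the wikitech UID page and that the account lives in /etc/passwd rather than in a directory service:

```python
def passwd_drift(passwd_text: str,
                 reserved: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Return {user: (uid, gid)} for reserved users whose entry differs from the reservation."""
    drift = {}
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # passwd(5): name:password:uid:gid:gecos:home:shell
        if len(fields) != 7 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue  # not a well-formed local entry
        name, uid, gid = fields[0], int(fields[2]), int(fields[3])
        if name in reserved and (uid, gid) != reserved[name]:
            drift[name] = (uid, gid)
    return drift


# Hypothetical excerpt of the reservations listed on wikitech's UID page.
RESERVED = {"l10nupdate": (10002, 10002)}
```

On a live host, pwd.getpwnam("l10nupdate") from the standard library would give the same numbers through NSS, including non-file sources.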

I went to tin and adjusted the UID of the l10nupdate user, after confirming 10002 is correct on https://wikitech.wikimedia.org/wiki/UID

vi /etc/passwd
find / -uid 1001 -exec chown 10002 {} \;

the GID is puppetized and was already correct, 10002, before that

same on mira (and tin) for:

find /srv/mediawiki-staging/ -uid 1001 -exec chown 10002 {} \;

on mira the user was ok, just the files had to be fixed from the rsync from tin

root@mira:/srv# id l10nupdate
uid=10002(l10nupdate) gid=10002(l10nupdate) groups=10002(l10nupdate)

the immediate blocker is fixed.

for the puppetization issue see comments on https://gerrit.wikimedia.org/r/#/c/255421/

Just for the record, cron needs to be restarted if a uid has been changed; not doing so made the l10nupdate job fail for days in a row.

Yes, absolutely right, and I should have known from the last time we did the same thing; I just forgot to do it. :/ Thanks!