
mwdeploy does not have the same user ID on all Apaches
Open, Medium, Public

Description

mwdeploy is user ID 111 on most machines, including fenari, but 112 on some
others (e.g. srv167). On those machines, user ID 111 is nagios. This causes
problems sometimes when rsync works by user ID not by user name (not entirely
sure when that happens) and other things run inside sudo -u mwdeploy and expect
to be able to touch those files. mwdeploy should have the same user ID
everywhere.
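
A minimal sketch of how such a mismatch could be detected, assuming the /etc/passwd contents have already been collected from each host. The function names are hypothetical and not part of any existing tooling:

```python
def uid_of(passwd_text, user):
    """Return the numeric UID of `user` in /etc/passwd-style text, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # passwd format: name:password:UID:GID:gecos:home:shell
        if len(fields) >= 3 and fields[0] == user:
            return int(fields[2])
    return None


def uids_by_host(passwd_by_host, user):
    """Map each host name to the UID that `user` has there."""
    return {host: uid_of(text, user) for host, text in passwd_by_host.items()}


def is_consistent(passwd_by_host, user):
    """True if `user` has one and the same UID on every host."""
    return len(set(uids_by_host(passwd_by_host, user).values())) == 1
```

Given passwd text from fenari and srv167 as described above, `uids_by_host(..., "mwdeploy")` would report 111 and 112 respectively, and `is_consistent` would return False.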

Details

Reference
rt1406

Event Timeline

rtimport raised the priority of this task to Medium. Dec 18 2014, 12:55 AM
rtimport added a project: ops-core.
rtimport set Reference to rt1406.
Catrope created this task. Aug 31 2011, 8:44 PM
Dzahn added a comment. Sep 1 2011, 12:51 PM

This reminded me of similar problems (nagios UID vs. other) we had at a former
company. To avoid these, we had a wiki page listing "reserved" UIDs and GIDs,
which people would add to when creating new users.
So I created one here: http://wikitech.wikimedia.org/view/UID
It just reflects the current situation on fenari; it should be edited to show
the way it _should_ be on all servers. Feel free to edit it.
The table columns are sortable, btw.

Dzahn added a comment. Sep 1 2011, 12:51 PM

Status changed from 'new' to 'open' by dzahn

-- Where does the user come from because I cannot find it in puppet?
Chris Johnson
Wikimedia Foundation, Inc

On Thu Feb 28 23:50:02 2013, cmjohnson wrote:

  • Where does the user come from because I cannot find it in puppet?

Chris Johnson
Wikimedia Foundation, Inc

modules/mediawiki/manifests/users/mwdeploy.pp
which invokes modules/generic/manifests/systemuser.pp
It does not take a uid parameter.
Even on just the mw* hosts in eqiad I see 109, 110, 108...

On Mon Dec 09 13:30:15 2013, ariel wrote:

On Thu Feb 28 23:50:02 2013, cmjohnson wrote:

  • Where does the user come from because I cannot find it in puppet?

Chris Johnson
Wikimedia Foundation, Inc

modules/mediawiki/manifests/users/mwdeploy.pp
which invokes modules/generic/manifests/systemuser.pp
It does not take a uid parameter.

Even on just the mw* hosts in eqiad I see 109, 110, 108...

From IRC:
<apergos> have to see what might be owned by that uid on various hosts and fix
that, see what processes are running as that user and decide what would need to
be done
<apergos> if there are any (files, processes)
<apergos> would want to make sure nothing else has the new uid on any of the
hosts too
So, we must find a free uid and make sure it is ok to reassign the mwdeploy to
that UID

200 looks like a nice round uid (and not in use anywhere at this writing).
I was hoping to reuse one of the mwdeploy uids but they are snagged by other
things on various servers so no deal.
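
A rough sketch of the "is this UID free everywhere" check, again assuming /etc/passwd text gathered per host. The function names are hypothetical, and this only looks at passwd entries: files owned by a bare numeric UID with no passwd entry would not be noticed.

```python
def uids_in_use(passwd_text):
    """Return the set of numeric UIDs present in /etc/passwd-style text."""
    uids = set()
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # Third field is the UID; skip malformed or comment lines.
        if len(fields) >= 3 and fields[2].isdigit():
            uids.add(int(fields[2]))
    return uids


def hosts_using_uid(passwd_by_host, candidate):
    """List the hosts (sorted) on which `candidate` is already taken."""
    return sorted(
        host for host, text in passwd_by_host.items()
        if candidate in uids_in_use(text)
    )
```

An empty list from `hosts_using_uid(passwd_by_host, 200)` would mean no collected host has a passwd entry with that UID.
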
Checking for running processes (I did not check for cron): hume has some job
queue related things, terbium has some wikidata dispatch stuff going.
Home dir:
vanadium and kaulen have the home dir as /home/mwdeploy instead of
/var/lib/mwdeploy like everywhere else.
The following hosts have /var/lib/mwdeploy directories (with proper ownership):
virt0, virt1000, mw118, stat1002, stat1, bast1001
fenari, srv193, nfs1, hume, and terbium all have /home/mwdeploy directories,
not necessarily owned by the right user.
Files owned:
/usr/local/apache/common-local where present is owned by the mwdeploy user.
todo: find cron jobs that run as the user, track down crucial files outside of
/usr/local/apache/common-local owned by the user on all hosts
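
For the "track down files owned by the user" part of the todo, a minimal sketch that walks a tree and reports everything owned by a given numeric UID while skipping known directories. The function name and parameters are hypothetical; it would have to be run on each host with enough privileges to read the tree.

```python
import os


def files_owned_by(uid, root, skip=()):
    """Yield paths under `root` owned by numeric `uid`, skipping `skip` dirs."""
    skip = {os.path.abspath(p) for p in skip}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) not in skip
        ]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: report the owner of a symlink itself, not its target.
                if os.lstat(path).st_uid == uid:
                    yield path
            except OSError:
                # File vanished or is unreadable; ignore and carry on.
                continue
```

For example, `files_owned_by(111, "/", skip=["/usr/local/apache/common-local", "/proc"])` would list what else the old UID owns on a host where mwdeploy is 111.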

Dzahn added a comment. Feb 12 2014, 4:33 PM

On Tue Feb 11 09:20:54 2014, matanya wrote:

So, we must find a free uid and make sure it is ok to reassign the
mwdeploy to
that UID

please don't just "find" one, please see here instead:
https://wikitech.wikimedia.org/wiki/UID
of course the wiki page can and should be edited, but it should reflect the
real situation on the servers as it should be

Is this still an issue? The mwdeploy user has different UIDs across the cluster now, but IMO that shouldn't matter and we should make sure to do everything with usernames.

Restricted Application added a subscriber: Matanya. Jul 22 2015, 4:27 PM
fgiunchedi changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)". Jul 22 2015, 4:28 PM
fgiunchedi changed the edit policy from "WMF-NDA (Project)" to "All Users".
fgiunchedi set Security to None.
bd808 added a subscriber: bd808. Nov 20 2015, 4:34 PM

Is this still an issue? The mwdeploy user has different UIDs across the cluster now, but IMO that shouldn't matter and we should make sure to do everything with usernames.

The one place I can think of today where it may matter is in our efforts to establish multi-master sync for the MediaWiki deploy server (e.g. tin & mira). If mwdeploy owns files in the staging directory (/srv/mediawiki-staging) and the UIDs for mwdeploy don't match on tin and mira (and any new masters we add later), then things may get weird when the root-permission rsync mirrors the staging dir from one host to the others.

IIRC rsync as root will try to DTRT and map name/uid on the destination side with what it found on source side (unless --numeric-ids is used). Have you seen differently when syncing tin and mira?

bd808 added a comment. Nov 20 2015, 5:00 PM

IIRC rsync as root will try to DTRT and map name/uid on the destination side with what it found on source side (unless --numeric-ids is used). Have you seen differently when syncing tin and mira?

Yes, see T119165: l10nupdate user uid mismatch between tin and mira where a sync as root from tin to mira has kept the uid of the origin files on tin and thus transferred ownership of some files from the l10nupdate user (tin) to the trebuchet user (mira).
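
The difference between the two behaviours can be modelled in a few lines. This is only an illustration of rsync's documented ownership rules, not rsync's code: by default names are mapped on both ends, with a fallback to the numeric ID when a name is missing; with --numeric-ids (and always for UID 0) the source UID is kept. The function name is hypothetical and the UIDs in the usage note are made up.

```python
def dest_owner_uid(src_uid, src_users, dst_users, numeric_ids=False):
    """Model which UID a file gets on the destination.

    src_users / dst_users map user name -> UID on each side.
    """
    if numeric_ids or src_uid == 0:
        # Numeric transfer (and UID 0 always): keep the source UID verbatim.
        return src_uid
    # Name-based mapping: look up the source UID's name, then that name's
    # UID on the destination. Fall back to the numeric UID if either
    # lookup fails.
    names = {uid: name for name, uid in src_users.items()}
    name = names.get(src_uid)
    return dst_users.get(name, src_uid)
```

With made-up tables `tin = {"l10nupdate": 10002}` and `mira = {"trebuchet": 10002, "l10nupdate": 10004}`, a numeric transfer leaves a file on UID 10002 (trebuchet on mira), while a name-based transfer maps it to 10004 (l10nupdate on mira).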

bd808 added a comment. Nov 25 2015, 6:26 PM

This causes problems sometimes when rsync works by user ID not by user name (not entirely sure when that happens) and other things run inside sudo -u mwdeploy and expect to be able to touch those files.

I figured the cause of this out in https://phabricator.wikimedia.org/T119165#1825437. The TL;DR answer is that rsync forces --numeric-ids when fetching content from an rsyncd configured with use chroot = yes. There is a way to force it back to using name-based syncs but it requires having /etc/passwd and /etc/group inside the chroot and an additional configuration flag (numeric ids = no).
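
A small sketch of that rule, assuming a module's rsyncd.conf settings have already been parsed into a dict of booleans (the function name and dict representation are hypothetical). It also assumes "use chroot" defaults to yes when unset, which may not hold for every rsync version.

```python
def effective_numeric_ids(module_options):
    """Return True if a daemon module would transfer IDs numerically.

    `module_options` is a dict of rsyncd.conf settings for one module,
    with values already parsed to booleans.
    """
    # Assumption: an unset "use chroot" is treated as yes.
    use_chroot = module_options.get("use chroot", True)
    # When "numeric ids" is not set explicitly, it follows "use chroot".
    return module_options.get("numeric ids", use_chroot)
```

So `{"use chroot": True}` yields numeric transfers, while `{"use chroot": True, "numeric ids": False}` asks for name-based mapping, which per the comment above only works if /etc/passwd and /etc/group are available inside the chroot.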