Make www-data the web-serving user (is currently apache)
Closed, ResolvedPublic

Description

Normalize the uid/gid of the apache user on all beta hosts to the "standard" 48/48 used on the older Precise instances and in WMF production.

This seems to only be necessary on deployment-mediawiki0[1-3].

$ sudo salt '*' cmd.run 'hostname;id apache;lsb_release -d'
## hosts responding "id: apache: no such user" removed from output
i-000001dc.eqiad.wmflabs:
    deployment-upload
    uid=48(apache) gid=48(apache) groups=48(apache)
    Description:        Ubuntu 12.04.5 LTS
i-0000022e.eqiad.wmflabs:
    deployment-jobrunner01
    uid=48(apache) gid=48(apache) groups=48(apache)
    Description:        Ubuntu 12.04.5 LTS
i-0000059b.eqiad.wmflabs:
    deployment-mediawiki03
    uid=997(apache) gid=48(apache) groups=48(apache)
    Description:        Ubuntu 14.04.1 LTS
i-0000044e.eqiad.wmflabs:
    deployment-mediawiki01
    uid=997(apache) gid=48(apache) groups=48(apache)
    Description:        Ubuntu 14.04.1 LTS
i-0000010b.eqiad.wmflabs:
    deployment-bastion
    uid=48(apache) gid=48(apache) groups=48(apache)
    Description:        Ubuntu 12.04.5 LTS
i-000004ba.eqiad.wmflabs:
    deployment-mediawiki02
    uid=997(apache) gid=48(apache) groups=48(apache)
    Description:        Ubuntu 14.04.1 LTS
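
The drift in the output above can be picked out mechanically. This is a minimal sketch, assuming the salt output has exactly the shape shown (an unindented minion-id line followed by indented hostname and `id` lines). The function name `list_uid_drift` and its expected-uid argument are illustrative and not part of any existing tooling.

```shell
#!/usr/bin/env bash
# list_uid_drift EXPECTED_UID
# Reads `salt ... cmd.run` output on stdin and prints "<hostname> <uid>" for
# every host whose apache uid differs from EXPECTED_UID.
list_uid_drift() {
    awk -v want="$1" '
        # An unindented line (the minion id) starts a new record.
        /^[^ ]/ { name = ""; next }
        # The first indented line of a record is the output of `hostname`.
        name == "" && /^ +[^ ]/ { name = $1; next }
        # The `id apache` line: extract the number between "uid=" and "(".
        /uid=[0-9]+\(apache\)/ {
            uid = $1
            sub(/^uid=/, "", uid)
            sub(/\(.*/, "", uid)
            if (uid != want) print name, uid
        }
    '
}
```

Usage would look like `sudo salt '*' cmd.run 'hostname;id apache' | list_uid_drift 48`; hosts that already match the expected uid produce no output.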

Event Timeline

bd808 raised the priority of this task from to High.
bd808 updated the task description. (Show Details)
bd808 changed Security from none to None.
bd808 added subscribers: Unknown Object (MLST), scfc, greg and 7 others.

Change 178690 had a related patch set uploaded (by BryanDavis):
Ensure that apache's uid=48

https://gerrit.wikimedia.org/r/178690

Patch-For-Review

Running this on each of the affected servers should work, I think:

sudo puppet agent --disable "renumbering apache user"
sudo service apache2 stop
sudo service hhvm stop
ps aux | grep apache
sudo usermod -u 48 apache
sudo find / -fstype nfs -prune -o \( -user 997 -print -exec chown -h 48 {} + \)
sudo service hhvm start
sudo service apache2 start
sudo puppet agent --enable

Since this will stop the app server during the migration, an outage should be scheduled. If we had pybal in labs we could depool the server instead, but yeah, we don't.
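
Before running the `chown` step it may be worth listing what it would touch. This is a minimal dry-run sketch that reuses the same `find` expression as the command above but only prints paths. The function name `files_owned_by` is hypothetical. It also assumes that skipping `-fstype nfs` is sufficient; mounts reported under another type name (for example `nfs4`) may need their own `-fstype` test.

```shell
#!/usr/bin/env bash
# files_owned_by OLD_UID [ROOT]
# Prints every path under ROOT (default: /) owned by numeric uid OLD_UID,
# without descending into NFS mounts. Nothing is modified.
files_owned_by() {
    local old_uid=$1 root=${2:-/}
    find "$root" -fstype nfs -prune -o -uid "$old_uid" -print
}
```

For example, `sudo bash -c "$(declare -f files_owned_by); files_owned_by 997" > /tmp/apache-owned.txt` would record the affected paths so that they can be reviewed, or compared after the renumbering.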

On deployment-jobrunner01, you might want to stop the jobrunner: /usr/bin/php /srv/deployment/jobrunner/jobrunner/redisJobRunnerService

It seems to have upstart support; it is just lacking a symlink under /etc/init.d/ (filed T78126 for it).

I have announced on labs-l, qa-l and engineering-l my intent to make these changes between 15:00Z & 18:00Z on Saturday 2014-12-13.

Renumbering is done and the puppet patch is applied on deployment-salt via cherry-pick.

bd808 changed the task status from Open to Stalled. Dec 13 2014, 5:37 PM
bd808 added a project: acl*sre-team.
bd808 added a subscriber: yuvipanda.

Help from SRE is needed to merge the patch into rOPUP Wikimedia Puppet before this can be closed. @yuvipanda and @Andrew have been added as reviewers.

Dzahn renamed this task from Renumber apache user/group to uid=48 on Trusty beta hosts to Renumber apache user/group to uid=48. Dec 22 2014, 1:41 PM
Dzahn added a subscriber: Dzahn.

renamed task. the patch changes this for ALL mediawiki hosts. i don't see how it just affected beta or just trusty. in production ALL apaches also have the wrong uid. also, the goal should be that prod and beta are the same.

The Precise hosts in beta and production provisioned the apache user via the wikimedia-task-appserver package (https://github.com/wikimedia/operations-debs-wikimedia-task-appserver/blob/master/debian/postinst#L28-L3). This package was removed by @ori in https://gerrit.wikimedia.org/r/#/c/136151/ on 2014-05-29 which left the uid of the apache user unspecified for all subsequent hosts provisioned by Puppet. This means that any host (Trusty or Precise) that has been imaged/reimaged since 2014-05-29 will have used the next available non-system uid to provision the apache user.

Other than the uid not matching the documentation on wikitech, this should not have any negative effects I can think of in production. In beta however the use of shared storage for image uploads and thumbnails makes it highly desirable for the apache user to have the same uid across all hosts.

I'm not sure what the "right" fix for this is at this point. It is technically possible to renumber the uid on however many (hundreds?) production hosts are not using uid=48, but I'm not sure if anyone thinks this is worth the effort or potential risks. I do want to keep the current patch applied in beta as a cherry-pick to avoid regressing on the shared file system permissions.

This package was removed by @ori in https://gerrit.wikimedia.org/r/#/c/136151/ on 2014-05-29 which left the uid of the apache user unspecified for all subsequent hosts provisioned by Puppet. This means that any host (Trusty or Precise) that has been imaged/reimaged since 2014-05-29 will have used the next available non-system uid to provision the apache user.

Not entirely true. The Puppet user / group resources specify UID / GID, so it is still consistently 48 / 48, even on new installs.

We shouldn't care about the exact numeric UID / GID -- this was done for compatibility with existing machines. We should move toward being agnostic about them.

In T78076#966266, @ori wrote:

so it is still consistently 48 / 48, even on new installs.

this is not the case. mw1033:

id apache
uid=996(apache) gid=48(apache) groups=48(apache)

Why is this needed again? T76086 seems to have fixed T75206. And as @ori said, we should be agnostic about the numeric values unless there's a very good reason.

All of the hosts in beta need to agree on the uid of the apache user so that when an image is uploaded to one host it can be seen and served by all. Also the hashed directories that are created need to be writable by all hosts. This is a consequence of the use of NFS for shared storage in beta. Ori is right that it doesn't really matter what the specific uid is and in production where there is no shared storage usage it really doesn't matter if hostA and hostB agree on the uid for any particular user.

If you want to kill this need in beta, we need someone to implement T64835: Setup a Swift cluster on beta-cluster to match production to remove the shared storage usage for images.

Perhaps we can look at setting up a minimal Swift cluster inside Labs (inside VMs) to satisfy requirements for Beta, and worry about a proper Swift cluster with support for other projects later.

Consensus at this point seems to be that renumbering production is not feasible. Should we leave the patch applied only in beta for now, close this task and instead focus on eliminating shared storage in beta as the proper long term solution?

I've been talking to @faidon and @Joe about this over the last few days, hopefully we'll find a way to fix this before end of coming week.

I'll let @faidon elaborate, but I think we're going to re-number in prod and also try to explicitly set uid/gid for all system users declared in puppet.

As an update, I think @faidon and @Joe are working on moving our apache user to just use www-data instead (in prod).

apache right now has no uid (so all kinds of uid across the fleet) and a gid 48, which is < 100 and thus, wrong (that space is reserved for packages).

Rather than renumbering both uid/gid for apache, we can just switch to www-data which is provided in base-files for all Debian/Ubuntu systems with the same uid. The use of the apache user is a historical artifact and there are comments in the puppet tree that are basically FIXMEs for this.

In the end, it's probably a little more work but a cleaner solution overall.
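
Whether a host already agrees on the numeric ids for www-data can be checked without changing anything. This is a minimal sketch that assumes standard passwd(5)-format input; the helper name `passwd_ids` is made up for illustration.

```shell
#!/usr/bin/env bash
# passwd_ids NAME
# Reads passwd(5)-format lines on stdin and prints "uid:gid" for user NAME.
# Prints nothing if NAME is not present in the input.
passwd_ids() {
    awk -F: -v name="$1" '$1 == name { print $3 ":" $4 }'
}
```

Run as `getent passwd www-data | passwd_ids www-data` on each host, or through `salt ... cmd.run`. Every host should report the same pair if the ids are consistent across the fleet.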

+1. I'm pretty sure the use of apache as the content serving user is from a time before www-data became the default APACHE_RUN_USER for apache2. It wouldn't surprise me to find out that that change happened with the migration from Apache 1.3.x to Apache 2.0.x. Changing the user to www-data will affect apache2, hhvm, parsoid, jobrunner and MWMultiVersion settings at least, but we run as the www-data user in MediaWiki-Vagrant so I'm pretty sure it is possible.

per https://wikitech.wikimedia.org/wiki/UID using www-data means we want uid/gid 33/33 (not 48/48 or random:48)

I think apache user might have been in use since before we even used Ubuntu. Like on Fedora...

I have manually converted one host to use www-data, and took it out of rotation. A smoke test shows it working correctly. To make the conversion happen, I created a series of puppet patches that will help with this migration: https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:www_data_on_mw,n,z

I will track the progress of the production migration on this ticket, but we should probably migrate beta as well, maybe first.

yuvipanda renamed this task from Renumber apache user/group to uid=48 to Make www-data the web-serving user (is currently apache). Feb 3 2015, 9:15 AM

Change 187687 had a related patch set uploaded (by Yuvipanda):
maintenance: allow choosing the web user

https://gerrit.wikimedia.org/r/187687

Patch-For-Review

Change 187686 had a related patch set uploaded (by Yuvipanda):
labstore: do not explicitly declare the apache user existence

https://gerrit.wikimedia.org/r/187686

Patch-For-Review

Change 187259 had a related patch set uploaded (by Yuvipanda):
mediawiki: allow using a different web user than apache

https://gerrit.wikimedia.org/r/187259

Patch-For-Review

Change 187688 had a related patch set uploaded (by Yuvipanda):
beta: allow defining the web user.

https://gerrit.wikimedia.org/r/187688

Patch-For-Review

Retitled to point to current solution.

I tested the patches (with a fix) on deployment-mediawiki04, and found that they were noops (yay!)

Change 187259 merged by Giuseppe Lavagetto:
mediawiki: allow using a different web user than apache

https://gerrit.wikimedia.org/r/187259

Change 187686 merged by Giuseppe Lavagetto:
labstore: do not explicitly declare the apache user existence

https://gerrit.wikimedia.org/r/187686

Change 187687 merged by Giuseppe Lavagetto:
maintenance: allow choosing the web user

https://gerrit.wikimedia.org/r/187687

Change 187688 merged by Giuseppe Lavagetto:
beta: allow defining the web user.

https://gerrit.wikimedia.org/r/187688

Change 188791 had a related patch set uploaded (by Yuvipanda):
beta: Make web user be www-data instead of apache

https://gerrit.wikimedia.org/r/188791

Patch-For-Review

Change 188791 merged by Yuvipanda:
beta: Make web user be www-data instead of apache

https://gerrit.wikimedia.org/r/188791

Change 188798 had a related patch set uploaded (by Yuvipanda):
Make scap do things as www-data user instead of apache user

https://gerrit.wikimedia.org/r/188798

Patch-For-Review

Handing this off to @yuvipanda as the owner of seeing the change through. Thanks Yuvi!

So deployment-prep is running everything as www-data (except videoscaler01, but that has never worked anyway and needs to be fixed). /data/project/upload7 is also owned by www-data now, and everything works as it should (according to shinken and my limited testing).

I re-imaged our mediawiki and jobrunner servers yesterday to match that of production, and after testing this change in beta we (me and @Joe) are a lot more confident of doing this in prod (should happen next week). Yay for testing things in beta now :)

P262 contains commands that were used during the migration

Really nice! I am very happy to see beta cluster being used for such staging work \O/

Change 178690 abandoned by BryanDavis:
Ensure that apache's uid=48

Reason:
Yuvi and Giuseppe have fixed up beta so that this is not needed by switching to www-data as the wiki runtime user and are planning to continue to roll those changes across production.

https://gerrit.wikimedia.org/r/178690

Joe changed the task status from Stalled to Open. Feb 17 2015, 8:29 AM
Joe claimed this task.

Production is finally being converted to www-data today; we will need to deploy the scap fix before we deploy today. I'll ping @bd808

Change 188798 merged by jenkins-bot:
Make scap do things as www-data user instead of apache user

https://gerrit.wikimedia.org/r/188798

Krenair added a subscriber: Krenair.

Some things like mwscript still use apache, and this is now broken on deployment-prep. Please see T89802

Change 506750 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] admins: remove ability to run commands as user 'apache'

https://gerrit.wikimedia.org/r/506750

Change 506750 merged by Dzahn:
[operations/puppet@production] admins: remove ability to run commands as user 'apache'

https://gerrit.wikimedia.org/r/506750

The ability to run commands as the 'apache' user has been removed from prod admins module sudo privileges today.