
Create the base container images for running MediaWiki in a production environment
Closed, ResolvedPublic

Description

We will need at the very least:

  • apache httpd image. It will need to support adding a complex configuration. This image can be based on debian buster. It is unclear how we will want to manage logging.
  • php-fpm image. This should be relatively straightforward once we've figured out how to tackle T263545 correctly
  • mcrouter image.
  • nutcracker image - Hopefully the need for this will go away soon.
  • php/apache exporter images - this will need to be expanded and evaluated, but we surely need:
      • one php-fpm exporter. We have multiple right now and we should unify them somehow.
      • one apache status exporter
      • mtail to read the apache logs
      • mcrouter exporter image

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Oct 13 2020, 7:40 AM
JMeybohm triaged this task as Medium priority. · Oct 13 2020, 9:53 AM
Joe moved this task from Backlog to In Progress on the MW-on-K8s board.

Change 634924 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/production-images@master] Add apache httpd base image

https://gerrit.wikimedia.org/r/634924

Regarding the apache httpd container, I am approaching layering as follows:

  • one base image, which uses the apache2-bin debian package and just modifies the vanilla configuration to listen on port 8080 (so that the container can run as user www-data).
  • one image configured to manage a php-fpm application. This image will be used as a base for both MediaWiki and the shellout service. It will include all modules and base configurations we need, and have a single virtualhost sending all "*.php" files to the fastcgi daemon
  • the Wikimedia MediaWiki apache configuration will be injected on top of this base image using helm templating. To this end, we will first refactor our puppet code so that we can digest a yaml data structure to generate all the virtualhosts, and re-use the same data structure to configure things in helm, generating it from puppet. This way, we can keep managing changes during the transition mostly sane.
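To make the refactoring concrete, here is a rough sketch (in Python, with invented key names) of what digesting such a yaml-derived data structure into virtualhost stanzas could look like. The real implementation would live in our puppet and helm templating, so treat this purely as an illustration of the shape of the data:

```python
# Hypothetical sketch: render Apache virtualhost stanzas from a shared
# data structure. A plain dict stands in for the yaml that puppet would
# generate and helm would consume; all key names here are invented.

VHOST_TEMPLATE = """<VirtualHost *:8080>
    ServerName {name}
    ServerAlias {aliases}
    DocumentRoot {docroot}
</VirtualHost>
"""

def render_vhosts(sites):
    """Render one <VirtualHost> block per site entry."""
    return "\n".join(
        VHOST_TEMPLATE.format(
            name=site["name"],
            aliases=" ".join(site.get("aliases", [])),
            docroot=site["docroot"],
        )
        for site in sites
    )

sites = [
    {"name": "en.wikipedia.org",
     "aliases": ["en.m.wikipedia.org"],
     "docroot": "/srv/mediawiki/docroot/wikipedia.org"},
]
print(render_vhosts(sites))
```

The point being that puppet and helm would both consume the same structure, so a change to one site entry propagates to both during the transition.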

Does anyone see better approaches?

If I got this right, you are proposing to put apache and php-fpm in the same container, correct (talking about vhosts in fpm context)?
I can think of reasons why that might make sense to do (sharing "static" assets for example, ease of MVP), but maybe you could outline them here?

  • one base image, which uses the apache2-bin debian package and just modifies the vanilla configuration to listen on port 8080 (so that the container can run as user www-data).

In case it makes things easier/cleaner, instead of modifying the configuration you could set the capability CAP_NET_BIND_SERVICE.

If I got this right, you are proposing to put apache and php-fpm in the same container, correct (talking about vhosts in fpm context)?
I can think of reasons why that might make sense to do (sharing "static" assets for example, ease of MVP), but maybe you could outline them here?

No, the idea is to have separate containers, one for php-fpm, and one for apache httpd, just running in the same pod in production.

I was purely talking about the httpd configuration here. The idea is that e.g. the shellout service will be able to use this image as a base with little to no modifications.

The idea is that in production the two containers will have a shared volume that will be used to host the unix socket.
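As a toy illustration of that shared-volume arrangement (a sketch, not the production setup): two processes standing in for the httpd and php-fpm containers, exchanging data over a unix socket at a shared path, much as they would over an emptyDir volume mounted into both containers.

```python
# Sketch only: the "php-fpm" side listens on a unix socket at a shared
# path; the "httpd" side connects to the same path, as it would over a
# shared pod volume. In reality httpd speaks FastCGI, not this echo.
import os
import socket
import tempfile
import threading

sock_path = os.path.join(tempfile.mkdtemp(), "fpm.sock")

# "php-fpm" container: bind the socket on the shared volume.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(sock_path)
srv.listen(1)

def fpm_stand_in():
    conn, _ = srv.accept()
    conn.sendall(b"response: " + conn.recv(1024))
    conn.close()

t = threading.Thread(target=fpm_stand_in)
t.start()

# "httpd" container: connect via the same shared-volume path.
cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cli.connect(sock_path)
cli.sendall(b"GET /index.php")
reply = cli.recv(1024)
cli.close()
t.join()
srv.close()
print(reply.decode())
```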

Oh, well. Sorry then. I guess I just misread the "one image configured to manage a php-fpm application" part as meaning the image already contains php-fpm.

  • one base image, which uses the apache2-bin debian package and just modifies the vanilla configuration to listen on port 8080 (so that the container can run as user www-data).

In case it makes things easier/cleaner, instead of modifying the configuration you could set the capability CAP_NET_BIND_SERVICE.

You can't set that in your Dockerfile, which means people would need to grant the capability when running the container, one way or another. It's a useless burden on the user IMHO.

Regarding the apache httpd container, I am approaching layering as follows:

  • one base image, which uses the apache2-bin debian package and just modifies the vanilla configuration to listen on port 8080 (so that the container can run as user www-data).
  • one image configured to manage a php-fpm application. This image will be used as a base for both MediaWiki and the shellout service. It will include all modules and base configurations we need, and have a single virtualhost sending all "*.php" files to the fastcgi daemon

How are we planning to handle the minor differences our clusters have in their php configurations? The ones I know of for sure are max_execution_time (jobrunners) and apc.ttl = 10 (parsoid). Would using ENV work? Moreover, I think we would like the ability to easily tweak those settings.
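For example (all variable names here are invented, purely to sketch how ENV might work): an entrypoint could translate prefixed environment variables into a php.ini overrides file that php-fpm picks up from its conf.d directory, with "__" standing in for "." since environment variable names cannot contain dots.

```python
# Sketch of the ENV idea; the PHPINI_ prefix and the "__" -> "."
# convention are invented for illustration. An entrypoint could write
# the rendered lines into php-fpm's conf.d before starting the daemon.
import os

def render_php_overrides(environ, prefix="PHPINI_"):
    """Turn PHPINI_-prefixed env vars into php.ini directive lines."""
    lines = []
    for key, value in sorted(environ.items()):
        if key.startswith(prefix):
            directive = key[len(prefix):].replace("__", ".")
            lines.append(f"{directive} = {value}")
    return "\n".join(lines)

# e.g. parsoid would set PHPINI_apc__ttl=10, the jobrunners a longer
# PHPINI_max_execution_time, via helm values per deployment.
env = {
    "PHPINI_apc__ttl": "10",
    "PHPINI_max_execution_time": "1200",
    "PATH": "/usr/bin",  # unrelated vars are ignored
}
print(render_php_overrides(env))
```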

  • the Wikimedia MediaWiki apache configuration will be injected on top of this base image using helm templating. To this end, we will first refactor our puppet code so that we can digest a yaml data structure to generate all the virtualhosts, and re-use the same data structure to configure things in helm, generating it from puppet. This way, we can keep managing changes during the transition mostly sane.

To my understanding, in order to make apache changes, that would require a puppet commit, a helm commit, a puppet run and a helm apply, is that right?

Regarding the apache httpd container, I am approaching layering as follows:

  • one base image, which uses the apache2-bin debian package and just modifies the vanilla configuration to listen on port 8080 (so that the container can run as user www-data).
  • one image configured to manage a php-fpm application. This image will be used as a base for both MediaWiki and the shellout service. It will include all modules and base configurations we need, and have a single virtualhost sending all "*.php" files to the fastcgi daemon

How are we planning to handle the minor differences our clusters have in their php configurations? The ones I know of for sure are max_execution_time (jobrunners) and apc.ttl = 10 (parsoid). Would using ENV work? Moreover, I think we would like the ability to easily tweak those settings.

That's not going to be in the httpd configuration, and I guess we will just provide the php keys via helm to each deployment. But it's not relevant to the apache httpd layering.

To my understanding, in order to make apache changes, that would require a puppet commit, a helm commit, a puppet run and a helm apply, is that right?

If the change is just to rewrite rules or other general configurations (99.9% of our changes), this will mean we'll need to:

  • In the transition phase: make a puppet commit, a puppet run and a helm deployment
  • Afterwards: a helm commit and a helm deployment.

For more structural changes, which haven't happened in about a year btw, we will need both a puppet and a helm commit for the duration of the transition.

How are we planning to handle the minor differences our clusters have in their php configurations? The ones I know of for sure are max_execution_time (jobrunners) and apc.ttl = 10 (parsoid). Would using ENV work? Moreover, I think we would like the ability to easily tweak those settings.

That's not going to be in the httpd configuration, and I guess we will just provide the php keys via helm to each deployment. But it's not relevant to the apache httpd layering.

I was referring to the php-fpm image in general :)

To my understanding, in order to make apache changes, that would require a puppet commit, a helm commit, a puppet run and a helm apply, is that right?

If the change is just to rewrite rules or other general configurations (99.9% of our changes), this will mean we'll need to:

  • In the transition phase: make a puppet commit, a puppet run and a helm deployment
  • Afterwards: a helm commit and a helm deployment.

I understand that it is not going to be easy and I do not have a better answer right now, but I believe going through this 5-6 step process will be nerve-wracking, lengthy, and hard to roll back.

Right now it is: a cumin run, a puppet commit, a puppet run, and a couple of cumin commands, and it can take up to 15-30 minutes. We also have a safety net: if our change is about to break production, most of the time we avoid that, since we simply do not re-enable puppet, but rather revert and merge. We will need this safety net too.

For more structural changes, which haven't happened in about a year btw, we will need both a puppet and a helm commit for the duration of the transition.

Overall, I think we may need to take a step back and consider whether an apache container in the pod is something that might cause more problems than it solves.

Overall, I think we may need to take a step back and consider whether an apache container in the pod is something that might cause more problems than it solves.

I'm not sure I understand what alternative approach you are proposing.

Also, I want to clarify: we can reduce the pain as much as possible, but for the duration of the transition phase, it will be somewhat more work than we're used to for this kind of change. There is no way around that, as far as I can tell.

Hopefully we'll be able to reduce the amount of time needed as much as possible.

I don't think "you also need to do a deploy to k8s" is a bad compromise, but at this point I'd like to hear from others.

Also, I want to clarify: we can reduce the pain as much as possible, but for the duration of the transition phase, it will be somewhat more work than we're used to for this kind of change. There is no way around that, as far as I can tell.

Hopefully we'll be able to reduce the amount of time needed as much as possible.

I don't think "you also need to do a deploy to k8s" is a bad compromise, but at this point I'd like to hear from others.

I had not completely understood the process, but I do now. The amount of time to deploy a change could be concerning, but it does not appear we have other options.

Change 638095 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/production-images@master] Add base php cli image

https://gerrit.wikimedia.org/r/638095

Change 640386 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/production-images@master] Add a php-fpm image for php 7.2

https://gerrit.wikimedia.org/r/640386

Change 634924 merged by Giuseppe Lavagetto:
[operations/docker-images/production-images@master] Add apache httpd base image

https://gerrit.wikimedia.org/r/634924

Change 638095 merged by Giuseppe Lavagetto:
[operations/docker-images/production-images@master] Add base php cli image

https://gerrit.wikimedia.org/r/638095

Change 640386 merged by Giuseppe Lavagetto:
[operations/docker-images/production-images@master] Add a php-fpm image for php 7.2

https://gerrit.wikimedia.org/r/640386

Change 641331 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/production-images@master] Add a mcrouter image

https://gerrit.wikimedia.org/r/641331

Change 643019 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/production-images@master] Add LAMP prometheus exporters.

https://gerrit.wikimedia.org/r/643019

Change 641331 merged by Giuseppe Lavagetto:
[operations/docker-images/production-images@master] Add a mcrouter image

https://gerrit.wikimedia.org/r/641331

Change 643019 merged by Giuseppe Lavagetto:
[operations/docker-images/production-images@master] Add LAMP prometheus exporters.

https://gerrit.wikimedia.org/r/643019

Mentioned in SAL (#wikimedia-operations) [2020-11-24T08:49:26Z] <_joe_> uploading the base production docker images for MediaWiki, T265324

Joe updated the task description.