Docker dev setup: Set up process control
Open, LowPublic
Actions

Assigned To

None

Authored By

	AndyRussG
	Nov 24 2020, 9:25 PM

Description

Create a Dockerfile for process control

Update 09/02/2021
This ticket was originally for Dockerizing process control but then became the place for discussing all things Docker.

Details

	Subject	Repo	Branch	Lines +/-
	Dockerize process-control	wikimedia/fundraising/process-control	master	+98 -4

Customize query in gerrit

Related Objects
Search...

		Status	Subtype	Assigned	Task
		Resolved		AKanji-WMF	T262971 Epic: feature parity with vagrant on docker
		Open		None	T268685 Docker dev setup: Set up process control

Event Timeline

AndyRussG created this task.Nov 24 2020, 9:25 PM

• DStrine moved this task from Triage to Current Sprint on the Fundraising-Backlog board.Nov 24 2020, 9:25 PM

• DStrine added a project: Fundraising Sprint Xtreme Lolcats.Nov 24 2020, 9:33 PM

• DStrine triaged this task as Low priority.Nov 24 2020, 10:00 PM

jgleeson claimed this task.Nov 25 2020, 11:47 AM

jgleeson moved this task from Backlog to Doing on the Fundraising Sprint Xtreme Lolcats board.

jgleeson updated the task description. (Show Details)Nov 25 2020, 4:56 PM

Change 643767 had a related patch set uploaded (by Jgleeson; owner: Jgleeson):
[wikimedia/fundraising/process-control@master] WIP: Dockerize process-control

https://gerrit.wikimedia.org/r/643767

gerritbot added a project: Patch-For-Review.Nov 26 2020, 7:55 PM

Some notes on this task:

I see this work breaking into two parts.

1. Dockerizing the process-control application
This will allow fr-tech and others using process-control (e.g. @awight ) to develop and test process-control using containers. To do this we will add a process-control Dockerfile to the project that allows developers to create an image which includes all the required packages and set up for process-control to run for development. I've added a patch for that here

2. Integrate the newly dockerized process-control app into our fundraising-dev project to run alongside the other containerized apps in the stack, using Docker-Compose

How it runs in production:
Currently, in production, civi1001 uses process-control is two ways:

To generate a crontab from the process-control job file specs using the cron-generate command. The job spec files define within them a command to be run, the cron schedule and a few other parameters. They are then transformed into crontab lines, with each command calling back to the jobspec file via run-job e.g.:

# Generated from /srv/jobs/example_job.yaml
*/5 * * * * root /usr/local/bin/run-job --job example_job

To run commands via the run-job wrapper, either by crond running a scheduled command from the crontab or via a user on the box calling run-job --job example_job on the cli.

Process-control is currently installed as a Python package on civi1001 box. civi1001 has a directory of job specs that contain commands which call Drupal, CiviCRM and fundraising-tools (also lives on civi1001) processes. This allows the scheduling of commands that live across these projects such as in donations_queue_consume.yaml which uses Drupal and silverpop_daily.yaml. which uses fundraising-tools.

Options for development setup:
Now as we are separating out Drupal/CiviCRM & fundraising tools into their respective containers in accordance with Docker conventions, this will pose some challenges using process-control in the same way we do today on civi1001, specifically in how we currently manage jobs within a single directory which can then include commands across multiple applications.

Option A - To retain our current workflow we could create a new monolithic dockerfile that builds an all-encompassing image which includes all the required and configured applications such as Drupal, CiviCRM and fundraising-tools alongside an install of the process control python package and crond. This approach should allow us to retain a similar workflow to what we have in production although it does come at the cost of likely having to manage the monolith image alongside the individual application images potentially adding a duplicated maintenance burden going forward though we should be able to optimise this if we choose this path.

Option B (preferred) - Another option would be separate out process-control jobs for each containerized application when run in a development environment. This would allow us to generate crontabs for each respective container's job files. We could achieve this a few different ways which will depend largely on where we end up creating new process-control job spec files, most likely in a shared 'src' or 'config' directory either as a checkout of the current process-control jobs (if we're ok with those being public) or an empty-by-default directory which we add new jobs to and maybe drop in an example job during the build to make this easier for fr-tech developers to work with.

To generate a crontab for jobs which call processes running on the new civicrm container in a one-liner, we could run something like:

docker run -v '/src/process-control-data/civicrm/jobs:/srv/jobs' -v '/src/process-control-data/civicrm/crontab:/etc/cron.d' process-control cron-generate

and for fundraising-tools jobs and crontab we'd use:

docker run -v '/src/process-control-data/tools/jobs:/srv/jobs' -v '/src/process-control-data/tools/crontab:/etc/cron.d' process-control cron-generate

This is is assuming we'd check out and manage process-control job inside '/src/process-control-data'. It might make more sense for these jobs to belong within the src of the actual applications they relate to keep things together.

We'd then include the process-control as a python package and crond (if we actually want the jobs to run on a schedule) as package dependencies for the individual application images so that when a container for these applications is run, they have access to the run-job utilities and cron if needed.

Thanks so much for digging in and for the explanations!!!! What about the following as Option C:

We have two generic base images:
1. One with PHP and Apache, and whatever PHP extensions and Apache modules we'd normally install on prod via apt.
2. Another with PHP, Python and cron, and whatever PHP extensions we'd normally install on prod via apt.
All source code and all app-specific config is stored on the host, under the src and config directories, so none of it is baked into the images via Dockerfiles. This includes Civi, Drupal, process-control, crontabs, Smashpig config, process-control yaml files, and tools, as well as all config generated by tools (e.g. Civi BuildKit).
All source code and all app-specific config is shared inside the containers using bind mounts.
I think Docker convention is to use a separate container for each process that runs continuously (though it might spawn other processes), which is not the same as each application (especially in our case, since the applications themselves would reside on the host). So, as per this convention, apache would be the main process for the first container, and crond would be the main process for the second container. crond would spawn whatever processes it normally would on production, which would run within that second container. The second container is where we'd manually call run-job when necesary.
I think this is less complicated than Option A or B, above, and also more similar to what we do on production. I expect it'd also make it easier to share source code or config changes across the team.
I can't think of any significant drawbacks to this approach, but it I'm missing something, I'd love to hear it!!!

Thanks so so much once again!!! :)

We have two generic base images:

One with PHP and Apache, and whatever PHP extensions and Apache modules we'd normally install on prod via apt.

Another with PHP, Python and cron, and whatever PHP extensions we'd normally install on prod via apt.

After we talked this through (thanks!) it seems that "generic" is not the right term here. What I mean is, a single image that has what's necessary for the applications it'll run, and has facilities for sending into the image whatever specific config is needed. So, in the first case, a single image could potentially be used in two separate containers for Civi Web and Payments, since the PHP setup requirements are quite similar, and both containers will run Apache and rsyslogd (to get logs from syslog and send them to the logging container).

Thanks again!!

jgleeson added a comment.Nov 28 2020, 12:51 AM

This comment was removed by jgleeson.

Hey @AndyRussG

Thanks, it was good to chat and thanks so much for all the work you've done on this so far!

As discussed, I do see some drawbacks using two Docker images that are not application-specific as the base images of all of the application containers, as you describe in 'Option C'. I'd originally replied across two posts but I've condensed the response into this single post so it's easier to follow.

The main issue I see with that approach is that we're effectively Dockerizing a web server environment and not the applications themselves which skips a core part of the Docker process. Due to this, we'll lose some of the benefits of using Docker and introduce a bunch of other issues that I'll outline below.

In T268685#6653153, @AndyRussG wrote:

Thanks so much for digging in and for the explanations!!!! What about the following as Option C:

We have two generic base images:

One with PHP and Apache, and whatever PHP extensions and Apache modules we'd normally install on prod via apt.

Another with PHP, Python and cron, and whatever PHP extensions we'd normally install on prod via apt.

1) Containers will include stuff unrelated to the apps they run which will cause undesired side-effects

The first issue I see with this approach is that the base Apache image would eventually end up having the webserver config including SSL cert and vhost config for three single apps (Payments, CiviCRM and SmashPig) baked into the single base Apache image, as you've started here and here for our first app, payments.

This would mean that when you run a CiviCRM container from the base apache image, that container would also have within it a PaymentsWiki and SmashPig setup that is never used which goes against Docker best practice as outlined here Don’t install unnecessary packages and here Decouple applications. The same for when you run a PaymentsWiki container, it would internally have CiviCRM setup files that are never used.

A side-effect of this will likely be collisions between the three web apps present within the single container instances e.g. port conflicts due to all three web apps trying to serve over 80/443 which we'd have to mitigate at the image level if we pursued this approach. I imagine Apache will also error due configured apps src code missing unless we mount the src code for all three apps inside each individual container which would again violate Docker best practice.

2) We wouldn't be able to manage specific packages for specific applications in Docker.

Common Docker tasks like running container instances with different extensions or versions of PHP for the different applications wouldn't be possible from a shared base image as they'd be tied to the packages on our single webserver base image in development across all apps that run from the webserver base image. We'd be forced to work around limitations like this either by manually adding additional external shell script commands or by connecting into the containers and modifying packages outside of Docker, which would then be lost when the container is rebuilt. This might trip us up in the event that version requirements diverge between our apps in the future.

Instead, if we separate these images out into their own application-specific Docker images then we would remove the redundancy and collisions allowing the application containers to individually expose ports 80 & 443 which we'd then map in our docker-compose file to service-specific ports. We could then define custom versions of binaries, extensions and PHP/Python on a per-application basis allow us to test new packages more easily. This would arguably result in a set of application images that are easier to manage and test.

3) WMF are already building application-specific Docker images and we probably should too.

It's worth noting as we discussed that the images stored in the WMF dev-images repo are intended to run MediaWiki and are not generic Docker images as they first appeared which I think may have created some confusion about how to use Docker in WMF projects. As you can see here although the name suggests a generic stretch-php72-fpm-apache2 image, it instead contains a setup that is for building an image to run MediaWiki inside that environment spec which indicates that Docker users across the foundation are already building app-specific images and we probably should too. More examples of specific images in the WMF repo are here, here and here

4) We wouldn't be using standalone images to build our application containers.

I think Docker convention is to use a separate container for each process that runs continuously (though it might spawn other processes), which is not the same as each application (especially in our case, since the applications themselves would reside on the host). So, as per this convention, apache would be the main process for the first container, and crond would be the main process for the second container. crond would spawn whatever processes it normally would on production, which would run within that second container. The second container is where we'd manually call run-job when necessary.

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings - taken from Docker: what is a container?

By this definition, we should be thinking of containers as being built from discrete images.

5) Containers wouldn't be ephemeral

My understanding is that containers are intended to be ephemeral with minimal setup required to rebuild them, and I don't feel this is consistent with having important parts of the setup for all containers within a separate shell script that has to be run manually each time we rebuild one container within the stack. Sure there are good reasons for having setup scripts for things that a shell script is more suited to handle like prompts and collecting user input to set up accounts and URLs. I get that the lines are blurred here, but I don't think these ancillary scripts should be doing things that Docker can do already, e.g. calling Docker commands to connect back to containers like here to run install scripts. So far this is only setting up payments, MySQL and Redis but as we add more applications like Drupal, CiviCRM. process-control and fundraising-tools to this stack, our scripts or collection of setup scripts will only grow as we have to capture each app's specific setup that isn't already captured in the base generic images.

6) All non-generic setup that isn't in the base images such as build instructions needs to exist outside of Docker inside shell scripts.

I think this is less complicated than Option A or B, above, and also more similar to what we do on production. I expect it'd also make it easier to share source code or config changes across the team.

I actually feel like it's more work to move and manage some of the application-specific setup required to tailor our applications to run from these base images, outside into other files when it can mostly be managed all in one place within Docker in application-specific images and shared as one Dockerfile added with the application src code.

7) It's slower to rebuild containers because our non-generic setup within the shell scripts isn't cached.

The 'Option C' approach comes with a performance cost because any setup commands that do not live in Docker will not benefit from Docker's layer caching which means that any subsequent rebuilds of containers will also need an accompanying manual shell script run to finish off any setup stored outside of Docker. We don't need to do this if we keep the build instructions inside the Docker images for each application and this would allow us to quickly destroy and rebuild inline with Docker best practices Create ephemeral containers

8) We miss out on the opportunity to make it easier for others to work with our applications and contribute.

A big downside to using generic base images as described in 'Option C' to run our applications is that we miss out on the opportunity for our individual applications to benefit from their own Docker images making it much easier for others to use, test and contribute to the projects. This is one of the main advantages of using Docker as it allows anyone anywhere to download the code, run the application in a container and immediately work with it without needing to worry about how to set it up.

Misc
I realised after our call why the WMF projects we've been looking at so far don't have their own Dockerfiles, but in fact, they did, and they have since been converted into docker-pkg format and pushed up to the docker-registry.wikimedia.org registry via either docker push or using docker-pkg. This is something we can also do also do for any Dockerfiles we create going forward as once they are in the registry we don't need the build files locally and they can just be referenced as the MediaWiki docker images are today.

I hope the above makes sense and helps give some perspective on why I think we can take a slightly different approach to dockerizing our applications and reach the same end goal of making it easy and fast for all to work with and maintain going forward. Also for context, this comment is a more comprehensive higher-level summary of the things I've been trying to put across when chatting with you and @Ejegg in our previous talks about how we intend to use Docker in general for development across the stack.

Also... I know this is a lot! and sorry if it seems disproportionately critical, I know you've spent a lot of time working on this for all of us which I really hope you know I greatly appreciate! I was reluctant to go to such detail as I know this might not be a pleasant read :( but with that being said, these are important decisions which will affect how the whole team develop for the foreseeable future using Docker across the stack so I wanted to make my reservations clear so you could consider them and we could discuss and work out next steps. For me, it would be great to get the most out of Docker whilst also using it in a way which is in accordance with the recommended practices and also with the huge added benefit of making it easier for others inside and outside WMF to use and contribute to the projects!!! :)

Thanks again for the feedback and work you have put in on this getting us up and running!!! I look forward to chatting with you more on this next week.

jgleeson added a subscriber: Ejegg.Nov 29 2020, 8:49 PM

Hi! Thanks so so much for digging in here! The time and energy spent on this is hugely appreciated. So is the thoughtfulness with which you've approached providing this feedback!!!! :)

I'll post a more thorough response as soon as I can, but I thought perhaps it'd be best not to leave this without a reply for long! I need to apologize because I think I haven't been clear in several explanations, both of how things work and what I'm proposing, since several points you bring up seem at least in part based on misunderstandings. In a way-too-quick summary, (1) seems based on a misunderstanding of what I'm suggesting, sorry if I wasn't clear, I'll explain more ASAP, (2) same as point (1), (3) I think that so far we have closely followed the dev image pattern currently used by the WMF, and the approach I'm suggesting going forward does the same in nearly all regards, (4) standalone images aren't a priority now, and we are following Docker best practices for dev images, and will still be separating concerns, just along slightly different lines than what's typical for production images, (5) the containers already are 100% ephemeral, you don't re-run the setup script when you recreate them, and this will continue to be the case, and mea culpa for not being clear about this, I've just sent a patch to improve the doc (6) I guess I just disagree, I feel the approaches you suggest would be more work to maintain and coordinate across the team, and (in one option you mention) would have a setup less similar to production, but there are details to be worked out here, especially as regards how each option would really work, and what's really possible given the constraint of keeping the source code outside the images, (7) currently containers build in just a few seconds, and that's not expected to change, so I think it's not a big issue, and performance requirements are not those of a production setting, (8) no one else uses the codebases we work on, except for Civi, for which there already are images, and perhaps process-control, for which you've already built a cool standalone image, and many remaining bits of the stack are kinda useless in isolation, and the fundraising-dev repo actually does make it simple to get all the code and start hacking on it, and finally, under (Misc), yep, pushing our images to the WMF registry is indeed the plan. :)

Of course as always I'm very likely to be missing things, and I'm really happy we're reflecting on all this!!

Ehhh also now it's my turn to apologize also if this is seems disproportionately critical or negative! And apologies again for misunderstandings caused by my insufficient explanations!

Finally, I hope to have a working implementation quite soon, which will, I think, help clarify the discussion a lot.

Again, I'm extremely happy we're having these kind of discussions!!!! Thanks so much once again, and also apologies for not providing a complete reply just yet! :)

My input is out of place here, since I would only be tinkering with process-control to make tiny changes, and I don't plan to use the package in production work.

But disclaimer aside, I would expect the repo's local Dockerfile to do the absolute minimum, it only needs to support the use case of developers running the script and inspecting its outputs. I wouldn't even include crond. Maybe the image's default behavior is to copy . into the container and pip install -e. The simple usage would be to run the generation script piped to stdin/stdout. The fancy usage would be to run this image with a mounted volume for providing a directory of input configs, and another mounted volume for capturing the output crontab.

Testing run-job seems annoying, maybe we include a very simple target "job" which can perform some debugging such as killing itself after a given time, creating a zombie subprocess, printing output and loglines etc. to test our pipes...

I wouldn't expect or want the development container to have any CiviCRM etc. machinery, this would just complicate the questions we want to answer. Integration between our applications is best tested on a different level, in CI and on staging servers, not in this utility repo's Dockerfile.

But disclaimer aside, I would expect the repo's local Dockerfile to do the absolute minimum, it only needs to support the use case of developers running the script and inspecting its outputs. I wouldn't even include crond. Maybe the image's default behavior is to copy . into the container and pip install -e. The simple usage would be to run the generation script piped to stdin/stdout. The fancy usage would be to run this image with a mounted volume for providing a directory of input configs, and another mounted volume for capturing the output crontab.

Testing run-job seems annoying, maybe we include a very simple target "job" which can perform some debugging such as killing itself after a given time, creating a zombie subprocess, printing output and loglines etc. to test our pipes...

hey @awight, check out the README on this patch. I think it does a lot of what you outline above. This is the process-control specific Docker image that I refer to in part one of this post

The other stuff I'm describing outside of that post is less specific to process-control and more related to the overall design of the new fr-tech Dockerized local development project that @AndyRussG has started on for us all.

In T268685#6654822, @AndyRussG wrote:

Hi! Thanks so so much for digging in here! The time and energy spent on this is hugely appreciated. So is the thoughtfulness with which you've approached providing this feedback!!!! :)

I'll post a more thorough response as soon as I can, but I thought perhaps it'd be best not to leave this without a reply for long! I need to apologize because I think I haven't been clear in several explanations, both of how things work and what I'm proposing, since several points you bring up seem at least in part based on misunderstandings. In a way-too-quick summary, (1) seems based on a misunderstanding of what I'm suggesting, sorry if I wasn't clear, I'll explain more ASAP, (2) same as point (1), (3) I think that so far we have closely followed the dev image pattern currently used by the WMF, and the approach I'm suggesting going forward does the same in nearly all regards, (4) standalone images aren't a priority now, and we are following Docker best practices for dev images, and will still be separating concerns, just along slightly different lines than what's typical for production images, (5) the containers already are 100% ephemeral, you don't re-run the setup script when you recreate them, and this will continue to be the case, and mea culpa for not being clear about this, I've just sent a patch to improve the doc (6) I guess I just disagree, I feel the approaches you suggest would be more work to maintain and coordinate across the team, and (in one option you mention) would have a setup less similar to production, but there are details to be worked out here, especially as regards how each option would really work, and what's really possible given the constraint of keeping the source code outside the images, (7) currently containers build in just a few seconds, and that's not expected to change, so I think it's not a big issue, and performance requirements are not those of a production setting, (8) no one else uses the codebases we work on, except for Civi, for which there already are images, and perhaps process-control, for which you've already built a cool standalone image, and many remaining bits of the stack are kinda useless in isolation, and the fundraising-dev repo actually does make it simple to get all the code and start hacking on it, and finally, under (Misc), yep, pushing our images to the WMF registry is indeed the plan. :)

Of course as always I'm very likely to be missing things, and I'm really happy we're reflecting on all this!!

Ehhh also now it's my turn to apologize also if this is seems disproportionately critical or negative! And apologies again for misunderstandings caused by my insufficient explanations!

Finally, I hope to have a working implementation quite soon, which will, I think, help clarify the discussion a lot.

Again, I'm extremely happy we're having these kind of discussions!!!! Thanks so much once again, and also apologies for not providing a complete reply just yet! :)

Hey, @AndyRussG thanks for the reply.

I feel like the drawbacks I've listed are valid and this response maybeee kinda brushes some of them off a bit? I'm not sure who decided that standalone application-specific images are not the priority, and with what understanding they made this decision, as it's a misunderstanding of how Docker is intended to be used to exclude them. This is a mistake and doesn't follow the model outlined in the official Docker documentation as described here (I've retracted this sentence as it feels confrontational when reading it back now, which is not my intention and was not helpful, sorry about that @AndyRussG )

I feel like there's been some confusion around these points being related to a production environment and not local development. Let me clear that up, this is for local development and the drawbacks outlined above will affect us when developing locally with Docker.

Also, to be clear, my reply on this thread isn't specific to just how we use process-control (it came up here due to your suggestion for two base images across all apps), it instead applies to all the applications we containerize across the new fundraising-dev local Docker project.

I'll pause on this task for now until we can discuss it further with the rest of the team and get @Ejegg's thoughts.

Thanks again!!!

Update 22/01/21: I've retracted the confrontational-sounding sentence as it's unhelpful and inappropriate. Sorry about that @AndyRussG

I feel like the drawbacks I've listed are valid and this response maybeee kinda brushes some of them off a bit?

Ahhhh that's not the intention, I assure you...!!! More specifics are on the way later today, as promised, and I realize it's actually not possible to discuss many issues involved without those specifics... but I did want to send a initial response relatively quickly, first of all, to make it clear that all that you wrote was indeed seen, and second, to emphasize there are misunderstandings at work here that also need to be cleared up, basically an essential step in discussing some of your points... Apologies if that was not the right approach, and especially apologies if I gave the impression of brushing off your concerns...! Again, more details later today...

I'm not sure who decided that standalone application-specific images are not the priority, and with what understanding they made this decision

The proposed requirements for a Docker-based FR-Tech dev setup are in this comment on the epic task: T262971#6582686. Indeed, they've been there for over a month. They were also at the top of the Etherpad that I was passing around, and actively, regularly soliciting feedback on, for I think a few weeks before that.

That said, it's certainly possible to revisit requirements after a project has started.

This is one point from your comments where there doesn't seem to be a misunderstanding, but rather a disagreement about goals and priorities. Definitely makes sense to discuss with the team if there's a disagreement.

it's a misunderstanding of how Docker is intended to be used

So this is in part a disagreement about how we use tech in general. I don't think we need to necessarily use anything the way it was intended. True, often it's important to harmonize practices, and things often work better if you use them as intended, but that doesn't mean using them as intended should be a goal in and of itself, and if doing things differently than intended suits your needs better, then that's the way to go, I think. That said, I also disagree about whether we're using Docker as intended. I think we are using it as it's intended to be used for the use case of development environments. And we are also using it in the same way the WMF uses it for dev evnironments (the main difference being that we're scripting things up rather than copying from a document and pasting into files or the console).

doesn't follow the model outlined in the official Docker documentation as described here

I don't see how it doesn't. Certainly I could be missing something. But if you want to discuss further about how we're not following the document you linked, I guess I'd need more details, to understand exactly what you're saying here? :)

Also, to be clear, my reply on this thread isn't specific to just how we use process-control (it came up here due to your suggestion for two base images across all apps), it instead applies to all the applications we containerize across the new fundraising-dev local Docker project.

Yes, agreed 100%. This is also a big reason why it'd be great to get some feedback from the rest of the team!

So, again, more details and clarifications to come, and hopefully soon I'll have a working proposed implementation to show, and also hopefully soon we can get input from others! :)

Thanks again!!!

This is one point from your comments where there doesn't seem to be a misunderstanding, but rather a disagreement about goals and priorities. Definitely makes sense to discuss with the team if there's a disagreement.

it's a misunderstanding of how Docker is intended to be used

So this is in part a disagreement about how we use tech in general. I don't think we need to necessarily use anything the way it was intended. True, often it's important to harmonize practices, and things often work better if you use them as intended, but that doesn't mean using them as intended should be a goal in and of itself, and if doing things differently than intended suits your needs better, then that's the way to go, I think. That said, I also disagree about whether we're using Docker as intended. I think we are using it as it's intended to be used for the use case of development environments. And we are also using it in the same way the WMF uses it for dev evnironments (the main difference being that we're scripting things up rather than copying from a document and pasting into files or the console).

I appreciate this philosophy but I would say this only applies if it is obvious to all why we would not use the tool as intended and it is agreed upon without reservation by all the people who will likely have to use this non-standard implementation on a daily basis to do their jobs, which is an important distinction no?

I feel like the only objection at the moment to not building application-specific images (by which we'd avoid all the issues I've outlined and still be able to do everything in your original plan) is that it's been deemed not a priority, so I must challenge this, sorry. I also don't want to rush through working to the line of least resistance using our current approach to build out the entire stack to then realise in the future we maybe should have followed the recommended approach to dockerizing applications as we understand the tool better. A quick google search of the term "dockerizing applications" shows numerous articles describing what I am saying.

doesn't follow the model outlined in the official Docker documentation as described here

I don't see how it doesn't. Certainly I could be missing something. But if you want to discuss further about how we're not following the document you linked, I guess I'd need more details, to understand exactly what you're saying here? :)

IMAGES
An image is a read-only template with instructions for creating a Docker container. Often, an image is based on another image, with some additional customization. For example, you may build an image which is based on the ubuntu image, but installs the Apache web server and your application, as well as the configuration details needed to make your application run.

You might create your own images or you might only use those created by others and published in a registry. To build your own image, you create a Dockerfile with a simple syntax for defining the steps needed to create the image and run it. Each instruction in a Dockerfile creates a layer in the image. When you change the Dockerfile and rebuild the image, only those layers which have changed are rebuilt. This is part of what makes images so lightweight, small, and fast, when compared to other virtualization technologies.
Source: https://docs.docker.com/get-started/overview

Writing a Dockerfile is the first step to containerizing an application. You can think of these Dockerfile commands as a step-by-step recipe on how to build up your image. The Dockerfile in the bulletin board app looks like this:

and

You can see that these are much the same steps you might have taken to set up and install your app on your host. However, capturing these as a Dockerfile allows you to do the same thing inside a portable, isolated Docker image.

Source: https://docs.docker.com/get-started

Isolated being decisive adjective here.

I hope that helps clear up why I feel we are not following the recommended usage, specifically the text in bold.

Again, thanks for your patience hearing me out on this! and all the work you've done so far. I feel like just tweaking our approach slightly would avoid most of the potential drawbacks I've outlined with little to no extra work needed, however it doesn't look like we're gonna resolve this discussion via this phab ticket so I'll wait for the info and demo you mentioned and then we can discuss it with other folks in fr-tech and consider the options. Thanks for going to the effort to do that btw!!!

Thanks much!!!

I feel like the only objection at the moment to not building application-specific images (by which we'd avoid all the issues I've outlined and still be able to do everything in your original plan) is that it's been deemed not a priority, so I must challenge this, sorry.

Actually, no, it's not even the main objection. The real objection is it's not feasible to do in a straightforward manner. This is because: the source code needs to go outside the image. This is standard Docker dev environment practice, and it's what all the other WMF dev images do. Much of the setup, including stuff installed by composer, and in the case of Civi Buildkit, even the Apache settings, is generated based on what the source code says. So: all of that generated setup goes outside the image, too. (That doesn't mean that absolutely no setup should ever go in the image. But in our case a lot of it doesn't.)

However, capturing these as a Dockerfile allows you to do the same thing inside a portable, isolated Docker image.

That may be more or less applicable for production images, but I think it's not quite the same for dev images.

Isolated being decisive adjective here.

Isolated can mean many different things. It doesn't have to mean that the setup is all baked into the image. What you're quoting is a tutorial geared at learning to create production images. That's not our use case.

In this case I would suggest focusing more on this general principle: each container should have only one concern (also from Docker best practices docs). In our case, defining images' concerns means defining as clearly and simply as possible their contract with the rest of the setup, and making sure the code that creates each image doesn't have to know about the details of anything outside it, beyond what is specified in the contract. So, separating concerns, in a way that best meets our needs.

I also don't want to rush through working to the line of least resistance

Agreed, and that's why I'm super glad we're talking through all of this!!! It's incredibly important that we do so.

I'll wait for the info and demo you mentioned and then we can discuss it with other folks in fr-tech and consider the options

Ok thanks so much for being patient!!!

Thanks for going to the effort

Thank u for all your effort and insight on this!!!!!

In T268685#6656784, @AndyRussG wrote:

I feel like the only objection at the moment to not building application-specific images (by which we'd avoid all the issues I've outlined and still be able to do everything in your original plan) is that it's been deemed not a priority, so I must challenge this, sorry.

Actually, no, it's not even the main objection. The real objection is it's not feasible to do in a straightforward manner. This is because: the source code needs to go outside the image. This is standard Docker dev environment practice, and it's what all the other WMF dev images do. Much of the setup, including stuff installed by composer, and in the case of Civi Buildkit, even the Apache settings, is generated based on what the source code says. So: all of that generated setup goes outside the image, too. (That doesn't mean that absolutely no setup should ever go in the image. But in our case a lot of it doesn't.)

ahhhhhhhhhhhhhhhh ok so now I see why we're not on the same page!!

lemme throw together an example of what I'm actually suggesting later once I get some time shouldn't take long (I probably should have done that to begin with right?) and then I'll respond in full with an example.

Thanks much!

In T268685#6656784, @AndyRussG wrote:

I feel like the only objection at the moment to not building application-specific images (by which we'd avoid all the issues I've outlined and still be able to do everything in your original plan) is that it's been deemed not a priority, so I must challenge this, sorry.

Actually, no, it's not even the main objection. The real objection is it's not feasible to do in a straightforward manner. This is because: the source code needs to go outside the image. This is standard Docker dev environment practice, and it's what all the other WMF dev images do. Much of the setup, including stuff installed by composer, and in the case of Civi Buildkit, even the Apache settings, is generated based on what the source code says. So: all of that generated setup goes outside the image, too. (That doesn't mean that absolutely no setup should ever go in the image. But in our case a lot of it doesn't.)

Arghhhhh, it looks like I may have given you the impression all along that I was advocating only deploying the source code to the images prior to running containers, this is not the case!! Sorry if I haven't been clear on this point. To be clear, all source code for all applications would be deliberately bind-mounted to the containers on run at the docker-compose level, allowing edits in the realtime on all applications.

I agree that this approach is a standard docker development practice, that's how I understand it also. However, there is something else mentioned on that page you linked which I want to highlight as I feel it might help clear up a potential false dichotomy of production vs dev images, in relation to how source files need to be stored.

Source: differences-in-development-and-production-environments

Notice the wording is mounting it into the same location as you mounted a bind mount during development, meaning the actual Docker image does not need to change between dev and production to use source files. The only change is how the container accesses the files, not the image. This makes sense if you think it through, as it means developers and operations can use the same Docker image (or image hierarchy if we wanna get fancy) running from local right through to production to build the application. This is a big part of the philosophy behind DevOps.

Also, the build steps (including any install scripts) for apps can also all live inside the Docker image using the multi-stage approach. You can chain the build steps, e.g. running the payments-wiki install scripts and then passing the configured application on to the next step in the build, encapsulating the entire process in a single Docker image removing the need for any shell scripting while also giving us all the benefits of using Docker e.g. consistent API, caching, single responsibility. There's a great example of this here

In T268685#6656784, @AndyRussG wrote:

Isolated can mean many different things. It doesn't have to mean that the setup is all baked into the image. What you're quoting is a tutorial geared at learning to create production images. That's not our use case.

In this case I would suggest focusing more on this general principle: each container should have only one concern (also from Docker best practices docs). In our case, defining images' concerns means defining as clearly and simply as possible their contract with the rest of the setup, and making sure the code that creates each image doesn't have to know about the details of anything outside it, beyond what is specified in the contract. So, separating concerns, in a way that best meets our needs.

As promised, here is an docker example stack that I've pushed up to demonstrate how we could use application-specific Docker images along with mounting each application's respective source code within the containers using bind-mounts.

This skeleton app pulls in five different applications (in theory) and builds them all from application-specific images, which are then integrated by Docker-compose. Each application is isolated from one another and has its own package dependencies which are managed within its own Docker image file. Admittedly this is an overly simplistic collection of single-page apps with one real app, process-control, but I've tried to split them out as we would in real-world to show how we could manage all the applications we plan to add to the fundraising-dev docker project

jgleeson mentioned this in T262975: Get civi working on docker.Dec 2 2020, 2:47 PM

Hi! Here are some replies, and some of the clarifications promised...

The only change is how the container accesses the files, not the image. This makes sense if you think it through, as it means developers and operations can use the same Docker image (or image hierarchy if we wanna get fancy) running from local right through to production to build the application. This is a big part of the philosophy behind DevOps.

I do understand how this might be nice to keep in mind... at the same time, I feel production images are out of scope for this project. I feel like it's possible but not super likely we'd ever use the same images in both environments because the requirements are quite different, and puppet setup for our production environment is working and is well maintained. I think a higher priority for this project is ease of use and setup for the whole stack, for fr-tech developers. I guess I don't know for sure that the two considerations are incompatible, but in the case we'd find they are, I'd go with what works best for the dev needs.

e.g. running the payments-wiki install scripts and then passing the configured application on to the next step in the build, encapsulating the entire process in a single Docker image removing the need for any shell scripting

So, wouldn't that mean at least having a version of the source code baked into the image so that you can run those install scripts when creating the image and baking in the results? Apologies if I'm missing something!

Also, what is the drawback of shell scripting? Commands inside Dockerfiles are also basically shell commands?

Also, isn't it simpler for a dev environment to have a single shell script on the host rather than many shell commands in Dockerfiles whose results get baked into images? Again, apologies if I'm missing something.

Containers will include stuff unrelated to the apps they run which will cause undesired side-effects

The first issue I see with this approach is that the base Apache image would eventually end up having the webserver config including SSL cert and vhost config for three single apps (Payments, CiviCRM and SmashPig) baked into the single base Apache image, as you've started here and here for our first app, payments.

This would mean that when you run a CiviCRM container from the base apache image, that container would also have within it a PaymentsWiki and SmashPig setup that is never used which goes against Docker best practice as outlined here Don’t install unnecessary packages and here Decouple applications. The same for when you run a PaymentsWiki container, it would internally have CiviCRM setup files that are never used.

A side-effect of this will likely be collisions between the three web apps present within the single container instances e.g. port conflicts due to all three web apps trying to serve over 80/443 which we'd have to mitigate at the image level if we pursued this approach. I imagine Apache will also error due configured apps src code missing unless we mount the src code for all three apps inside each individual container which would again violate Docker best practice.

As shown in the patch set 14 here, we can also place Apache setup on the host via a bind mount. So, no contradictory Apache configs in any containers, and the image can have a concern defined by a clear contract regarding to how it interacts with stuff.

Also, the existing ssl cert works for subdomains of localhost, so no issues there, either!

Ahhh more to come!!!

Note: edited my last comment to fix some typos and unclear bits. :)

From this comment:

potential false dichotomy of production vs dev images

and

You can chain the build steps, e.g. running the payments-wiki install scripts and then passing the configured application on to the next step in the build, encapsulating the entire process in a single Docker image

So, I'm trying to understand the specifics of what you're suggesting. As regards the differences between development and production images, I think there's more to it than just what type of mount is used for the source code? As in, production images need some different setup (such as no XDebug, other streamlined config) and need to be lightweight and more "ready to go" so you can quickly switch to a new image in a production environment, no? And also, I think you don't want to run your setup scripts to generate config directly in a production environment, usually, but rather have the config ready-made and managed in the image or elsewhere, no? And also, you won't be regularly regenerating database contents, I think, as devs might?

Also, thinking about the multi-stage build process you mentioned... say, in the case of running composer, I suppose you could have a multi-stage build where there's a dev image that doesn't include the libraries installed by composer, and then for the production build you start with the dev image, run composer, then copy the stuff that composer generated into the production image? So, if that's the sort of thing you mean, I guess the main comments on that I'd have are:

I don't feel we should make too much effort taking into account future possible production build processes now.
With the Payments image as it's currently proposed, wouldn't that still be possible in any case? That is, if we ever decided to use the current Payments dev image as a starting point for a production image, couldn't you add a build stage based on the dev image that downloads the source code, runs composer and/or any other appropriate setup code that creates artifacts we'd want in the prod image, and then bakes that into a new image? In what way would a dev image that would work for that have to be different from the current proposed Payments dev setup?
So the current way fundraising-dev sets up the Payments service is really close to the way core Mediawiki dev images work, with the main difference being, again, that we've scripted things instead of asking devs to copy and paste from the doc. So, for core, as per the instructions for developers, you manually create an .env file, spin up the services, manually run composer update, then manually run install.sh. While there is some Mediawiki-specific config placed in the image (PlatformSettings.php) most is generated outside the image by install.sh (which is just a wrapper for install.php). Most devs using that setup will also end up further tweaking the generated settings and will manually install more parts of the stack, depending on what they're working on. So, the main difference between that and what fundraising-dev currently does are:
- instead of asking people to manually run those steps, we script them up;
- instead of having some settings in the image but most outside, we put all outside; and
- instead of having people manually tweak settings, we have all our unique settings ready-made and stored in git repos, so devs can easily retrieve them and share their tweaks across the team.

I don't think this is a massive difference in approach from the way Docker dev setups work for core. The changes made are just to streamline and standardize things. I guess the latest proposal for the Payments image moves one step further away from the core dev images, in that it gets the Apache config from the host via a bind mount, and includes libraries used by Civi, too. Even so, it's still quite close to core dev images, I feel!

Regarding the example you linked for multi-stage dev and production builds, I may well be missing something, but it seems kinda different since it's for an app in a compiled language, and I think it doesn't show how they'd deal with different dev/prod config files, but again, I also don't see how the current proposed Payments image couldn't potentially be used in a similar way?

Again, replying to part of this comment:

We wouldn't be able to manage specific packages for specific applications in Docker.

Common Docker tasks like running container instances with different extensions or versions of PHP for the different applications wouldn't be possible from a shared base image as they'd be tied to the packages on our single webserver base image in development across all apps that run from the webserver base image. We'd be forced to work around limitations like this either by manually adding additional external shell script commands or by connecting into the containers and modifying packages outside of Docker, which would then be lost when the container is rebuilt. This might trip us up in the event that version requirements diverge between our apps in the future.

Instead, if we separate these images out into their own application-specific Docker images then we would remove the redundancy and collisions allowing the application containers to individually expose ports 80 & 443 which we'd then map in our docker-compose file to service-specific ports. We could then define custom versions of binaries, extensions and PHP/Python on a per-application basis allow us to test new packages more easily. This would arguably result in a set of application images that are easier to manage and test.

So, here's how we could test a different version of PHP or a different library for only Payments or only Civi, in the current proposed setup:

Create a new, differently-named image with the different version of things (so, for example, fundraising-buster-php8-apache2-xdebug).
Upload the new image up to the image registry.
Change the name of the image in docker-compose.yml for whichever service or services we'd want to test with the new stuff, and change it back when we're done.

Am I missing something? I don't see how we'd need any additional scripts, or manually have to run commands inside the image or change things in the ephemeral container storage? Also, I don't see how this changes one way or another as a result of where or when we run application setup processes, or where we put the artifacts from those?

As regards ports, again, in the latest fundraising image patchset (14), the different services' containers would have different, unique Apache configs (stored on the host) even while the containers would be created from the exact same image. And (maybe I'm misunderstanding your point on this? apologies if so) the containers can expose whatever ports they like on the internal docker-compose-managed network. Multiple containers can expose the same or different ports there, so again, I don't see any port-collision issues, in any possible setup. Again, sorry if I'm misunderstanding! :)

hey @AndyRussG.

I've read your replies and although I could reply to each point further explaining and answering your questions, I can't help but feel a bit discouraged that you've not acknowledged my response to what was your main objection to using application-specific images, which was the insistence that the source code goes outside the images, to which I replied and explained that this was in fact exactly what I was proposing. I thought this would help.

I also put together a proof-of-concept sample application to show exactly how this could work.

Alsoooo, as you're actively building out the fairly complicated CiviCRM docker setup with @Eileenmcnaughton and @Ejegg using the current shared-images approach, around this conversation, it feels like there is little point in continuing this discussion about our fundamental approach when, as they say, the horse has already bolted. If you do see benefits in the changes I'm proposing, later on, we'll probably be too far down the road to change course without requiring meaningful refactoring so I have to be realistic.

If you are committed to the current approach and don't see any of the issues I've raised as warranting changing course or indeed as issues at all, then let's continue with the status-quo and I'll accept that. I know we can get everything working in the current approach, I just felt like we would get additional benefits using the application-specific imaging approach but after thinking it through I've realized it's not the end of the world if we don't adopt them and we can still have a local dev environment that is way better than using vagrant.

If I have misunderstood the situation and you in fact do want to continue talking this through then I would suggest we take the somewhat drastic step of pausing our Docker imaging for now so we can have this discussion on the fundamental approach to talk through the pros & cons of each, maybe throwing together a comparable POC demo application using the current shared-images approach so then we could then compare and discuss it further over video call, and also bring in others, which might work better.

Thanks again for all your work this.

Update (22/01/21): I've removed the suggestion that we carry on with the current shared images approach as I'm now convinced after further learnings on the subject this isn't the way to proceed.

@AndyRussG @jgleeson what strikes me reading this is that we are all on a learning curve with docker. A lot of work has gone into learning and figuring stuff out but it's the sort of work that would not have to be re-done if we changes approach later on. My suspicion is that we could make fairly fundamental changes later, once everyone is comfortable with using docker, without it being anything like the work of a re-write. As you can probably tell I'm leaning towards a miminum-viable-product approach and allowing for revisiting.

Thanks, @Eileenmcnaughton. I do think it's unlikely we'd revise the approach to building images later on using the current approach after investing the time to image our stack and publish those images to the WMF registry but it's not unthinkable.

The catalyst for me raising the points outlined earlier was completing this docker course the weekend before last and then starting to apply those learnings to the work we're doing here and specifically this ticket. What we're doing here is a slightly different approach to what I understood was the typical starting point to Dockerizing applications but I do realise I might be coming off overly dogmatic in regards to a tool I'm also still learning, as you rightly point out. I'll help out where I can and follow the current approach as mentioned in the previous ticket.

Thanks for the feedback!

• DStrine removed a project: Fundraising Sprint Xtreme Lolcats.Dec 4 2020, 6:10 PM

• DStrine moved this task from Current Sprint to Sprint +1 on the Fundraising-Backlog board.Dec 7 2020, 5:55 PM

• DStrine added a project: FR-Docker.Jan 3 2021, 7:59 PM

• DStrine moved this task from Sprint +1 to Sprint +3 on the Fundraising-Backlog board.Jan 3 2021, 8:09 PM

Change 643767 merged by jenkins-bot:
[wikimedia/fundraising/process-control@master] Dockerize process-control

https://gerrit.wikimedia.org/r/643767

Maintenance_bot removed a project: Patch-For-Review.Jan 18 2021, 2:10 PM

@AndyRussG @Eileenmcnaughton I'd like to reopen this discussion before our next Docker sprint.

I've been working through another, more comprehensive, Docker course and I'm now convinced we should be Dockerizing each of our discrete fr-tech applications & services individually as standard practice and then using those individual configs to form our combined stack with Docker-Compose as a separate fr-tech-specific project to provide a local dev environment for engineers on the team.

That means a separate Dockerfile for CiviCRM, Civiproxy, Paymentswiki, fundraising-tools, process-control, Smashpig listeners. We'd then have a standalone fr-tech-specific docker-compose config to stand these apps up together including a shared mount for application src code via git submodules and shared data needed across different applications (e.g. config). This will allow projects to be worked on as a stack and also individually when appropriate.

Also, for things like cron and xdebug which are needed across multiple projects, we'd build these out in separate Dockefiles which then serve as the base image (using FROM) of all app-specific images which need these services. This way we'd only have to do the work once, and it would be managed separately.

The good news is that we're not that far away from what I describe above as we already have mostly completed individual Docker builds for the following projects:

CiviCRM
CiviProxy
Payments wiki (we might need to split out some of the xDebug work here to its own Dockerfile)
process control

I've retracted my suggestion we carry on using the alternative approach above.

Here are some examples across the web of other Dockers users using individual Dockerfile images across multi-app projects and discussion on the subject:
https://nickjanetakis.com/blog/docker-tip-10-project-structure-with-multiple-dockerfiles-and-docker-compose
https://stackoverflow.com/questions/27409761/docker-multiple-dockerfiles-in-project/48243640#48243640
https://stackoverflow.com/questions/48841261/how-to-run-multiple-applications-with-single-mongodb-using-docker/48856407#48856407
https://stackoverflow.com/questions/61696286/project-including-multiple-dockerfiles-and-apps-sharing-some-files-how-to-const/61696554#61696554
https://stackoverflow.com/questions/59040916/using-docker-for-multiple-php-applications/59041494#59041494

I've also attached the project files from the Docker course I'm working through which includes a multi-app project stack like ours using individual Dockerfiles per app, this is covered in

docker-course-multi-app.zip51 KBDownload

And finally, here's the link to the example dockerized fundraising-tech stack proof-of-concept I put together in December to help explain what it was that I was proposing and how it differs to what we currently have.

• DStrine moved this task from Sprint +3 to Sprint +1 on the Fundraising-Backlog board.Feb 1 2021, 4:39 PM

• DStrine moved this task from Sprint +1 to Next on the Fundraising-Backlog board.

• DStrine moved this task from Next to Sprint +1 on the Fundraising-Backlog board.Feb 1 2021, 4:55 PM

jgleeson updated the task description. (Show Details)Feb 9 2021, 1:44 PM

• DStrine moved this task from Sprint +1 to Q3 2021-2022 on the Fundraising-Backlog board.Feb 12 2021, 6:11 PM

XenoRyet moved this task from Q3 2021-2022 to Unscheduled on the Fundraising-Backlog board.Mar 13 2023, 5:48 PM

@jgleeson: Removing task assignee as this open task has been assigned for more than two years - See the email sent to task assignee on Feburary 22nd, 2023.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

	F33930936: image.png
	Dec 2 2020, 3:30 AM

Docker dev setup: Set up process controlOpen, LowPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Docker dev setup: Set up process control
Open, LowPublic
Actions

Related Objects
Search...