Unify production and CI docker image build process
Closed, Resolved, Public

Description

We should unify the build processes and use the same structure for both sets of images. @Joe's production images script has some neat features, like jinja2 templating and dependency tracking.

Tyler wrote step-by-step instructions:

https://www.mediawiki.org/wiki/Continuous_integration/Docker#Images_using_docker-pkg

Event Timeline

Some requirements of this build process:

The build script we use for production has all of these features:

  • All apt actions are abstracted to macros like {{ "package1 package2 package3" | apt_install }}
  • Dependent images are currently declared with a simple structure, by nesting their respective directories. This is OK for production-images, but it could be a good idea to add a debian-control file to explicitly declare dependencies in case we depend on more than one root image (e.g. for multistage builds).
  • We have an easy way to refer to the most recent version of another image (according to its changelog) in order to always refer to the correct image.
  • There is no need for cache busters as we ignore cache at image build time. That is actually the only way around the broken cache model docker employs.
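As an illustration of the first point, here is a minimal sketch of what a filter like apt_install could expand to. The exact apt-get flags and the cleanup steps are assumptions made for this example; the expansion used by the production build script is not shown in this task and may differ.

```python
import shlex


def apt_install(packages: str) -> str:
    """Render a Dockerfile RUN body installing space-separated packages.

    Hypothetical stand-in for the apt_install macro: the flags and the
    cleanup commands below are assumptions, not the real expansion.
    """
    names = packages.split()
    if not names:
        raise ValueError("apt_install needs at least one package name")
    # Quote each name so odd characters cannot break the shell command.
    quoted = " ".join(shlex.quote(name) for name in names)
    return (
        "apt-get update && "
        "DEBIAN_FRONTEND=noninteractive "
        f"apt-get install --yes --no-install-recommends {quoted} && "
        "apt-get clean && rm -rf /var/lib/apt/lists/*"
    )


if __name__ == "__main__":
    print("RUN " + apt_install("package1 package2 package3"))
```

With Jinja2, a function like this could be exposed to templates by adding it to the environment's filters mapping (for example `env.filters["apt_install"] = apt_install`), which is what makes the `{{ "..." | apt_install }}` syntax available.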

I was thinking more along the lines of "we have this image, are there new commits to the git repo that was built/included in it?" Something similar to the proposed debmonitor, but for included git repos.

Also, for actual builds that will be used I agree with disabling caching, but for local testing we still need the option to use caching, otherwise builds could be pretty slow.

I would argue that locally, if you change the code you want to deploy (which happens in an ADD/COPY stanza anyway), the cache will be busted automagically; for any other type of change, a cache-free build will be needed. Please consider that disabling the cache does not mean you need to re-pull all images, just that the current one will be rebuilt from its parent image.

Status update: I extracted the build script from operations/docker-images/production-images and it is able to build the docker containers in that directory. A first public commit will be ready once I'm done writing tests/documentation.

Once I've done this, I'll look at integrating the features of the CI build script that I think are generally useful, and finally convert the CI repository dockerfiles to the format accepted by this software.

I think we can consider this a pre-alpha version, so we can change basically whatever we want. I'll post here which features of the CI build script I consider dropping so I can get some feedback.

From a quick skim of the CI repo, the following things done there are not supported by the current build system:

  • Auto-figuring out dependencies. This will be implemented, replacing the current reliance on the directory structure (which was a quick and dirty solution), albeit probably in a different way than the CI build script does it right now (it parses the dockerfile "manually").
  • Tagging images with a date instead of semver versioning. I am torn on this one: I do think having a changelog is *a good thing*, and we should explicitly track the history of changes in a docker image (even if it is just a new version of some checked-out repository). I can imagine there are cases where date tagging makes sense, though, like an automated build process to create nightlies, but that can surely be scripted around. @thcipriani what are your thoughts?
  • Running tests within the image. This is surely an interesting feature, but I'd like to make it general enough to make sense outside of CI.
  • Builds are OK to take a long time, and docker's cache model for layers within a single dockerfile is utterly broken. So I think we will default to not using the cache when building images and be done with the cache-busting logic and all of that.
  • There is a task for updating jjb files that I'm not sure about, but I would probably drop. That can be done outside of the build process, I think?
  • There is a prebuild.sh that gets run before the build process, which I'm not convinced is a good idea at all. In fact, we should favour the Dockerfile.build pattern, or multistage builds. I'll investigate more but I think I'm going to drop this feature.

I've gone back and forth on tagging for docker images. Initially I started with semantic versioning, but at the time everything was very much <= 0.1.0 and the version changes were meaningless. Also, the plan was/is to automate container rebuilds, so when I last talked about this with @Addshore date versioning looked like the sane thing.

I still like an incremental versioning scheme (like date versioning) for tags in the CI repo as every image build ends up being a new release regardless of changes to the Dockerfile (we see this with nodepool nightlies getting new versions of apt-packages, causing CI breakage).

One hesitation I have is that I know @Addshore and @Legoktm have been working to ensure these images are usable by developers (as well as by CI). Use outside of CI might make semantic versioning important. Semantic versioning plus a revision number would be a good compromise between the ability to automate and a version number that means something. @Addshore @Legoktm, what do you think of that proposal?

The date itself has not proved useful for me other than as a comparator. The date could be added as a label if need be. (Aside: some of the CI images contain a build-date label that is nonsense, e.g., docker inspect --format '{{index .Config.Labels "build-date"}}' wmfreleng/tox:v2017.09.27.08.25 == 2016-11-04T11:03:03Z)

Change 384081 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/docker-pkg@master] [WiP] Port docker builder

https://gerrit.wikimedia.org/r/384081

@thcipriani having run with dates for the past weeks I really don't like them.
It would make sense to be able to link the image with the version of the docker file that it was built with. The gerrit change number actually seems like a nice idea to me: the numbers are sequential, and you can immediately go to the version of the docker file that the image came from.
This would probably also work with automated builds?

@hashar also had the idea of a "production" tag that we would set manually and reference in the jjb configs, so pushing a new version to this tag would update the jenkins jobs without us also having to copy the tag into jjb.

I have a proposal: what about controlling semantic versioning via the changelog but allowing people to specify a --nightly CLI switch to inject the date in the version number?

So if you add the --nightly switch:

  • build version becomes e.g. 0.3.1~20171017
  • the following tags are pushed to the registry: 0.3.1~20171017 and nightly

If you don't:

  • build version is 0.3.1
  • the following tags are pushed to the registry: 0.3.1 and latest

I think this is the best compromise.

Automated builds that need to be uploaded to a registry should follow this rule of thumb.
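The proposal can be sketched as follows. This is only an illustration, not docker-pkg code: the function names are hypothetical, and the changelog is assumed to use the Debian format, where the first line of the topmost entry reads "name (version) distribution; urgency=...". The sketch mirrors the proposed version strings literally; since Docker tag names only accept letters, digits, "_", "." and "-", a real implementation would presumably need to map the "~" to an allowed character before pushing.

```python
import re
from datetime import date

# Header line of a Debian-style changelog entry: "name (version) dist; ..."
_HEADER = re.compile(r"^(?P<name>\S+) \((?P<version>[^)]+)\)")


def latest_version(changelog_text: str) -> str:
    """Return the version of the topmost changelog entry."""
    for line in changelog_text.splitlines():
        match = _HEADER.match(line)
        if match:
            return match.group("version")
    raise ValueError("no changelog entry header found")


def version_and_tags(changelog_text, nightly=False, today=None):
    """Return (build_version, tags_to_push) following the --nightly proposal."""
    version = latest_version(changelog_text)
    if nightly:
        day = today or date.today()
        # Inject the build date after the changelog version.
        version = f"{version}~{day:%Y%m%d}"
        return version, [version, "nightly"]
    return version, [version, "latest"]


if __name__ == "__main__":
    changelog = "tox (0.3.1) wikimedia; urgency=medium\n\n  * Example entry\n"
    print(version_and_tags(changelog))
    print(version_and_tags(changelog, nightly=True, today=date(2017, 10, 17)))
```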

One more thing to throw into the mix.

Right now we have a mediawiki-phan image, and I want to be able to create multiple versions of this image for multiple versions of phan (phan 0.8, 0.9) etc.

Is there a way that we can also make this work?
In my head these should all be the same image just with different labels at least.

I'm not sure I understand correctly, but what you want to do is something like:

  • have images called "mediawiki-phan"
  • have tags like 1.30.9~phan0.8

While this is of course absolutely possible in docker terms, it would be a nightmare to maintain in terms of coordinating security upgrades, as we want just one tag to be the latest that needs to be updated with security refreshes.

I would suggest you use a naming scheme like mediawiki-phan0.8 instead, if these containers belong to the CI repo or the ops one.

Okay!

mediawiki-phan-0.8 as an image name and then we can leave the tag for other versioning!

Thanks!

Change 384081 merged by Giuseppe Lavagetto:
[operations/docker-images/docker-pkg@master] Port docker builder

https://gerrit.wikimedia.org/r/384081

Change 388031 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::ci::shipyard: add docker-pkg

https://gerrit.wikimedia.org/r/388031

Change 388031 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::ci::shipyard: add docker-pkg

https://gerrit.wikimedia.org/r/388031

Change 391734 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/config@master] docker: operations-puppet -> docker-pkg image

https://gerrit.wikimedia.org/r/391734

Change 391734 merged by jenkins-bot:
[integration/config@master] docker: operations-puppet -> docker-pkg image

https://gerrit.wikimedia.org/r/391734

Thanks @thcipriani for converting the use of ops/puppet already!

Joe moved this task from Backlog to Blocked on others on the User-Joe board.

I have finally started the conversion of the CI images to docker-pkg and even sent a few patches "upstream"!

My "it is almost midnight out there" feedback:

The major gotcha I have is that the images being built are not tagged with latest right away. Thus, while building the chain, containers that have FROM parent:latest end up being built on the outcome of the previous build. https://gerrit.wikimedia.org/r/#/c/398265/ should add the tag right away.

I am still not sure how the dependencies are managed, nor how to use variables in FROM, but I guess I will figure it out eventually. To enforce build order, one has to set a Depends: in the control file. Potentially we could have it set semi-automatically by parsing the FROM lines, but I am probably overthinking it.
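To make the "parse the FROM lines" idea concrete, here is a minimal sketch that extracts parent image names from Dockerfile text. It is not how docker-pkg resolves dependencies; the function name is hypothetical, and the sketch assumes an already rendered Dockerfile (no template expressions left in the FROM line).

```python
import re

# Matches "FROM [--flag=value] image[:tag] [AS alias]", case-insensitively.
_FROM = re.compile(
    r"^\s*FROM\s+(?:--\S+\s+)*(?P<image>\S+)(?:\s+AS\s+(?P<alias>\S+))?\s*$",
    re.IGNORECASE,
)


def parent_images(dockerfile_text: str) -> list[str]:
    """List the external images a Dockerfile builds on, without tags.

    Stage aliases from multistage builds (FROM x AS build ... FROM build)
    and the special "scratch" base are not reported as parents.
    """
    aliases = set()
    parents = []
    for line in dockerfile_text.splitlines():
        match = _FROM.match(line)
        if not match:
            continue
        image = match.group("image")
        if image not in aliases and image != "scratch":
            name = image.split("@")[0]  # drop a digest, if any
            # Strip the tag from the last path component only, so that a
            # registry port such as "host:5000/" is preserved.
            head, sep, tail = name.rpartition("/")
            name = head + sep + tail.split(":")[0]
            if name not in parents:
                parents.append(name)
        if match.group("alias"):
            aliases.add(match.group("alias"))
    return parents
```

The resulting names could then be used to pre-fill the Depends: field, leaving the control file as the place where the dependency is explicitly recorded.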

Rebuilding an image doesn't automatically rebuild its descendant images: one has to dch -i each of the descendants. We could probably build the whole tree based on FROM / control, figure out which images need a rebuild, and dch -i them automatically to trigger it.
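A rough sketch of that idea, assuming the dependency tree is already available as a mapping from each image to the set of images it depends on. The function name and the image names in the usage example are illustrative only, and the actual dch -i step is left out.

```python
from graphlib import TopologicalSorter


def rebuild_order(parents: dict[str, set[str]], changed: str) -> list[str]:
    """Return `changed` plus every image depending on it, parents first."""
    # Grow the set of affected images until no new descendant is found.
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for image, deps in parents.items():
            if image not in affected and deps & affected:
                affected.add(image)
                grew = True
    # Keep only edges between affected images, then sort so that every
    # image comes after the images it is built from.
    graph = {image: parents.get(image, set()) & affected for image in affected}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    tree = {
        "base": set(),
        "npm": {"base"},
        "npm-test": {"npm"},
        "tox": {"base"},
    }
    # Each image in the result would get a changelog bump and a rebuild.
    print(rebuild_order(tree, "npm"))
```

graphlib is part of the standard library from Python 3.9 onwards; TopologicalSorter raises CycleError if the declared dependencies form a loop, which is a useful sanity check on the control files.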

A great thing is that http_proxy is injected into the apt configuration. I am using apt-cacher-ng, and setting http_proxy as an environment variable also acts on other commands such as npm/gem/composer, which apt-cacher-ng does not handle out of the box.
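For readers unfamiliar with the mechanism: apt reads its proxy from the Acquire::http::Proxy setting, so injecting the proxy only into the apt configuration keeps it scoped to apt. The sketch below shows one way such a snippet could be rendered from the environment; the function name is hypothetical and this is not taken from docker-pkg's code.

```python
import os


def apt_proxy_conf(environ=None) -> str:
    """Render an apt.conf.d snippet from http_proxy, or '' if it is unset."""
    env = os.environ if environ is None else environ
    proxy = env.get("http_proxy") or env.get("HTTP_PROXY")
    if not proxy:
        return ""
    # Only apt reads this file, so npm/gem/composer are left untouched.
    return f'Acquire::http::Proxy "{proxy}";\n'


if __name__ == "__main__":
    print(apt_proxy_conf({"http_proxy": "http://localhost:3142"}), end="")
```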

The code is way more robust than the half hacked build.py we had :]

I am working on it with @Joe acting as the mentor for docker-pkg. I am quite happy about the system overall.

The status is:

docker-pkg based images: 15
left to migrate: 24
total: 39

Change 399754 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: convert ci-src-setup to docker-pkg

https://gerrit.wikimedia.org/r/399754

Change 399754 merged by jenkins-bot:
[integration/config@master] docker: convert ci-src-setup to docker-pkg

https://gerrit.wikimedia.org/r/399754

Change 399791 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Convert ci-src-setup-simple to docker-pkg

https://gerrit.wikimedia.org/r/399791

Change 399791 merged by jenkins-bot:
[integration/config@master] Convert ci-src-setup-simple to docker-pkg

https://gerrit.wikimedia.org/r/399791

Change 399793 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Convert tox docker images to docker-pkg

https://gerrit.wikimedia.org/r/399793

Change 399793 merged by jenkins-bot:
[integration/config@master] Convert tox docker images to docker-pkg

https://gerrit.wikimedia.org/r/399793

Change 399801 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Convert lintr docker image to docker-pkg

https://gerrit.wikimedia.org/r/399801

Change 399801 merged by jenkins-bot:
[integration/config@master] Convert lintr docker image to docker-pkg

https://gerrit.wikimedia.org/r/399801

Change 388450 had a related patch set uploaded (by Hashar; owner: Giuseppe Lavagetto):
[integration/config@master] Convert npm, npm-test to docker-pkg

https://gerrit.wikimedia.org/r/388450

Change 388450 merged by jenkins-bot:
[integration/config@master] Convert npm, npm-test to docker-pkg

https://gerrit.wikimedia.org/r/388450

Change 399837 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Revert "Convert npm, npm-test to docker-pkg"

https://gerrit.wikimedia.org/r/399837

Change 399837 merged by jenkins-bot:
[integration/config@master] Revert "Convert npm, npm-test to docker-pkg"

https://gerrit.wikimedia.org/r/399837

Change 403642 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate composer to docker-pkg

https://gerrit.wikimedia.org/r/403642

Change 403647 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate composer-package to docker-pkg

https://gerrit.wikimedia.org/r/403647

Change 403642 merged by jenkins-bot:
[integration/config@master] Migrate composer to docker-pkg

https://gerrit.wikimedia.org/r/403642

Change 403647 merged by jenkins-bot:
[integration/config@master] Migrate composer-package to docker-pkg

https://gerrit.wikimedia.org/r/403647

Change 403654 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate composer-test to docker-pkg

https://gerrit.wikimedia.org/r/403654

Change 403654 merged by jenkins-bot:
[integration/config@master] Migrate composer-test to docker-pkg

https://gerrit.wikimedia.org/r/403654

Change 403896 had a related patch set uploaded (by Hashar; owner: Giuseppe Lavagetto):
[integration/config@master] Convert npm, npm-test to docker-pkg (2)

https://gerrit.wikimedia.org/r/403896

Change 403896 merged by jenkins-bot:
[integration/config@master] Convert npm, npm-test to docker-pkg (2)

https://gerrit.wikimedia.org/r/403896

Status update

Containers left to migrate are:

mediawiki-phan
mediawiki-phpcs
npm-browser-test
npm-stretch
npm-test-graphoid
npm-test-librdkafka
npm-test-maps-service
npm-test-mathoid
npm-test-stretch

Change 403907 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate npm-test images to docker-pkg

https://gerrit.wikimedia.org/r/403907

Change 403907 merged by jenkins-bot:
[integration/config@master] Migrate npm-test images to docker-pkg

https://gerrit.wikimedia.org/r/403907

Change 403921 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate npm stretch containers to docker-pkg

https://gerrit.wikimedia.org/r/403921

Change 403921 merged by jenkins-bot:
[integration/config@master] Migrate npm stretch containers to docker-pkg

https://gerrit.wikimedia.org/r/403921

Change 403923 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate mediawiki-phpcs to docker-pkg

https://gerrit.wikimedia.org/r/403923

Change 403923 merged by jenkins-bot:
[integration/config@master] Migrate mediawiki-phpcs to docker-pkg

https://gerrit.wikimedia.org/r/403923

Change 403925 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate mediawiki-phan to docker-pkg

https://gerrit.wikimedia.org/r/403925

Change 403925 merged by jenkins-bot:
[integration/config@master] Migrate mediawiki-phan to docker-pkg

https://gerrit.wikimedia.org/r/403925

I have migrated all the remaining images.

I still need to convert our build wrapper to invoke docker-pkg and let it update the jjb manifests.
I would also like a way to recursively build child images whenever the parent is changed.

But that can be tracked in other tasks under docker-pkg.

Change 404051 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Fix npm-browser-test docker tag

https://gerrit.wikimedia.org/r/404051

Change 404051 merged by jenkins-bot:
[integration/config@master] Fix npm-browser-test docker tag

https://gerrit.wikimedia.org/r/404051