Define new Jenkins pipeline for container build phase
Closed, Resolved · Public

Description

We've sketched out a rough outline for the build phase of the container release pipeline. We should implement this as a Jenkins pipeline (as in the Jenkins Pipeline plugin) and see if we can't get something suitable for building Mathoid images.

  • blubber build test image
  • docker run test image entrypoint
  • decision/feedback fork:
    • test entrypoint passes
      • blubber build production image
      • push image to WMF docker registry
      • provide feedback
    • test entrypoint fails
      • abort pipeline
      • provide feedback
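
The outline above can be sketched as a small driver. This is only an illustration of the control flow, not the actual Jenkins pipeline: the `blubber blubber.yaml <variant>` invocation, the tag naming, and the `build_phase` and `shell_runner` names are all assumptions, and the sketch also aborts when a build or push step fails, which the outline does not spell out.

```python
import subprocess
from typing import Callable

# A runner takes a shell command line and returns its exit status.
Runner = Callable[[str], int]


def shell_runner(command: str) -> int:
    """Run a command through the shell and return its exit status."""
    return subprocess.run(command, shell=True).returncode


def build_phase(name: str, build: int, registry: str,
                run: Runner = shell_runner) -> str:
    """Walk the outlined stages; return "success" or "aborted" as feedback."""
    test_tag = f"{name}:test-{build}"
    prod_tag = f"{registry}/{name}:build-{build}"

    def blubber_build(variant: str, tag: str) -> int:
        # Assumed invocation: Blubber prints a Dockerfile for the variant,
        # which is piped into docker build.
        return run(f"blubber blubber.yaml {variant} | "
                   f"docker build --tag {tag} --file - .")

    if blubber_build("test", test_tag) != 0:
        return "aborted"
    # Running the test image executes its entrypoint.
    if run(f"docker run --rm {test_tag}") != 0:
        return "aborted"
    if blubber_build("production", prod_tag) != 0:
        return "aborted"
    if run(f"docker push {prod_tag}") != 0:
        return "aborted"
    return "success"
```

Passing a fake `run` callable makes the fork easy to exercise without Docker installed: a runner that fails on `docker run` should stop the sequence before any production build or push.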

Details

Related Gerrit Patches:
integration/config : master · Experimental service pipeline jobs

Event Timeline

dduvall created this task. · Sep 7 2017, 6:22 PM
Restricted Application added a subscriber: Aklapper. · Sep 7 2017, 6:22 PM

Change 380551 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/config@master] WIP Service pipeline DSL

https://gerrit.wikimedia.org/r/380551

dduvall claimed this task. · Sep 26 2017, 5:22 PM
dduvall triaged this task as Medium priority.
dduvall moved this task from Backlog to CI on the Release Pipeline board.

An experimental job has been created from the current JJB patchset and is working up until just before the registry push.

Successfully built 9d25c234a7aa
Successfully tagged mediawiki-services-mathoid:build-20
[Pipeline] dockerFingerprintFrom
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
java.io.IOException: Cannot retrieve .Id from 'docker inspect docker-registry.wikimedia.org/nodejs-devel AS prep'
	at org.jenkinsci.plugins.docker.workflow.client.DockerClient.inspectRequiredField(DockerClient.java:203)
	at org.jenkinsci.plugins.docker.workflow.FromFingerprintStep$Execution.run(FromFingerprintStep.java:119)
	at org.jenkinsci.plugins.docker.workflow.FromFingerprintStep$Execution.run(FromFingerprintStep.java:75)
	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
	at hudson.security.ACL.impersonate(ACL.java:260)
	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
	at java.lang.Thread.run(Thread.java:748)
Finished: FAILURE

This failure may be due to an unresolved bug.

A more specifically relevant bug may be this one, which indicates that the problem lies with the FROM aliases used for multistage builds. The recommended workaround is to simply use sh to execute docker build for now. :) Easy enough.
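
When shelling out to docker build instead of using the plugin step, the image ID has to be recovered some other way. A minimal sketch, assuming the builder prints a "Successfully built <id>" line as in the log above (other builders may format their output differently); `parse_image_id` and `docker_build` are hypothetical helper names:

```python
import re
import subprocess
from typing import Optional

# Matches lines such as "Successfully built 9d25c234a7aa".
_BUILT = re.compile(r"^Successfully built ([0-9a-f]+)\s*$", re.MULTILINE)


def parse_image_id(build_output: str) -> Optional[str]:
    """Return the ID from the last "Successfully built" line, if any."""
    matches = _BUILT.findall(build_output)
    return matches[-1] if matches else None


def docker_build(tag: str, context: str = ".") -> str:
    """Shell out to docker build and return the resulting image ID."""
    result = subprocess.run(
        ["docker", "build", "--tag", tag, context],
        check=True, capture_output=True, text=True,
    )
    image_id = parse_image_id(result.stdout)
    if image_id is None:
        raise RuntimeError("docker build printed no image ID")
    return image_id
```

Because the ID comes from the build output rather than from inspecting the FROM line, the `AS prep` alias never reaches docker inspect.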

Patchset 4 successfully builds the production image but fails with a 403 when trying to push to the registry. It's unclear whether our uploader credential is being used.

On integration-slave-docker-1705 I changed docker.service to use the debug log level (pass -D in ExecStart) and ran journalctl -u docker --follow to watch the log.

Then I ran docker push docker-registry.wikimedia.org/wikimedia/mediawiki-services-mathoid:build-28, and the system log showed:

msg="Calling GET /_ping"
msg="Calling POST /v1.30/images/docker-registry.wikimedia.org/wikimedia/mediawiki-services-mathoid/push?tag=build-28"
msg="hostDir: /etc/docker/certs.d/docker-registry.wikimedia.org"
msg="Trying to push docker-registry.wikimedia.org/wikimedia/mediawiki-services-mathoid to https://docker-registry.wikimedia.org v2"
msg="Pushing repository: docker-registry.wikimedia.org/wikimedia/mediawiki-services-mathoid:build-28"

Then there is a 403.

Using docker login with the credential works fine, though:

$ docker login docker-registry.wikimedia.org
Username (uploader): uploader
Password
Login Succeeded
$

Who knows whether docker sends the credentials with the POST, or whether we even have push access :(

hashar added a subscriber: Joe. · Sep 27 2017, 7:34 AM

Clarified with @Joe: labs instances are not allowed to interact with the docker registry. Nginx rejects them, based on these Puppet bits:

modules/profile/manifests/docker/registry.pp
class profile::docker::registry {
    $image_builders = hiera(
        'profile::docker::registry::image_builders',
        $network::constants::special_hosts[$::realm]['deployment_hosts']
    )
...
    class { '::docker::registry::web':
        allow_push_from      => $image_builders,
...

And in hiera:

hieradata/role/common/docker/registry.yaml
profile::docker::registry::image_builders: ['10.64.16.176'] # copper.eqiad.wmnet

Hence the 403 is working as intended.
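
To illustrate why the push is rejected, here is a minimal sketch of an allow-list check. The real enforcement happens in the Nginx configuration generated from the Puppet class above; `is_push_allowed` is a hypothetical function, and the second address in the usage example is a made-up stand-in for a labs instance.

```python
from ipaddress import ip_address, ip_network
from typing import Iterable


def is_push_allowed(client_ip: str, allow_push_from: Iterable[str]) -> bool:
    """Return True if client_ip matches any entry in the allow list."""
    client = ip_address(client_ip)
    for entry in allow_push_from:
        # ip_network accepts both single addresses and CIDR ranges.
        if client in ip_network(entry, strict=False):
            return True
    return False


image_builders = ["10.64.16.176"]  # value from the hiera snippet above
print(is_push_allowed("10.64.16.176", image_builders))  # True
print(is_push_allowed("10.68.0.10", image_builders))    # False (hypothetical labs IP)
```

Any client outside the list falls through to the rejection branch, which is what surfaces as the 403.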

So I guess that has to be built on a production slave, potentially with a dedicated Jenkins instance?

dduvall changed the task status from Open to Stalled. · Sep 27 2017, 5:00 PM

Right on. Thanks for debugging that @hashar! I think we can continue to test the experimental job on the labs instance and just push to Docker Hub for now. Once we get the dependencies installed on contint1001 (blubber, docker >= 17.05) and the job is somewhat stable, we can move it there.

Another issue that has arisen in testing this is that the default credential store for Docker is very insecure. By default, it stores all usernames/passwords given to docker login in the clear (in ~/.docker/config.json) until a subsequent docker logout is called. That means there is a substantial window of time where any other Jenkins job running via the same remote agent could gain access to the registry credentials. I'll open a subtask to further discuss and resolve this issue.
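
One possible way to narrow that window, sketched here purely as an assumption and not as the approach eventually adopted, is to point the docker client at a throwaway configuration directory through the DOCKER_CONFIG environment variable and delete it as soon as the push finishes. `isolated_docker_config` is a hypothetical helper name.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def isolated_docker_config() -> Iterator[Dict[str, str]]:
    """Yield an environment whose DOCKER_CONFIG is a private temp directory."""
    # mkdtemp creates the directory accessible only by the creating user.
    config_dir = tempfile.mkdtemp(prefix="docker-config-")
    env = dict(os.environ, DOCKER_CONFIG=config_dir)
    try:
        yield env
    finally:
        # Remove config.json and any stored credentials, even on failure.
        shutil.rmtree(config_dir, ignore_errors=True)
```

Inside the `with` block, the yielded `env` would be passed to each docker invocation, for example `subprocess.run(["docker", "push", tag], env=env)`. This keeps credentials out of ~/.docker/config.json, but jobs running as the same user on the same agent could still read the temporary directory while it exists, so it shortens the exposure rather than eliminating it.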

dduvall changed the task status from Stalled to Open. · Oct 16 2017, 5:11 PM

Credentials are now handled via a docker-pusher script.

The push failing with a 403 is now tracked in T178606.

Joe added a comment. · Oct 23 2017, 5:25 AM

As a suggestion: I would host your own registry under the CI project in labs for testing and managing any local builds you might need to retrieve.

Change 380551 merged by jenkins-bot:
[integration/config@master] Experimental service pipeline jobs

https://gerrit.wikimedia.org/r/380551

dduvall closed this task as Resolved. · Dec 18 2017, 5:40 PM