
Fix pipeline image publishing workflow
Closed, Resolved, Public

Description

Currently our model deployment pipelines are composed of two stages:

  • run-test
  • run-production

The test stage runs simple formatting/syntax checks. The production stage builds the production image and, if the build succeeds, pushes it to the WMF docker registry. The problem is that an image is pushed to the registry for every patchset in a CR. This is confusing and wastes space and resources, because the image is built and published again after the CR is merged.

We need to split image publishing into its own stage that only runs during the gate-and-submit jobs.

Event Timeline

Looking at the PipelineLib and Deployment pipeline docs, it seems we will need to define an additional stage (i.e. publish) that pushes the production image to the registry during the gate-and-submit jobs. Right now the push happens inside the production stage, so it runs for every patchset.

https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial#Publishing_Docker_Images

Ok, it seems we may need a separate "publish" pipeline defined in .pipeline/config.yaml (inference-services repo) for each image that will get called during postmerge:

editquality-publish:
  blubberfile: editquality/blubber.yaml
  stages:
    - name: production
      build: production
      publish:
        image:
          tags: [stable]
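
Since the same publish-only entry is needed for each image, its shape can be sketched as a small Python helper. This is only an illustration of the structure shown above, not how the config is actually produced: the function names are made up, "example-model" is a hypothetical second image, and the assumption that every image keeps its Blubber file at <image>/blubber.yaml is inferred from the single editquality example.

```python
import json


def publish_pipeline(image, tags=("stable",)):
    """Return a publish-only pipeline entry for one image.

    Assumes (from the editquality example) that the Blubber file lives
    at <image>/blubber.yaml and the pipeline is named <image>-publish.
    """
    return {
        f"{image}-publish": {
            "blubberfile": f"{image}/blubber.yaml",
            "stages": [
                {
                    "name": "production",
                    "build": "production",
                    "publish": {"image": {"tags": list(tags)}},
                }
            ],
        }
    }


def publish_pipelines(images):
    """Merge the publish entries for several images into one mapping."""
    pipelines = {}
    for image in images:
        pipelines.update(publish_pipeline(image))
    return pipelines


if __name__ == "__main__":
    # "editquality" comes from the task; "example-model" is a placeholder.
    print(json.dumps(publish_pipelines(["editquality", "example-model"]), indent=2))
```

The printed mapping mirrors the YAML entry above, one key per image, which makes it easy to eyeball that every image gets the same single production stage with a publish step.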

In the integration/config repo, we add the new pipeline in jjb/project-pipelines.yaml, and then in zuul/layouts.yaml we add the new job to postmerge (under gate-and-submit) like this:

...
    postmerge:
      - trigger-inference-services-pipeline-editquality-publish
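
With one publish pipeline per image, the postmerge list grows by one job per pipeline. A minimal Python sketch of that mapping follows. The job-name prefix is inferred from the single job name above and is an assumption, as is the helper name; the real job names are whatever the JJB templates in integration/config generate.

```python
# Prefix inferred from the one job name shown above (an assumption).
JOB_PREFIX = "trigger-inference-services-pipeline-"


def postmerge_jobs(pipeline_names):
    """Return postmerge job names for the publish pipelines only.

    Pipelines that do not end in "-publish" are skipped, so only the
    publish pipelines end up in the postmerge list.
    """
    return [
        JOB_PREFIX + name
        for name in pipeline_names
        if name.endswith("-publish")
    ]


if __name__ == "__main__":
    for job in postmerge_jobs(["editquality", "editquality-publish"]):
        print(f"- {job}")
```

Filtering on the "-publish" suffix keeps the non-publishing pipelines out of postmerge, which is the point of the split: images are pushed only after a change is merged.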

I will try to get a CR up on gerrit later today with a publish pipeline for each image. It might be best to include all of the new publish pipelines in one CR, since all images get rebuilt whenever we edit config.yaml. After that we can work on the integration/config CRs.

Change 751777 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] pipeline: add separate publish pipelines

https://gerrit.wikimedia.org/r/751777

Change 751777 merged by Accraze:

[machinelearning/liftwing/inference-services@main] pipeline: add separate publish pipelines

https://gerrit.wikimedia.org/r/751777

Change 752714 had a related patch set uploaded (by Accraze; author: Accraze):

[integration/config@master] inference: add publishing pipelines for postmerge

https://gerrit.wikimedia.org/r/752714

Change 752714 merged by jenkins-bot:

[integration/config@master] inference: add publishing pipelines for postmerge

https://gerrit.wikimedia.org/r/752714

I have manually triggered CI postmerge builds on the latest change that touched .pipeline/config.yaml: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/751777

Looks like all the jobs succeeded ;)

ACraze claimed this task.

Awesome, thank you for all your help @hashar!

Things look good on my end, going to mark this task as RESOLVED