
Execution of the deployment pipeline should be configurable via .pipeline/config.yaml
Open, Normal, Public

Description

ORES repo has two services, one uwsgi and one celery. Right now, blubber/the pipeline only makes one Dockerfile per repo. After talking to @akosiaris, it seems we need to discuss how we can fix this.

  • Maybe we can just have two production stages (like 'production-celery' and 'production-uwsgi') My knowledge about docker/blubber/helm is pretty basic. Would this work?

Event Timeline

jijiki triaged this task as Normal priority. Dec 3 2018, 1:22 PM
Ladsgroup raised the priority of this task from Normal to Needs Triage.
Ladsgroup triaged this task as Normal priority.

I didn't change the priority.

thcipriani renamed this task from Blubber should be able to make multi docker files per repo to The continuous release pipeline should support more than one service per repo. Jan 7 2019, 6:49 PM
thcipriani updated the task description.

I'll need multiple service deployments for the same repo for T211247, but I don't need different Docker images for them. So I think this won't affect my use case, but I'd like to note that it would make sense to not couple specific Docker images with a repo.

Perhaps the blubber configs should live elsewhere somehow? Could blubber be configured to build docker images with the app repo as a dependency, rather than have the config expect to live in the repo itself?

Q: would blubber's variants be enough to support the wsgi vs celery use case?

Q: would blubber's variants be enough to support the wsgi vs celery use case?

I thought about it too, and it's not a super bad idea, but we would probably need to define all stages at two levels, like uwsgi-build, uwsgi-dev, uwsgi-test, uwsgi-prep, uwsgi-prod, celery-build, celery-dev, celery-test, celery-prep, celery-prod (maybe not test and dev, but everything else).
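As a sketch, that two-level naming could live in a single Blubberfile with one variant per service role. This is hypothetical: the base image, entrypoints, and exact keys depend on the Blubber config version in use, so treat it as an illustration of the idea rather than a working config.

```yaml
# .pipeline/blubber.yaml -- hypothetical sketch of one repo, two services.
version: v3
base: docker-registry.wikimedia.org/wikimedia-buster
variants:
  build:
    python:
      version: python3
      requirements: [requirements.txt]
  production-uwsgi:
    includes: [build]
    entrypoint: [uwsgi, --ini, uwsgi.ini]
  production-celery:
    includes: [build]
    entrypoint: [celery, worker]
```

The two production variants share everything via `includes: [build]` and differ only in their entrypoint, which matches the ORES situation described above.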

I think there are a couple of problems with the current Continuous Delivery pipeline implementation:

  1. Implicit assumption that every repo is one service
  2. Implicit assumption that there is one test entrypoint per repo

There are workarounds for problem 2, but there are no workarounds for problem 1 just yet. Discussion here will probably inform the solution.

Perhaps the blubber configs should live elsewhere somehow? Could blubber be configured to build docker images with the app repo as a dependency, rather than have the config expect to live in the repo itself?

That's possible. There are a couple of ways we could do this. (1) Something akin to the deploy repos for scap3 that we use now. That is, a top repo that contains a .pipeline/blubber.yaml and the code itself as a submodule. Or (2) Something akin to the deployment-charts repo where the top level is a list of repositories that we can map to Blubberfiles or Dockerfiles. I think I'd like to avoid the latter since that would mean either (a) a single team becomes the bottleneck for making changes (as RelEng is now for integration/config) or (b) everyone is able to make modifications to all charts.

Q: would blubber's variants be enough to support the wsgi vs celery use case?

I thought about it too, and it's not a super bad idea, but we would probably need to define all stages at two levels, like uwsgi-build, uwsgi-dev, uwsgi-test, uwsgi-prep, uwsgi-prod, celery-build, celery-dev, celery-test, celery-prep, celery-prod (maybe not test and dev, but everything else).

That'd work for building images manually, but currently the Continuous Delivery pipeline on Jenkins (which builds images automatically post-merge or when you push a tag) only looks for a single test variant and a single production variant. We could do something similar to .gitlab-ci.yml, which is close to this solution: look for variants matching the expressions /.*test/ and /.*production/.
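To make the pattern-matching idea concrete, here is a small hypothetical helper (the function name and exact regexes are illustrative, not pipelinelib's actual implementation) that splits a repo's variant names into test and production sets instead of expecting exactly one of each:

```python
import re

# Match any variant whose name ends in "test" or "production",
# per the /.*test/ and /.*production/ idea above.
TEST_RE = re.compile(r".*test$")
PROD_RE = re.compile(r".*production$")

def classify_variants(variants):
    """Split Blubber variant names into (test, production) lists."""
    tests = [v for v in variants if TEST_RE.match(v)]
    prods = [v for v in variants if PROD_RE.match(v)]
    return tests, prods
```

With the uwsgi/celery naming from earlier in the thread, `classify_variants(["uwsgi-test", "celery-test", "uwsgi-production", "celery-production", "build"])` would yield the two test variants and the two production variants, leaving `build` for neither stage.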


What we've been talking about internally

For whatever reason it's become common to keep top-level dotfiles within a project itself (which I think looks cluttered, but given the practice's proliferation that's likely not a valid concern). We've been talking about how to keep the .pipeline directory and still solve our implicit-assumption problems. Our discussion (among folks in Release-Engineering-Team, so far) has focused on another file (groundbreaking, I know :)).

The file would look something like:

.pipeline/config.yaml
---
# Tests for serviceOne: both run in parallel during the "test" stage of the Pipeline
- name: serviceOne-phpunit
  blubberfile: blubber-serviceOne.yaml
  stage: test
  variant: phpunit
  directory: .
- name: serviceOne-mocha
  blubberfile: blubber-serviceOne.yaml
  stage: test
  variant: mocha
  directory: .

# Tests for serviceTwo: run in parallel with the serviceOne tests, also during the "test" stage of the Pipeline
- name: serviceTwo-junit
  blubberfile: blubber-serviceTwo.yaml
  stage: test
  variant: junit
  directory: src/serviceTwo

# Production service one. Image is built in the "production" stage of the Pipeline
- name: serviceOne
  blubberfile: blubber-serviceOne.yaml
  stage: production
  directory: .

# Production service two. Image is built in the "production" stage of the Pipeline (in parallel with the serviceOne image)
- name: serviceTwo
  blubberfile: blubber-serviceTwo.yaml
  stage: production
  directory: src/serviceTwo
...
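To show how that list-form config drives scheduling, here is a hypothetical sketch (not pipelinelib code) that takes the entries as already parsed from the YAML above and groups them by stage; everything within a stage would then run in parallel:

```python
from collections import defaultdict

def group_by_stage(entries):
    """Group pipeline entries from .pipeline/config.yaml by their stage.

    Each entry is a dict with at least "name" and "stage" keys, as in
    the list-form config proposed above.
    """
    stages = defaultdict(list)
    for entry in entries:
        stages[entry["stage"]].append(entry["name"])
    return dict(stages)

# The example config above, reduced to the fields that matter here.
entries = [
    {"name": "serviceOne-phpunit", "stage": "test"},
    {"name": "serviceOne-mocha", "stage": "test"},
    {"name": "serviceTwo-junit", "stage": "test"},
    {"name": "serviceOne", "stage": "production"},
    {"name": "serviceTwo", "stage": "production"},
]
```

Running `group_by_stage(entries)` collects the three test jobs under "test" and the two image builds under "production", matching the parallelism described in the config comments.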

Would that solve the problems in this task?

What new problems does this create?

  • Harder to link an image to a repo
  • Potential that one merge can use a lot of CI executors (also currently the case, but RelEng has some ability to mitigate currently)
  • Others?

I'd be interested in what folks think about the above.

Seems like it would work, but it doesn't look like this provides much beyond the different variants in the blubber config files. Could the stage and directory keys just be built into the variant config? Or does that couple the blubber format to our CI pipeline in a way we don't want?

What new problems does this create?

  • Potential that one merge can use a lot of CI executors (also currently the case, but RelEng has some ability to mitigate currently)
  • Others?

Thanks for the write-up, @thcipriani! A couple of concerns I had today after thinking more about the problem and reading the proposal:

  • Fragmentation of job scheduling. Introducing this piece into the pipeline (especially if it supports running separate services through the pipeline in parallel) might result in an overall job scheduling system that's difficult to grok and troubleshoot. We'd have Zuul on the one side scheduling jobs for the repo but then we'd have a single job from that scheduler forking off multiple CD pipeline runs. This sort of speaks to shortcomings in Zuul v2—if it supported repo-authoritative pipeline-job mappings we could probably implement this logic there—and maybe we're willing to accept the cost of this complexity for now since plans to replace Zuul v2 are still very much undefined.
  • Lack of visibility/reporting into the multiple CD pipeline invocations. Since it's downstream from Zuul, I don't see a way to report results back to Gerrit for each distinct pipeline run. Upon failure of any of the runs, we'd only see a single failure reported in Gerrit and a single scheduled job in Zuul. We'd also only ever see one job on the CI/Zuul dashboard—a user would have to go digging for the progress and/or results of individual runs.
  • The example config format could be more constrained IMO. The format you have would allow for some really odd configurations such as having different Blubber files for test and production stages of the same service, or defining a production stage for a service without a test stage. What about something simpler like the following?
.pipeline/config.yaml
pipelines:
  serviceOne:
    blubberfile: serviceOne/blubber.yaml # could be the default based on service name for the dir
    helmConfig: serviceOne/helm.yaml # ditto
    directory: src/serviceOne
    variants:
      test: [phpunit, mocha] # defaults to ["test"]
      production: foo # defaults to "production", also supports false for test-only runs
  serviceTwo:
    directory: src/serviceTwo

# room for future parameters, e.g.
# concurrency: 2
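The defaults in that format's comments can be spelled out with a small sketch. This is a hypothetical helper (names are mine, not from any real tool) that fills in the documented fallbacks for a pipeline entry:

```python
def apply_defaults(name, pipeline):
    """Fill in the defaults from the proposed keyed config format:

    - blubberfile defaults to "<name>/blubber.yaml"
    - variants.test defaults to ["test"]
    - variants.production defaults to "production"
    """
    out = dict(pipeline)
    out.setdefault("blubberfile", f"{name}/blubber.yaml")
    variants = dict(out.get("variants", {}))
    variants.setdefault("test", ["test"])
    variants.setdefault("production", "production")
    out["variants"] = variants
    return out
```

Under this scheme the minimal `serviceTwo` entry above (just a `directory`) expands to a fully specified pipeline, which is what makes the more constrained format attractive: odd configurations like a production stage built from a different Blubberfile than its tests simply can't be expressed.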

I think there are a couple of problems with the current Continuous Delivery pipeline implementation:

  1. Implicit assumption that every repo is one service

For the use case of ORES, the only difference between ServiceOne and ServiceTwo is the entrypoint. Everything else is the same.

  2. Implicit assumption that there is one test entrypoint per repo

Regarding tests, the Python libraries run flake8 (linting) and pytest (unit/integration tests). I could wrap them in a tox.ini, which I'd rather not do, but if it's decided that only one entrypoint is allowed, I can fix that.

I went a little crazy with a new config proposal in anticipation of us implementing T216272: The pipeline should provide a way to save artifacts from a stage. It's more loosely coupled, like what @thcipriani proposed earlier, with some extra fields for clearly defining the way in which stages should be executed and different methods for publishing artifacts. We'd likely want some basic policy/validation that enforces sanity (e.g. if it's publishing an image in a stage, it must also specify testDeploy, etc.). Useful defaults would also be important to cut down on configuration duplication.

Overall, something like this would decouple our service-pipeline scripts from project needs, and render the former "just" an implementation that could be swapped out down the road depending on which way we go with CI technologies (Zuul/Jenkins) in coming quarters.

.pipeline/config.yaml
pipelines:
  serviceOne:
    blubberfile: serviceOne/blubber.yaml # could be the default based on service name for the dir
    directory: src/serviceOne
    execution:                           # an "execution plan" (a directional graph of stages to run)
      - [unittests, mocha]               # set of stages to run in parallel
      - production                       # next stage to run if the previous ran successfully
    stages:                              # stage definitions
      - name: unittests
        variant: phpunit                 # defaults to the stage name but can be different
        publish:
          - type: files                  # publish select artifact files from the built/run image
            paths: ["foo/*", "bar"]      # copy files {foo/*,bar} from the image fs to ./artifacts/{foo/*,bar}
      - name: mocha                      # default (build/run "mocha" variant, no artifacts, etc.)
      - name: production
        testDeploy:                      # deploy to the "ci" k8s cluster, run `helm test`, etc.
          - chart: http://helm/chart     # use this chart (don't need the helmConfig field anymore)
        publish:
          - type: image                  # publish built image to our docker registry
            tags: [candidate]            # additional tags
        deploy: true                     # finally, trigger production deployment (however that's done)
  serviceTwo:
    directory: src/serviceTwo
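The "execution plan" above is a directed sequence of steps, where each step is either one stage or a set of stages to run in parallel. A hypothetical sketch of how a runner could walk such a plan (this is illustrative Python, not the Groovy implementation in pipelinelib):

```python
from concurrent.futures import ThreadPoolExecutor

def run_plan(plan, run_stage):
    """Walk an execution plan like [["unittests", "mocha"], "production"].

    Each step is a stage name or a list of stage names to run in
    parallel; run_stage(name) returns True on success. Stop and report
    failure as soon as any step fails, so later stages never run.
    """
    for step in plan:
        stages = step if isinstance(step, list) else [step]
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(run_stage, stages))
        if not all(results):
            return False
    return True
```

With the serviceOne plan above, "unittests" and "mocha" run concurrently, and "production" only starts once both have succeeded.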
awight removed a subscriber: awight. Mar 21 2019, 4:04 PM

Change 502917 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/pipelinelib@master] pipeline: Execution graph and contexts

https://gerrit.wikimedia.org/r/502917

Change 502918 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/pipelinelib@master] pipeline: Builder and stage implementation

https://gerrit.wikimedia.org/r/502918

thcipriani renamed this task from The continuous release pipeline should support more than one service per repo to Execution of the deployment pipeline should be configurable via .pipeline/config.yaml. Apr 30 2019, 4:16 PM

Change 502917 merged by jenkins-bot:
[integration/pipelinelib@master] pipeline: Directed graph execution model

https://gerrit.wikimedia.org/r/502917

Change 502918 merged by jenkins-bot:
[integration/pipelinelib@master] pipeline: Builder and stage implementation

https://gerrit.wikimedia.org/r/502918