Decision request - Toolforge component deployment flow details
Closed, Resolved · Public

Description

Problem

There's an agreed high level flow for toolforge component deployment outlined here:

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_Kubernetes_component_workflow_improvements

But at implementation time, some details still need deciding/refining, as they proved to be controversial.
This is a decision request to decide on those details.

Note that the focus is on the continuous delivery of toolforge components, not of toolforge itself (that might require another discussion).

  • What artifacts do we want to ship? (image, helm chart, etc.)
  • How to version those artifacts
  • When to ship those artifacts

Some assumptions (though feel free to challenge later):

  • We want to do this on gitlab
  • We want to use gitlab ci
  • We want to store the artifacts in harbor
  • The deployment of the artifacts to toolforge (tools or toolsbeta) is handled by a different repo, assuming here that there is or will be human validation at that stage
  • We want to automate the building of the artifacts, even if the trigger to build them is not automated
  • We publish the artifacts at the same time in tools and toolsbeta (note that this does not mean deployment on toolforge, just publishing the artifacts on harbor)

Constraints and risks

If delivered late:

  • Increased maintenance on our side to manually deliver the current components until this is done
  • Might become harder to adopt due to the growing number of components and lack of prioritization

If the full flow is not delivered:

  • Possible increased maintenance to build and release components
  • A mix of process flows might make releasing code error-prone (wmcs.toolforge.component.build + ./deploy.sh vs toolforge-deploy + helm chart bump)

Decision record

Decided

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T339198_Toolforge_component_deployment_flow_details

Partial Options

Note that these three decisions depend on each other: a given publish flow might call for a versioning scheme and set of artifacts different from another publish flow's. Take that into account when suggesting solutions.

Versioning scheme

VS Option 1

0.0.X-TTTTT-HHHHH, where:

  • X is the number of commits since the origin of the repository or the previous tag
  • TTTTT is the datestamp (YYYYmmddHHMMSS, ex. 20230615103800)
  • HHHHH is the short git hash of the commit used to build

Pros:

  • Identifies the commit it was built on top of
  • Sequential code-wise (if A has more commits than B, A>B)
  • Sequential time-wise (for the same commit, a build today has priority over a build from yesterday)

Cons:

  • No semantic versioning
  • Can't be in the source code itself as it depends on the time + git hash
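
For illustration, a minimal sketch of how this version string could be generated in CI, assuming a plain git checkout (the function name and wiring are hypothetical, and counting from the origin of the repository rather than a previous tag):

```python
# Hedged sketch: generate a 0.0.X-TTTTT-HHHHH version from a git checkout.
# Assumes git is on PATH; a previous tag could be used instead of the
# repository origin via `git describe`.
import subprocess
from datetime import datetime, timezone

def git(*args: str) -> str:
    return subprocess.check_output(["git", *args], text=True).strip()

def vs_option_1_version() -> str:
    commit_count = git("rev-list", "--count", "HEAD")                 # X
    datestamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")   # TTTTT
    short_hash = git("rev-parse", "--short", "HEAD")                  # HHHHH
    return f"0.0.{commit_count}-{datestamp}-{short_hash}"

print(vs_option_1_version())  # e.g. 0.0.142-20230615103800-a1b2c3d
```
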
VS Option 2

A.B.C-TTTTT-HHHHH, where:

  • A.B.C is a semantic version, generated from the git history:
    • Extract the list of commits in historical order:
      • For each commit that has Sem-Ver: major, bump the major version
      • For each commit that has Sem-Ver: feature, bump the feature version
      • For any other commit, bump the bug (patch) version
  • TTTTT is the datestamp (YYYYmmddHHMMSS, ex. 20230615103800)
  • HHHHH is the short git hash of the commit used to build

Pros:

  • Identifies the commit it was built on top of
  • Sequential code-wise (if A has more commits than B, A>B)
  • Sequential time-wise (for the same commit, a build today has priority over a build from yesterday)
  • Semantic versioning from git source (developer flags non-backwards compatible commits at review time)

Cons:

  • Can't be in the source code itself as it depends on the time + git hash
  • The version generation is a bit more complicated
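
A minimal sketch of that generation logic (the trailer names come from the option above; resetting the lower parts to zero on a major/feature bump is an assumption, not something the option spells out):

```python
# Hedged sketch: derive A.B.C from git history by scanning commit messages
# for "Sem-Ver: major" / "Sem-Ver: feature" trailers. Assumes git is on PATH.
import subprocess

def semver_from_history() -> str:
    # %B = raw commit body, NUL-separated so multi-line messages stay intact
    log = subprocess.check_output(
        ["git", "log", "--reverse", "--format=%B%x00"], text=True
    )
    major = minor = patch = 0
    for message in (m for m in log.split("\x00") if m.strip()):
        if "Sem-Ver: major" in message:
            major, minor, patch = major + 1, 0, 0   # assumed reset
        elif "Sem-Ver: feature" in message:
            minor, patch = minor + 1, 0             # assumed reset
        else:
            patch += 1                              # "bug" bump
    return f"{major}.{minor}.{patch}"

print(semver_from_history())  # e.g. 1.4.27
```
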
VS Option 3

0.0.0, where:

  • 0.0.0 is a static or hardcoded string in the code

Pros:

  • Manual control of the version
  • Version comes from the code itself, no generation needed

Cons:

  • Needs manual updating
    • You might forget to bump (version does not identify code anymore)
    • You might bump to the wrong version (version does not identify code anymore, no semantic versioning, not sequential)
    • You have to review the history to decide if commits are major/feature or minor if you want to follow semantic versioning
VS Option N

ADD MORE HERE!

Artifacts

Arts option 1

Helm and image are delivered separately

Pros:

  • Strong control of each artifact
  • No need to rebuild image if only chart changes (in most cases)
  • No need to update chart if only code changes (in most cases)

Cons:

  • More complex release process/flow, harder to automate
  • Need to keep track of two versions, and update the correct one in the right place
  • The chart is not guaranteed to work without overriding the default image version it ships with
Arts option 2

Helm and image are delivered at the same time

Pros:

  • Only one artifact/version to manage
  • The chart will always work by itself (as the image it comes with is built for it)
  • Easier to automate/simpler workflow

Cons:

  • Lose control of when the publish happens, as image and chart are now tied together
  • Sometimes the image rebuild will be done but not needed (as no code changes happened)
  • Sometimes the helm chart rebuild will be done but not needed (as no chart changes happened)
Arts option N

ADD MORE HERE!

When to ship

Here "when to ship" means "when to build a container image and the helm chart and push them to harbor".

Shipping Option 1

Ship on manual tag

Pros:

  • Control when the shipping happens
  • Easy to see from the git tags history when a shipping happened

Cons:

  • Might be unable to pin-point which commit was shipped with which tag (as tags are not immutable, they can be moved around, and are local to each git clone)
  • Needs a manual step
  • No peer-review on what is going to be shipped (can't review tag pushes)
Shipping Option 2

Ship on manual tag after version bump commit

Pros:

  • Control when the shipping happens
  • Easy to see from the git tags history when a shipping happened
  • Partial review of what is going to be shipped (at least the bump commit)

Cons:

  • Needs two manual steps (MR to bump + tag)
  • Might be unable to pin-point which commit was shipped with which tag (as tags are not immutable, they can be moved around, and are local to each git clone)
  • Only partial peer review of what is going to be shipped (can't review tag pushes)
Shipping Option 3

Ship on every merge/push to main

Pros:

  • No manual steps
  • Review of what's going to be shipped is the same as the code review before merge

Cons:

  • Shipping happens all the time, so it's hard to pin-point a single shipping event (no tag or special commit to flag it)
  • Shipping on every merge of a merge request, even if that version might not be deployed
Shipping Option 4

Ship on version bump manual commit.

The Git tag is a nice addition, and it helps identify things, but it is not a hard requirement (the workflow doesn't depend on it, and doesn't break for lack of a git tag).

This is similar to what happens with debian packages. The tag mostly helps to quickly trace back which version was deployed (this can be figured out without a git tag anyway, with a bit more work).

Pros:

  • Control when the shipping happens
  • Easy to see from the git history when a shipping happened

Cons:

  • One manual step (MR to bump)
Shipping Option N

ADD MORE HERE!
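
To make the trigger differences above concrete, here is a hedged sketch of the gate each shipping option implies, expressed against GitLab CI's predefined variables (CI_COMMIT_TAG, CI_COMMIT_BRANCH and CI_DEFAULT_BRANCH are real; the VERSION file used to detect a bump commit is purely an assumption). In a real pipeline this logic would live in .gitlab-ci.yml rules:

```python
# Hedged sketch of the shipping gates; not actual pipeline config.
import os
import subprocess

def last_commit_bumps_version(version_file: str = "VERSION") -> bool:
    # Assumption: a "version bump commit" is one touching a VERSION file.
    changed = subprocess.check_output(
        ["git", "show", "--name-only", "--format=", "HEAD"], text=True
    ).split()
    return version_file in changed

def should_ship(option: int) -> bool:
    tag = os.environ.get("CI_COMMIT_TAG")
    branch = os.environ.get("CI_COMMIT_BRANCH")
    on_main = branch == os.environ.get("CI_DEFAULT_BRANCH")
    if option in (1, 2):  # options 1 and 2: ship when a tag is pushed
        return tag is not None
    if option == 3:       # option 3: ship on every merge/push to main
        return on_main
    if option == 4:       # option 4: ship when a bump commit lands on main
        return on_main and last_commit_bumps_version()
    return False
```
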

Options

Option 1

Version option: 1
Artifacts option: 1
Shipping option: 1

Using 0.0.X-TTTTT-HHHHH, triggered on manual tag (one per artifact).

Pros:

  • Control on when publish happens
  • Version is sequential code-wise
  • Version is sequential time-wise
  • Version identifies the commit it was first built on

Cons:

  • Several manual steps, one per artifact (image/chart)
  • Lack of review on what's going to ship (can't review tags)
  • Chart might get out of sync with image
  • Tags are local to the repo and might move around, so might not identify the commit the version was generated for
  • Not following semantic versioning

Option 2

Version option: 2
Artifacts option: 2
Shipping option: 2

Using A.B.C-TTTTT-HHHHH, triggered by manual commit + tag (shared for both image and chart)

Pros:

  • Control on when publish happens
  • Version is sequential code-wise
  • Version is sequential time-wise
  • Version identifies the commit it was first built on
  • Version follows semantic versioning
  • Partial review of what's going to ship (the bump commit)

Cons:

  • Several manual steps (version bump commit + tag)
  • Partial review of what's going to ship (can't review tags)
  • Tags are local to the repo and might move around, so might not identify the commit the version was generated for

Option 3

Version option: 2
Artifacts option: 2
Shipping option: 3

Using A.B.C-TTTTT-HHHHH, triggered by MR merge or direct push to main

Pros:

  • Version is sequential code-wise
  • Version is sequential time-wise
  • Version identifies the commit it was first built on
  • Version follows semantic versioning (non-backwards compatible commits labeled at development time)
  • Review on what's going to ship (each mr)
  • No manual steps

Cons:

  • No control of when a publish happens (well, it happens once per merge)

Option 4

Version option: 1 (always incremental, no special semantics)
Artifacts option: 2 (chart and container image at the same time)
Shipping option: 4 (ship on version bump, which is manual, and is encouraged to be accompanied by a git tag)

This is what Arturo thinks most closely resembles the debian-like kung-fu way of doing packages.

Pros:

  • TBD

Cons:

  • TBD

Option 5

Version option: 1 (always incremental, no special semantics)
Artifacts option: 2 (chart and container image at the same time)
Shipping option: 3 (ship on every merge/push to main)

Pros:

  • TBD

Cons:

  • TBD

Option N

ADD MORE HERE!

Event Timeline

With the current option roster, I think I'll go with option 4 which I named the debian-like kung-fu.

I believe that one has all the tradeoffs in the right balance, and leaves the most critical part (version bump) to a manual git commit.

Both the helm chart and the container image have the same version number, which is otherwise just an incremental number (no special semantics). I have no strong opinions on whether the version number should contain a timestamp and git hash, as long as it is incremental with no special semantics.

The artifacts are released at the same time. I believe that in our context, doing this in a different way would just cause more confusion and more moving parts for no added benefit.
This release step can be accompanied by a git tag for extra clarity, but it is otherwise not required by the workflow, and nothing should break if it is absent.

Finally, the shipping part happens at this moment, when the version is bumped.

The developer has the following stages when releasing / deploying to production:

  • test locally, possibly using lima-kilo (unrelated to this ticket)
  • create a pre-release using the workflow described above, targeted for toolsbeta (for example, 1.2.3~toolsbeta1)
  • make adjustments and iterate, if required (continuing with the example, 1.2.3~toolsbeta2)
  • then, once happy, merge another patch to bump the version (and trigger shipping etc.) (from 1.2.3~toolsbeta1 to 1.2.3, which sorts higher)

Worth noting that this is the workflow currently at play for most of the debian packages we have, like jobs-framework-cli and others. And it works just fine in my opinion.

> With the current option roster, I think I'll go with option 4 which I named the debian-like kung-fu.
>
> I believe that one has all the tradeoffs in the right balance, and leaves the most critical part (version bump) to a manual git commit.
>
> Both the helm chart and the container image have the same version number, which is otherwise just an incremental number (no special semantics). I have no strong opinions on whether the version number should contain a timestamp and git hash, as long as it is incremental with no special semantics.

Can you elaborate on what incremental means to you?

Does it mean that it increases on every build? (i.e. it depends on when it's being built)
That it increases on every commit to the repo?
Does it have to be strictly sequential? (no jumps in the numbers)
All of the above?

> The artifacts are released at the same time. I believe that in our context, doing this in a different way would just cause more confusion and more moving parts for no added benefit.
> This release step can be accompanied by a git tag for extra clarity, but it is otherwise not required by the workflow, and nothing should break if it is absent.
>
> Finally, the shipping part happens at this moment, when the version is bumped.
>
> The developer has the following stages when releasing / deploying to production:
>
>   • test locally, possibly using lima-kilo (unrelated to this ticket)
>   • create a pre-release using the workflow described above, targeted for toolsbeta (for example, 1.2.3~toolsbeta1)

Helmfile requires semantic version format, so I guess the equivalent would be 1.2.3-toolsbeta1.
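
For reference, a tiny sketch of the ordering being relied on here: in semver, a hyphenated pre-release sorts below the corresponding final release, mirroring debian's ~ behaviour (a simplified comparison that assumes numeric dotted cores and compares pre-release labels as plain strings):

```python
# Hedged sketch: semver-style ordering where 1.2.3-toolsbeta1 < 1.2.3.
def semver_key(version: str):
    core, _, pre = version.partition("-")
    parts = tuple(int(p) for p in core.split("."))
    # A final release (1,) ranks above any pre-release (0, label).
    return parts + ((1,) if not pre else (0, pre))

versions = ["1.2.3", "1.2.3-toolsbeta2", "1.2.3-toolsbeta1"]
print(sorted(versions, key=semver_key))
# ['1.2.3-toolsbeta1', '1.2.3-toolsbeta2', '1.2.3']
```
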

Note that creating this pre-release does not mean that you can deploy it anywhere: the deployment is handled in the toolforge-deploy repo, not here. So the flow, including deployment to toolforge, would have to be:

  • create MR bumping the version on the component repo to 1.2.3-toolsbeta1
    • how is it generated?
    • is it done manually?
  • get it reviewed and merged, this pushes that version chart and image
    • to harbor toolsbeta only?
    • to both tools and toolsbeta?
  • create MR bumping the version on toolforge-deploy for toolsbeta
  • get it reviewed and merged
  • manually deploy that change to toolsbeta
  • test there, and iterate from the MR creation on the component repo with version 1.2.3-toolsbetaN, where N is the number of iterations
  • once happy, create MR on the component repository to bump the version to 1.2.3
  • get it reviewed and merged, this pushes the image and chart to harbor
  • create MR bumping the version on toolforge-deploy for toolsbeta
  • get it reviewed and merged
  • manually deploy that change to toolsbeta
  • create MR bumping the version on toolforge-deploy for tools
  • get it reviewed and merged
  • manually deploy that change to tools
  • verify the deployment

Is that correct?
I count at least 5 MRs to do a release, on two different repositories.
These are MRs with 0 code in them, just version bumping (there might be some fixes for bugs found, but those come on top of these).

>   • make adjustments and iterate, if required (continuing with the example, 1.2.3~toolsbeta2)
>   • then, once happy, merge another patch to bump the version (and trigger shipping etc.) (from 1.2.3~toolsbeta1 to 1.2.3, which sorts higher)
>
> Worth noting that this is the workflow currently at play for most of the debian packages we have, like jobs-framework-cli and others. And it works just fine in my opinion.

Personally, I don't think that workflow works that well; I always find it cumbersome, error-prone and tedious, but that is a different discussion in a different context.

> Can you elaborate on what incremental means to you?
>
> Does it mean that it increases on every build? (i.e. it depends on when it's being built)
> That it increases on every commit to the repo?
> Does it have to be strictly sequential? (no jumps in the numbers)
> All of the above?

Just a number that sorts higher than the previous one. It increases with each version bump, done manually, when the given component is ready for a new release.

Example:

0.0.0
0.0.1
0.0.2
0.0.3
0.1.0
0.1.2
0.1.3

etc.

The numbers don't have any meaning. We could just keep bumping only the last number like 0.0.99999.

I believe semver is losing relevance by the day. On a fun side note:

[attached image: image.png]

>   • create MR bumping the version on the component repo to 1.2.3-toolsbeta1
>     • how is it generated?
>     • is it done manually?

Yes, manually.

>   • get it reviewed and merged, this pushes that version chart and image
>     • to harbor toolsbeta only?
>     • to both tools and toolsbeta?

I don't think we need more than one. We can have everything in the same harbor repo.

As you know, the version to be deployed doesn't depend on where these artifacts are stored. It depends on what helmfile says for each environment.

> Is that correct?
> I count at least 5 MRs to do a release, on two different repositories.
> These are MRs with 0 code in them, just version bumping (there might be some fixes for bugs found, but those come on top of these).

Something like this, yes. What you wrote is perhaps the most extreme case, but in a simplified fashion it would be:

  • create MR to bump the version to be built/released on the component repo (it can be the same for toolsbeta/tools)
  • create MR to bump the version to be deployed on the toolforge-deploy repo

If adding the toolsbeta stage then:

  • create MR to bump the version to be built/released on the component repo (it can be the same for toolsbeta/tools)
  • create MR to bump the version to be deployed on the toolforge-deploy repo (toolsbeta)
  • create MR to bump the version to be deployed on the toolforge-deploy repo (tools)

This, by the way, is very similar to the currently well-established workflow with puppet & debian packages:

debian package repository -- same role as -- toolforge component repository
operations puppet tree -- same role as -- toolforge deploy repository

You populate a debian repository somewhere (coding, release, etc). Then you make changes in the puppet tree to deploy a given version of the package.

> I believe semver is losing relevance by the day. On a fun side note:
>
> [attached image: image.png]

I don't think it's losing relevance, but lots of people misuse it, yes (us included, in a lot of places). It is really, really useful when followed though: it has both saved a lot of issues and enabled automation in many of the places where I have seen continuous delivery working.

The lack of such differentiation between breaking changes and non-breaking ones has always ended up as a blocker for automating the delivery process.

>>   • get it reviewed and merged, this pushes that version chart and image
>>     • to harbor toolsbeta only?
>>     • to both tools and toolsbeta?
>
> I don't think we need more than one. We can have everything in the same harbor repo.

This is a big change: I thought we had decided to have duplicate deployments (tools/toolsbeta) and to keep them completely independent (that's the reason to have two harbor instances, which was and still is really useful for testing).
If that is not the case, we should discuss it and clarify whether we want two independent deployments (as much as possible) or not, probably in another decision task?

>> Is that correct?
>> I count at least 5 MRs to do a release, on two different repositories.
>> These are MRs with 0 code in them, just version bumping (there might be some fixes for bugs found, but those come on top of these).
>
> Something like this, yes. What you wrote is perhaps the most extreme case, but in a simplified fashion it would be:
>
>   • create MR to bump the version to be built/released on the component repo (it can be the same for toolsbeta/tools)
>   • create MR to bump the version to be deployed on the toolforge-deploy repo
>
> If adding the toolsbeta stage then:
>
>   • create MR to bump the version to be built/released on the component repo (it can be the same for toolsbeta/tools)

What I understood from your explanation before is that this would be at least two rounds (one with -toolsbeta1 and one without).

>   • create MR to bump the version to be deployed on the toolforge-deploy repo (toolsbeta)
>   • create MR to bump the version to be deployed on the toolforge-deploy repo (tools)
>
> This, by the way, is very similar to the currently well-established workflow with puppet & debian packages:
>
> debian package repository -- same role as -- toolforge component repository
> operations puppet tree -- same role as -- toolforge deploy repository

You are missing here the code repository; that would be:
upstream source code repository -- same role as -- toolforge component repository

> You populate a debian repository somewhere (coding, release, etc). Then you make changes in the puppet tree to deploy a given version of the package.

Again, debian packages are a very different context, where the person who writes the code is not the same one who does the packaging, nor the same one who installs the package (that's why each of those has their own manual step).
In this case we are the same people writing the code, building the artifacts, and deploying them, so IMO a single manual step per environment is more than enough and what we should aim for.

>> I don't think we need more than one. We can have everything in the same harbor repo.
>
> This is a big change: I thought we had decided to have duplicate deployments (tools/toolsbeta) and to keep them completely independent (that's the reason to have two harbor instances, which was and still is really useful for testing).
> If that is not the case, we should discuss it and clarify whether we want two independent deployments (as much as possible) or not, probably in another decision task?

I think it is still definitely useful to have a harbor deployment in toolsbeta to, well, test harbor itself as a system.

However, for actual artifact storage and distribution, one might be enough! Happy to discuss this elsewhere if required.

Having either one or two does not prevent you from storing artifacts anywhere you want... in one, in both, or doing any other kind of testing or cross-check. And it definitely does not invalidate the workflow.

> What I understood from your explanation before is that this would be at least two rounds (one with -toolsbeta1 and one without).

This is optional, not mandatory. If you want to have some kind of pre-release, this workflow allows for it. I guess that's exactly the point of manual releases.

> You are missing here the code repository; that would be:
> upstream source code repository -- same role as -- toolforge component repository

No!

There are debian 'non-native' and 'native' packages (citation). What we do here @WMF is mostly 'native' packages, where there is no separate upstream part.

Examples: jobs-framework-cli, tools-webservice, toolforge-cli, etc.

>> You populate a debian repository somewhere (coding, release, etc). Then you make changes in the puppet tree to deploy a given version of the package.
>
> Again, debian packages are a very different context, where the person who writes the code is not the same one who does the packaging, nor the same one who installs the package (that's why each of those has their own manual step).

See above about native packages.

Thanks a lot @dcaro for the write-up with all the options. I think it clearly highlights there are multiple axes and a lot of hidden complexity.

My preference is for option 3 (and especially for shipping option 3, on every commit to the main branch).

I would like to understand if we could push it even further, and ship image+chart on each merge request, before it gets merged. My ideal workflow would be that when I open a MR with a change, the CI automatically builds a new image and chart with the changes from my branch, it pushes both to Harbor (either tools or toolsbeta) and I can (optionally) deploy this built chart for testing, either locally on my laptop, or in Toolsbeta, or other test environments (that we don't have yet, but we might build in the future using Magnum, etc.)

I don't have a strong opinion on which versioning scheme we should use; A.B.C-TTTTT-HHHHH seems reasonable because it would avoid conflicts in the scenario I described above, where potentially two merge requests are open at the same time and A.B.C might be the same for both of them. We could even keep A.B.C=1.0.0 forever (or update it very rarely).

> Thanks a lot @dcaro for the write-up with all the options. I think it clearly highlights there are multiple axes and a lot of hidden complexity.
>
> My preference is for option 3 (and especially for shipping option 3, on every commit to the main branch).
>
> I would like to understand if we could push it even further, and ship image+chart on each merge request, before it gets merged. My ideal workflow would be that when I open a MR with a change, the CI automatically builds a new image and chart with the changes from my branch, it pushes both to Harbor (either tools or toolsbeta) and I can (optionally) deploy this built chart for testing, either locally on my laptop, or in Toolsbeta, or other test environments (that we don't have yet, but we might build in the future using Magnum, etc.)

I agree, ideally the same artifact that you tested should be the one that you promote (no rebuilding). We can try to implement something like that, pulling and pushing the image with a new tag / to a different project/namespace/harbor.
It will require taking into account that a branch might not be based on the latest main, so we will probably have to rebase before building (and fail if there are merge conflicts, etc.).
I think it's doable.
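
A hedged sketch of that promotion step (the registry host and project path below are invented for illustration; skopeo copy is a real command that copies an image between registry locations without rebuilding):

```python
# Hypothetical promotion: re-tag the already-tested image instead of
# rebuilding it. Registry/paths are made up for illustration.
import subprocess

def promote(component: str, tested_tag: str, release_tag: str) -> None:
    base = f"docker://harbor.example.org/toolforge/{component}"
    subprocess.run(
        ["skopeo", "copy", f"{base}:{tested_tag}", f"{base}:{release_tag}"],
        check=True,
    )

# e.g. promote("jobs-api", "1.2.3-toolsbeta2", "1.2.3")
```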

> I don't have a strong opinion on which versioning scheme we should use; A.B.C-TTTTT-HHHHH seems reasonable because it would avoid conflicts in the scenario I described above, where potentially two merge requests are open at the same time and A.B.C might be the same for both of them. We could even keep A.B.C=1.0.0 forever (or update it very rarely).

> Thanks a lot @dcaro for the write-up with all the options. I think it clearly highlights there are multiple axes and a lot of hidden complexity.
>
> My preference is for option 3 (and especially for shipping option 3, on every commit to the main branch).
>
> I would like to understand if we could push it even further, and ship image+chart on each merge request, before it gets merged. My ideal workflow would be that when I open a MR with a change, the CI automatically builds a new image and chart with the changes from my branch, it pushes both to Harbor (either tools or toolsbeta) and I can (optionally) deploy this built chart for testing, either locally on my laptop, or in Toolsbeta, or other test environments (that we don't have yet, but we might build in the future using Magnum, etc.)

What about mixing the two (shipping option 3 and shipping option 4)?

Let's call it option 5: auto-generating chart & image on every merge request & merge to main (for easy testing), plus having a "manually" released one every now and then, bumping the version.

The per-MR artifacts would use a version based on the latest "manually" bumped version, examples:

  • 1.0.0 <-- manually created version
  • 1.0.0+mr1-ttttt-hhhh <-- mr1
  • 1.0.0+mr2-ttttt-hhhh <-- mr2
  • 1.0.1 <-- manually created version
  • 1.0.1+mr3-ttttt-hhhh <-- mr3

The logic to generate the artifacts should not care what the base version is; it will just append the extra string "on the fly" to make the version unique enough.
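
A minimal sketch of that on-the-fly suffixing (CI_MERGE_REQUEST_IID and CI_COMMIT_SHORT_SHA are real GitLab CI predefined variables; reading the base version from a VERSION file is an assumption):

```python
# Hedged sketch: per-MR version = <last manual version>+mrN-ttttt-hhhh.
import os
from datetime import datetime, timezone
from pathlib import Path

def per_mr_version() -> str:
    base = Path("VERSION").read_text().strip()   # e.g. "1.0.0" (assumed file)
    mr = os.environ["CI_MERGE_REQUEST_IID"]      # e.g. "3"
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    sha = os.environ["CI_COMMIT_SHORT_SHA"]
    return f"{base}+mr{mr}-{stamp}-{sha}"

# e.g. "1.0.0+mr3-20230615103800-a1b2c3d"
```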

>> Thanks a lot @dcaro for the write-up with all the options. I think it clearly highlights there are multiple axes and a lot of hidden complexity.
>>
>> My preference is for option 3 (and especially for shipping option 3, on every commit to the main branch).
>>
>> I would like to understand if we could push it even further, and ship image+chart on each merge request, before it gets merged. My ideal workflow would be that when I open a MR with a change, the CI automatically builds a new image and chart with the changes from my branch, it pushes both to Harbor (either tools or toolsbeta) and I can (optionally) deploy this built chart for testing, either locally on my laptop, or in Toolsbeta, or other test environments (that we don't have yet, but we might build in the future using Magnum, etc.)
>
> What about mixing the two (shipping option 3 and shipping option 4)?
>
> Let's call it option 5: auto-generating chart & image on every merge request & merge to main (for easy testing), plus having a "manually" released one every now and then, bumping the version.
>
> The per-MR artifacts would use a version based on the latest "manually" bumped version, examples:
>
>   • 1.0.0 <-- manually created version
>   • 1.0.0+mr1-ttttt-hhhh <-- mr1
>   • 1.0.0+mr2-ttttt-hhhh <-- mr2
>   • 1.0.1 <-- manually created version
>   • 1.0.1+mr3-ttttt-hhhh <-- mr3
>
> The logic to generate the artifacts should not care what the base version is; it will just append the extra string "on the fly" to make the version unique enough.

I think the benefit comes only when the generated artifact is ready to use (e.g. to be bumped on the toolforge-deploy repo).
In that sense, anything that does not publish a production-ready artifact on merge lacks, for me, the core benefit of continuous delivery (and is thus a solution very similar to just manually building and pushing).

Having to do 2 manual actions (or more) to get the code deployed in an environment is one manual action too many xd

I'd go with option 3, with a slight preference over other possible options that still use shipping option 3 (continuous delivery), and a strong preference over solutions not using shipping option 3.

It's been a week since this; I set the deadline to either set up a meeting or reach consensus by the 29th (one more week).
Please share your opinions! Read and consider all of the others too!

I just added option 5:

  • Version option: 1 (always incremental, no special semantics)
  • Artifacts option: 2 (chart and container image at the same time)
  • Shipping option: 3 (ship on every merge/push to main)

So my vote goes to either option 4 or option 5.

I'm equally fine with option 3 or option 5.

In both cases, I'm not sure I see any benefit in incrementing the version (X in 0.0.X-TTTTT-HHHHH or C in A.B.C-TTTTT-HHHHH), instead of keeping the first part of the version string constant (0.0.0-TTTTT-HHHHH or something similar). Ideally I would use simply TTTTT-HHHHH but I think Helm does not allow that.

dcaro claimed this task.

It seems that opinion has converged on option 5; closing this as decided for option 5. Feel free to reopen in the next few days if I misunderstood.