Page MenuHomePhabricator

Allow dash-suffixes for chart versions
Closed, DeclinedPublic

Description

For Helm charts in the deployment-charts repository, we currently use simple semantic version numbers, like 1.23.7. This ticket proposes to establish a convention that recommends to add a "dash-suffix" to each version number that represents the purpose of the version update (ideally, the same as the topic tag of the gerrit change that contains the version bump). The Helm documentation states that this kind of suffix is supported: https://helm.sh/docs/v2/developing_charts/#charts-and-versioning

Note: The idea is to provide the suffix together with a new version. A situation where versions differ only in the suffix should be avoided, since then the order of these versions would not be clearly defined.

Advantages:

  • The version shown by helm list and friends would be more informative, making it obivous what feature was last deployed.
  • It would become impossible for a version bump to vanish in a rebase, and confusion could be avoided during the manual rebase as well.

Caveat:

  • When resolving a rebase conflict, care must still be taken to preserve the correct order of patch versions.

Event Timeline

The conversation in #wikimedia-serviceops when this was raised:

<rzl> mmm I agree with you about the problem but I'm not sure about the solution
<rzl> I think you can still get into confusing versioning semantics with a race condition if 0.0.1 is live, I write 0.0.2-rzl, and you write 0.0.2-duesen
<rzl> if we don't do that, i.e. if I write 0.0.2-rzl and you write 0.0.3-duesen, then I think we'd get the same results as with just 0.0.2 and 0.0.3, right?
<rzl> broadly though I'd like us to find some way to move away from having a chart version at all, although that requires a bit more engineering -- the fact is we already have a timestamped linear progression of chart versions because we keep them in version control, so we shouldn't have to also manage this by hand! like I say though it's a bigger project
<rzl> semver is nice in principle, it's good to distinguish between major, minor, and patch versions -- but in practice I think it costs us more bookkeeping than it gets us value
<cdanis> ^
<cdanis> for dependencies between our own charts, we have a monorepo and should treat it as such
<hnowlan> maybe it's just the charts I've worked on but I don't think I've seen a single chart that consistently does proper semver. which is not exactly unusual for semver-aligned things in general, but it's very prominent in Chart.yaml
<hnowlan> heh we have 76 charts with a major and minor of 0
<rzl> yeah
<rzl> and I also don't think that's losing us anything, really
<cdanis> ^
<cdanis> I don't know of a case where the version bump hasn't just been added friction

@daniel, you didn't respond to that discussion, so please chime in if we were missing something. Otherwise, I don't think we should do this. Instead, I'm open to discussing other work that addresses the two problems I think this solution is intended for:

  • It's not obvious what change was last deployed. (I don't think reaching for helm list first is best, in our setup: it requires escalating to the -deploy credentials, which this task shouldn't -- although the chart version is also visible in tools like kubectl describe deployment. But either way there would still be no information about values changes against the same chart version, which are important to highlight for this use case. Presently I think the best way to ask "does production match the charts repo?" is helmfile diff.)
  • It's annoying and error-prone to rebase a commit that includes a chart version change. (In the conflicted case, I don't think this proposal solves the problem -- that's the "care must still be taken" point above. In the unconflicted case, I think asking each person to do the added manual work of adding a suffix is a step in the wrong direction. Tools like sextant update that can automatically bump the chart version wouldn't be able to do this, too.)

I do think those problems are worth working on.

<rzl> mmm I agree with you about the problem but I'm not sure about the solution
<rzl> I think you can still get into confusing versioning semantics with a race condition if 0.0.1 is live, I write 0.0.2-rzl, and you write 0.0.2-duesen

I would get a merge conflict in git. When resolving it, I would change mine to 0.0.3-duesen.

<rzl> if we don't do that, i.e. if I write 0.0.2-rzl and you write 0.0.3-duesen, then I think we'd get the same results as with just 0.0.2 and 0.0.3, right?

Except in the race case - 0.0.1 is live and we both write 0.0.2. Then your goes in, and mine later gets a +2 and gerrit rebases it on top of yours. Since the version lines are identical, that doesn't produce a conflict. I deploy mine, but helmfile thinks the chart didn't change because there is no version bump, and a partial/incomplete deployment. This happened to me last week. Super confusing.

<rzl> broadly though I'd like us to find some way to move away from having a chart version at all, although that requires a bit more engineering -- the fact is we already have a timestamped linear progression of chart versions because we keep them in version control, so we shouldn't have to also manage this by hand! like I say though it's a bigger project
<rzl> semver is nice in principle, it's good to distinguish between major, minor, and patch versions -- but in practice I think it costs us more bookkeeping than it gets us value
<cdanis> ^
<cdanis> for dependencies between our own charts, we have a monorepo and should treat it as such
<hnowlan> maybe it's just the charts I've worked on but I don't think I've seen a single chart that consistently does proper semver. which is not exactly unusual for semver-aligned things in general, but it's very prominent in Chart.yaml

I think proper semver is useful only for charts that get imported into other charts. But for dependencies within a monorepo, it's not particularly useful. For our use case, a linear version number would be sufficient and more clear. A timestamp would do.

To stay compliant with the requirement to use semver syntax, we could use the pattern 1.0.0-<timestamp>-<hash>, perhaps.

@daniel, you didn't respond to that discussion, so please chime in if we were missing something. Otherwise, I don't think we should do this.

Hm - would it actually cause problems if I added the suffix for the patches I make? Take for instance the chain of patches starting at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1212107, plus a few patches "off to the side" that touch the same chart. Adding the suffix has helped me keep the versioning straight and avoid accidental duplication.

Instead, I'm open to discussing other work that addresses the two problems I think this solution is intended for:

  • It's not obvious what change was last deployed. (I don't think reaching for helm list first is best, in our setup: it requires escalating to the -deploy credentials, which this task shouldn't -- although the chart version is also visible in tools like kubectl describe deployment. But either way there would still be no information about values changes against the same chart version, which are important to highlight for this use case. Presently I think the best way to ask "does production match the charts repo?" is helmfile diff.)

I'm new to doing helmfile deployments, so I'll leave this to the experts. I think I saw Claime use helm history when investigating the mess I caused by accidentally dublicating a version number.

  • It's annoying and error-prone to rebase a commit that includes a chart version change. (In the conflicted case, I don't think this proposal solves the problem -- that's the "care must still be taken" point above. In the unconflicted case, I think asking each person to do the added manual work of adding a suffix is a step in the wrong direction. Tools like sextant update that can automatically bump the chart version wouldn't be able to do this, too.)

It's the "auto-resovle conflict" case that loses the version bump that troubles me most.

I do think those problems are worth working on.

I agree that this should be solved by better automation. But I am looking for a mechanism that helps me not break the deployment again in the next few weeks...

MLechvien-WMF subscribed.

Triaging this from our inbox, @daniel @RLazarus were there next steps decided on this?

From my read, there's an existing workaround so I did put the priority to low, is it fine for you?

MLechvien-WMF moved this task from Inbox to Needs Info / Blocked on the ServiceOps new board.

Thanks @MLechvien-WMF, sorry @daniel for not responding sooner.

By "I don't think we should do this," I mean I think we shouldn't adopt it as a standard -- we shouldn't publish it as a recommendation, write tools/CI based on it, or anything like that. This is a problem for everybody and we should solve it for everybody, but I don't think this is the right way to do that, for the reasons above.

But I also don't think we should prohibit it: to answer your question, no, it wouldn't cause problems if you did this for your own patches. You should feel free to do so. (I can't promise that will remain true forever, but we don't currently have any plans for charts infrastructure that would break on chart versions with a semver prerelease suffix.)


In the meantime, I'll bring this back to Service Ops as part of a discussion that's been brewing on what parts of the Helm and helmfile tool suites are meeting teams' needs, and what we want to change. Helm chart version numbers are one major area to discuss (do we need them at all? do they need semantic components beyond a monotonic integer? do we need users to increment them manually, or can software do it?) but I'd rather address that whole problem space than institutionalize this workaround for the merge-conflict problem specifically. Personally, I don't want to tell you to add a suffix, I want to tell you that you can stop thinking about chart versions at all.

I'll leave this open for other opinions, including but not limited to Service Ops colleagues, but I propose to decline the task and, once the team has discussed, come back with a more general one for improved chart versioning.

Thank you for following up on this @RLazarus!

For me, this is no longer urgent, since I found an alternative work-around: comments.

The primary issue for me was "vanishing" version bumps - e.g. if there were two concurrent patches A and B and both bump the version 1.1 to 1.2, and patch A gets merged, and patch B gets rebased, patch B now bumps from 1.2 to 1.2. The bump silently vanished.

I was proposing to use dash suffix like 1.2-A and 1.2-B, because this was, rebasing B on top of A would generate a conflict, prompting a manual resolution which would hopefully result in the correct bump to 1.3-B.

Since the discussion about suffixes on this ticket was inclonclusive, I'm now using yaml comments, as in 1.3 #something. That also forces a merge conflict and avoids "vanishing bumps", but it's invisible to helm and helmfile.

Got it, sounds like a fine workaround. :) Closing as above in that case, but I still hope we can make this better all around and I'll let you know how things progress.