Page MenuHomePhabricator

Decision Request: To strictly enforce semantic versioning rules for toolforge services' APIs or not
Closed, ResolvedPublic

Description

Semantic Versioning Principle Summary

Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes
  • MINOR version when you add functionality in a backward compatible manner
  • PATCH version when you make backward compatible bug fixes

Problem

Now that the consolidated Toolforge services API is being exposed to the public, users will expect the API version information to be consistent and have proper semantic meaning.
Though we never really had a decision request for it (can't remember if we did), we made the tacit decision to use semantic versioning or what appears to be it (at least to our users) for the individual services' APIs and consolidate the versions of the individual services into a single version following rules that still obey semantic versioning principles, before showing this information to our users.
The problem is that right now we only have very basic validation rules for the individual APIs, rules that are not true to semantic versioning principles, so what appears to be semantic versioning to the API users may not really be that.

For example right now if you make a breaking change to jobs-api openapi schema and only bump the patch version i.e 1.0.0 ----> 1.0.1, the current validation rules we have will not complain.

Decision Request Goal

  • Decide on how to ensure that the version number of our APIs reflect the changes made to the API schemas, according to semantic versioning rules.

Constraints and risks

  • The consolidated API is currently public which means we have limited time to make up our minds on this.

Decision record

Option 2 choosen.
https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T373072_to_strictly_enforce_semantic_versioning_rules_for_toolforge_services_APIs_or_not

Options

Option 1

Do nothing

Pros:

  • we won't have to do the work

Cons:

  • the problem is still there

Option 2

manually enforcing semantic versioning. this means we official have it on record that we are doing semantic versioning but leave it at the discretion of the person submitting the patch and the reviewer to decide what that means.

Pros:

  • Better than Option 1
  • We can start this now

Cons:

  • Pron to error

Option 3

automatically enforce semantic versioning using our own custom algorithm

Pros:

  • semantic versioning automatically enforced

Cons:

  • Might take longer to setup compared to other alternatives
  • unanticipated edge cases?

Option 4

automatically enforce semantic versioning using third-party solutions

Pros:

  • semantic versioning automatically enforced
  • potentially faster than Option 3 (at the very least most edge cases should be handled)

Cons:

  • more limited compared to Option 3. So far I only know of oasdiff which only has command-line and go packages (maybe those are enough, maybe not)
  • dependency?

Event Timeline

https://github.com/Tufin/oasdiff (Apache License 2.0) . This can detect breaking and non-breaking changes

Cool tool

I think Option 2 is a good trade off, it's true it is prone to errors, but I like that it forces you to think about the impact of your changes, and to discuss it during code review.

aborrero changed the task status from Open to In Progress.Sep 13 2024, 12:04 PM
aborrero triaged this task as Medium priority.
aborrero subscribed.

I feel like we are mixing in this decision request 2 different points: versioning of the API docs and versioning of the API logic itself.

In general I don't think semantic version provides any value here. Moreover, it could be difficult to maintain, and can introduce more problems than it solves if not maintained properly. I would simply NOT use semver for anything.

For the API documentation I would use another, simplified versioning scheme, for example: <date>, or <date>-<commithash>, or <number> which are always updated in linear / incremental fashion, and the identifier don't carry any special meaning or semantic.

For the API logic and endpoints, it gets a bit more tricky, because there are additional questions to answer.

  • backward incompatible changes: should probably bump embedded number in the API endpoints, i.e if /jobs/v1/tool/{toolname}/images does a thing, and we change to logic in a way that clients would break, then we should introduce /jobs/v2/tool/{toolname}/images and wait everyone to migrate before removing the old endpoint. This comes with a natural bump to the version of the API docs, which again, can be linear or incremental.
  • backward compatible changes: most likely it would be enough to bump the API docs version to record whatever new logic is introduced. It is assumed no clients will break with the new logic, so no need to modify the version number embedded in the API endpoints.

So I guess my vote would be to have some kind of automatic mechanism that sets the version, in a similar fashion that we implemented for toolforge-deploy-related stuff, here T339198: Decision request - Toolforge component deployment flow details

I would go with Option 2, as we already have some checks to make sure we bump the version (https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci#check-openapi-version-bump) and maybe add a "an test for X hours" option 4 checker, to asses how hard it would be to setup (and do so if it's easy).

I do find value on using semver, specially for API users, similar to what most python/golang libraries do, it enables you to do automatic upgrades with a high level of confidence they will not break.

We can handle that directly on each sub-api too, as the merger currently just does a "sum" of the sub-apis versions (https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/blob/main/app/merge.py?ref_type=heads#L44) :)

Some other options I've used in other places, is to allow to generate the API version from the commit history, by flagging commits with headers, like:

My commit

Something changes and breaks XYZ

Sem-Ver: breaking change

so the version will get automatically bumped in CI with a major change (if break in it), a feature change (if feature in it) or just bug fix (if no header, or the previous two don't match). That also allows us to generate nicer and automated release notes from the commit messages, and other advanced release automation, though we can just start with doing so at code review.

Kubernetes uses something similar for action required commits (I'm guessing with many tweaks): https://github.com/kubernetes/kubernetes/pull/61373, that ends up like https://github.com/kubernetes/kubernetes/commit/347f652944cc2348d59b1c815007e21e062fe597#diff-2baf9ac192f2b73f0632fb62ba3b57b96e5d92b3ce4d2d5993b3378c725b4cdfR311

I'd go with Option 2 because it seems like the least labor-intensive one, and I don't think we would get a huge ROI from automating this right now. We could of course always revisit in the future.

Initially I was on the side of Option 4. But maybe that's too extreme and won't be worth sweating over? idk.
Option 2 is till a good middle ground so I have no problem with that too. It's something we can start immediately we close this decision request too. If Option 2 get's more vote we can go with that.