
[builds] What are the current best practices for CI?
Open, Medium, Public

Description

Background:
As a maintainer it is desirable to be able to run continuous integration tests against code changes to verify they do not break any expectations (tests) or generally the build (compilation/assembly).

It is also desirable to track upstream releases in a timely manner, minimising exposure to security issues and taking advantage of performance improvements. In a low-overhead environment this can be achieved with tools such as Renovate, Dependabot, or pipup, which either directly (auto-merging) or indirectly (human review) rely on the tests to avoid breakage.

When using the builds service (buildpack-based image), there is no pre-built base image to execute tests in, something that was previously possible by using e.g. docker-registry.tools.wmflabs.org/toolforge-php82-sssd-base (a publicly accessible image) in the CI runner.

Problem:
The target image can be built using the buildpack tooling and any relevant tests then executed against it. Generally this is nice, as it offers flexibility regarding versions and a guarantee that the "runtime" is almost identical (env vars etc. aside).

Unfortunately using the upstream builder image in a way that is compatible with builds-api is problematic:

  1. The version of the builder image in Toolforge does not track the upstream releases (T380127)
  2. Runtime versions available upstream are not available in Toolforge (T408108, T401875, T363854 etc)
  3. The builder image has additional configuration (packs) applied within the TaskRun (https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/blob/main/deployment/chart/scripts/inject_buildpacks.sh), which complicates using tools-harbor.wmcloud.org/toolforge/heroku-builder directly

Using builds-api to generate the images is also problematic:

  1. The limit on concurrent builds prevents any reasonable amount of scalability (https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/blob/main/deployment/chart/values.yaml?ref_type=heads#L43)
  2. Published assets cannot be removed and the Harbor quota is quite restrictive, preventing any reasonable amount of scalability (https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/blob/main/components/maintain-harbor/values/tools.yaml.gotmpl?ref_type=heads#L36)
  3. The builds-api has no good support for external usage (T332478, T363983)
  4. The builds are somewhat slow, given all the internal objects needing to be created and deleted

Which leads to the question of what is the current best practice for testing build pack based images?

Today there is https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_container_images#Testing_locally_(optional), however this does not cover the injected config/packs as outlined above.
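For reference, the locally-testable part of that flow looks roughly like the following (a sketch assuming the `pack` CLI and Docker are installed; the image name and test command are illustrative, and note this uses the builder as published, so it does not include the packs injected by inject_buildpacks.sh):

```shell
# Build an image roughly the way builds-api would (minus the injected packs)
pack build my-tool-ci \
  --builder tools-harbor.wmcloud.org/toolforge/heroku-builder \
  --path .

# Run the test suite inside the resulting image; `launcher` is the
# Cloud Native Buildpacks entrypoint that sets up the buildpack env
docker run --rm --entrypoint launcher my-tool-ci python -m pytest
```

This covers "same runtime" testing, but the injected-packs gap described above remains.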

A real world example of where this became problematic:

  1. Python 3.14 was released
  2. The .python-version release pin was updated by tooling to the current stable
  3. CI was happy, development was happy, deployments failed
  4. A human then had to spend time reviewing and down-grading 11 different repos

Event Timeline

Restricted Application added a subscriber: Aklapper.

Which leads to the question of what is the current best practice for testing build pack based images?

Short answer: there is no current best practice. That is not new with buildpacks/builds-api though, as there has never been a supported flow for it.

It is also not a trivial problem, so it will take some effort and time to be able to get a "well paved" flow going (especially with the current lack of resources in the team).

Some things that could help in the short term (off the top of my head):

  • Pin your python version to the one supported by the current buildpacks
  • Have a 'dev/test' tool you deploy on first

Some things that can be explored in the mid-long term:

  • Creating a builder image so it can be run locally or on CI systems (might not be able to run on gitlab.w.o though, would have to double-check with the team that maintains it)
    • This requires an initial effort to package and publish the custom/modified buildpacks, for which we had no easy way before (now gitlab supports packages, so that might be an option)
  • Get more resources so we can keep up with updates (currently the team is -3 members down and -1 manager for more than two quarters)
  • Figure out some custom way to run the same build process that toolforge does for CI runs
    • The first point would help with this too, having a builder image
  • Maybe allow using the latest built image to run on CI somehow
    • The images are already public, so at least on your CI you can reuse the last build and run your new code/tests in it
    • It will not re-pull dependencies though, just use the ones already there, so it's not the full test env, especially if the deps changed. For python-pip it might be possible to run pip install/upgrade, but buildpacks are not built for that, so it might need some tweaks
  • Move away from buildpacks
    • Blubber might be an option (that's how prod images are built), though it is not as featureful/easy to use afaik (e.g. changing runtimes), at least as far as I know; it has been some time since I explored its current features
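As a sketch of the "reuse the latest built image" idea above, a CI job could pull the tool's last published image and run the new tests inside it (tool name, image path, and test command below are hypothetical):

```yaml
# .gitlab-ci.yml (fragment) — image path and job name are illustrative
test-in-last-build:
  # Last image published by builds-api for this (hypothetical) tool;
  # images in tools-harbor are publicly pullable
  image: tools-harbor.wmcloud.org/tool-mytool/tool-mytool:latest
  script:
    # Runs against the dependencies baked into the previous build;
    # lockfile/requirements changes are NOT re-resolved here
    - python -m pytest
```

This gives a cheap runtime-parity check, with the dependency-staleness caveat noted above.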

Will have to investigate and think of more solutions, feel free to propose more, though unfortunately, we are very understaffed and will be for the next couple quarters at least.

It is also not a trivial problem, so it will take some effort and time to be able to get a "well paved" flow going

Indeed, and that is fine; what we have works in broad strokes, so this is more about the future. The primary consuming service is still in beta anyway.

Get more resources so we can keep up with updates (currently the team is -3 members down and -1 manager for more than two quarters)

I see there is an open position for at least a manager, and some OKR-type things to expand/productionise tools

Creating a builder image so it can be run locally or on CI systems

I think this would be a good option if the ecosystem stays in the realm of build packs

Creating a builder image so it can be run locally or on CI systems / Move away from buildpacks

The other option would be to allow external registries. There are some good reasons not to (image size), but given what we have today there is little security added by restricting image builds: if scripts can be executed, then any random binary can find its way into the env. In fact, that is how some things have been deployed for years, even back with grid engine.

Pin your python version to the one supported by the current buildpacks

This is done by forcing the project version constraint so the install fails on newer versions. It leaves the PR open in a failed state, but adding exclusions was a bit more gross. It works, but is overhead.
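Concretely, the constraint approach might look like this in pyproject.toml (the version bounds are illustrative; the upper bound mirrors whatever the current buildpack actually ships):

```toml
# pyproject.toml (fragment) — project name and bounds are hypothetical
[project]
name = "my-tool"
# Makes the install fail (and thus the automated bump PR) on e.g. Python 3.14
requires-python = ">=3.11,<3.14"
```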

Have a 'dev/test' tool you deploy on first

I have this for one tool. Now that everything is under components it would be reasonably easy to duplicate every tool, though that just makes the deployment fail on a different tool (with the current restrictions/speed, deploying pre-merge for each change is not very viable).

Will have to investigate and think of more solutions, feel free to propose more

I'll come back if any random thoughts come to mind