Page MenuHomePhabricator

[builds-api.start] Cleanup old builds when triggering a new one
Closed, ResolvedPublic

Description

We should make sure to cleanup old build before starting a new one, a proposal for the cleanup would be:

  • Keep the latest successful build so you can get the image/details/check when it was built
  • Keep the two latest failed ones for debugging

This way we don't need to depend on another asynchronous process to do it for us.

Event Timeline

dcaro triaged this task as High priority.

keeping the information of the last 10 builds won't be bad. Any reason why you hard-cap it at 3?

keeping the information of the last 10 builds won't be bad. Any reason why you hard-cap it at 3?

The number itself is just arbitrary, we can adapt as we see fit. I prefer starting with a low number because is the one that will use less resources and I don't see much benefit on having old failed builds around (if you can't reproduce it, whichever error it had does not matter anymore).

I'm ok starting with 10 too.

The is a distinction between failed and non-failed ones though, that should be clear, as the non-failed ones have the parameters that worked the last time, and the only place for users to get the url to the currently working image for now. And we might want to add a feature to rebuild the latest build that passed (instead of just the latest).

@dcaro instead of doing this cleanup via builds-api, what do you think about doing this on buildservice itself? there is this feature in tekton that allows for doing something similar using cron https://github.com/tektoncd/experimental/tree/main/pipeline/cleanup
only thing we need to do is modify the bash snippet that does the actual cleanup so it can keep the desired builds while deleting the rest

@dcaro instead of doing this cleanup via builds-api, what do you think about doing this on buildservice itself? there is this feature in tekton that allows for doing something similar using cron https://github.com/tektoncd/experimental/tree/main/pipeline/cleanup
only thing we need to do is modify the bash snippet that does the actual cleanup so it can keep the desired builds while deleting the rest

There's several advantages I see to using the builds-api:

  • The cleanup happens when it's needed (when a user tries to create a new build)
  • There's no time where the user can still create as many builds as they want (with the cron above they have until the next cron run to DoS the system)
  • Easier to test, and enabling it to be unit-tested.
  • No thundering herd problems common to cronjobs.
  • Consolidate the logic in builds-api, it's easier to manage, deploy, test, review, ... there's less moving pieces.

And in general, unless there's a strong use-case for asynchronous jobs instead of doing it synchronous, keeping things synchronous makes them easier to handle.

The advantages I see on the cron side:

  • Maybe avoid coupling tekton on builds-api, though given that there's no easy way to avoid the coupling (we have to couple with it to start, cancel, list and show builds, and will need it for quotas and such too), it seems a bit moot.
  • Ease, as the cron is already there, though I would not be so rush to consider it just a copy-paste, as it introduces a new image (that needs copying to the tools/toolsbeta repo + maintenance), and the task is +3 years old, so might need some updating (as it uses the api with curl directly, the params are hardcoded there).
  • Makes the POST /build endpoint a bit faster (as it does not have to check and cleanup old builds), my expectation is in the ranges of tenths of seconds or less (it has to query only latest versions of pipelineruns/taskruns in a single namespace).

So imo the API version will be better in the mid/long term for sure, and it's not clear the cron has more benefits short-term, I think the API solution is the way to go.

Did I miss anything?

Raymond_Ndibe changed the task status from Open to In Progress.Jun 7 2023, 5:12 AM

Deployed \o/ contained in version 0.1.0 of the builds-api