Page MenuHomePhabricator

[components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted
Closed, ResolvedPublic

Description

This is because the image name and tag is the same, so jobs-api considers the job to be the same though the actual contents of the image did change.

This might be solved partially the same way as T389043: [builds-api] Store the commit hash that was used for the build, though a first simple approach is to restart all continuous jobs (scheduled ones will just pull the new image on the next run)

Event Timeline

dcaro triaged this task as High priority.Mar 18 2025, 10:00 AM

Now that we have the resolved_ref property returned by the builds-api for each build, we can compare the commit of the last build with the commit from the new build and decide if it's the same or not.

@dcaro this seems like something that should be done on the jobs-api side. What's your view on how we should approach this?
I think since the jobs api already compares jobs and decides if it should recreate or not, it should also be able to do deeper level comparison by checking if the there is a change in the image (using the resolved_ref thing. wdyt?
Would you rather we do it on the components api side?

@dcaro this seems like something that should be done on the jobs-api side. What's your view on how we should approach this?
I think since the jobs api already compares jobs and decides if it should recreate or not, it should also be able to do deeper level comparison by checking if the there is a change in the image (using the resolved_ref thing. wdyt?
Would you rather we do it on the components api side?

The jobs-api does not have access to the resolved_ref property of the build, it just knows about the image name.

The problem comes also whenever the image changes, but the commit does now (ex. when it's rebuilt later from the same commit, maybe pulling in different dependencies).

I propose the following behavior of the deployment flow on the components-api:

  • Adding two new boolean parameters to the deploy edpoint, force-rebuild and force-restart
  • If both are true:
    • Always build the images
    • Always restart the jobs, even if they did not change
  • If force-rebuild is false (default):
    • if the resolved_ref of the last build for the component in the previous deployment is the same as the current hash -> skip build for that component
    • if there's no previous deployment, or the component did not exist in the previous deployment, or the destination image does not exist anymore, or the build reference in the previous deployment does not exist anymore, rebuild
  • If force-restart is false (default):
    • if the build did not change, don't restart the job
  • If force-restart is true
    • if the job spec is the same (so jobs-api did not restart it), do an explicit restart of the job

If/when we start exposing information about available runtimes on the builds-api (ex. to list the pre-built images that are available, and the bulidservice images available for the tool), then we can add the information of the commit hash the buildservice one is built from too, so the jobs-api can also retain that information on the job datastructure itself (once we store it), and we can compare more in detail which image was used for that job, and if the new job would change that.

Other options are to add to the builds-api a way to expose/configure the tags an image has, so we can use that with the commit hash as the image name, so the newer image will have a different commit hash, though that has to be thought a bit more clearly.

This might also be influenced by whichever way we do the rollbacks. For now, I'd start with the above flow, that is doable and does not prevent any of the future solutions.

dcaro changed the task status from Open to In Progress.May 20 2025, 9:38 AM
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 20) board.

maybe we should add this "force" field to the config? that way it's possible to pick and choose the components we want to re-build/re-start

if instead we add this to the cli, then we should also add a way to deploy decide the components to deploy

maybe we should add this "force" field to the config? that way it's possible to pick and choose the components we want to re-build/re-start

if instead we add this to the cli, then we should also add a way to deploy decide the components to deploy

We are intentionally not supporting per-component actions yet, we will deal with that later, for now making it simple and deploying everything is more than enough.

Note also that you deploy without passing any config, so it has to be there even if we do per-component deploy eventually.

It should be a parameter of the POST /deployment endpoint, that is component agnostic.

Once we start supporting per-component deploys, then we'll extend whatever method we use to deploy specific components to support that option (be that a path variable like POST /components/<component_name>/deploy, or something else).

Feel free to open a task for the per-component deployment and start adding ideas/notes on what's needed and how things could work. But for this task, let's keep it simple, and focus on what we support now.

maybe we should add this "force" field to the config? that way it's possible to pick and choose the components we want to re-build/re-start

if instead we add this to the cli, then we should also add a way to deploy decide the components to deploy

We are intentionally not supporting per-component actions yet, we will deal with that later, for now making it simple and deploying everything is more than enough.

Note also that you deploy without passing any config, so it has to be there even if we do per-component deploy eventually.

It should be a parameter of the POST /deployment endpoint, that is component agnostic.

Once we start supporting per-component deploys, then we'll extend whatever method we use to deploy specific components to support that option (be that a path variable like POST /components/<component_name>/deploy, or something else).

Feel free to open a task for the per-component deployment and start adding ideas/notes on what's needed and how things could work. But for this task, let's keep it simple, and focus on what we support now.

What I was trying to suggest was adding the force field to the config instead of passing it as a parameter to the deploy endpoint. It is not exactly clear what option you prefer reading your comment above. I am also aware the config is not separate thing and not provided during the deploy. What I suggest is a config that looks like this:

build:
    ...
    force: True
deploy:
    command: <whatever command>
    force: True

Instead of passing it as part of the deploy endpoint

Hey, if you put it in the config, there's no way to control per-deployment (you'd have to reconfigure, then create deploy, then reconfigure to restore), I propose passing it as a parameter of the deploy create endpoint, so you can do things like "soft-deploy" every hour to pull code changes, and force deploy every week to rebuild with the latest dependencies (as an example):

It should be a parameter of the POST /deployment endpoint, that is component agnostic.

raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/33

[components.deployment.create] add force-build and force-restart option

raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797

Draft: [components-api] add components-api conditional build and run tests

Change #1155205 had a related patch set uploaded (by David Caro; author: David Caro):

[cloud/wmcs-cookbooks@main] toolforge.component.deploy: run only component tests by default

https://gerrit.wikimedia.org/r/1155205

Change #1155205 merged by jenkins-bot:

[cloud/wmcs-cookbooks@main] toolforge.component.deploy: run only component tests by default

https://gerrit.wikimedia.org/r/1155205

group_203_bot_f4d95069bb2675e4ce1fff090c1c1620 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/811

components-api: bump to 0.0.115-20250611123240-830d7be5

group_203_bot_f4d95069bb2675e4ce1fff090c1c1620 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/814

components-api: bump to 0.0.116-20250612105200-81744f77

group_203_bot_f4d95069bb2675e4ce1fff090c1c1620 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/817

components-api: bump to 0.0.119-20250617142601-78c0f80f

dcaro moved this task from In Review to Done on the Toolforge (Toolforge iteration 21) board.