[components-api,beta] Image should only be built once when re-used in components
Open, High, Public

Description

Bug

Tool: cluebotng-review
What are you trying to do?

When trying to migrate the existing jobs into components, the build fails due to the max build limit.

The tool uses 5 different images (3 relevant to the component config as neither redis nor bookworm are built from packs).

What does happen?

The deployment fails as it tries to execute too many concurrent builds (this probably should handle re-trying on a 409, but that is another issue).

tools.cluebotng-review@tools-bastion-12:~$ toolforge components deployment show
Warning: You are using a beta feature of Toolforge.
Deployment ID: 20250813-171419-gv95e5clr4
Created: 20250813-171419
Status: failed
Long status: 
  Got exception: Some builds failed to start: export-statistics(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds) grafana-alloy(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds) grant-review-access-from-wikipedia-rights(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds) import-training-data(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds) irc-relay(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds) mark-edits-as-deleted(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds) mark-edits-as-having-data(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds) update-edit-classifications(error:409 Client Error: Conflict for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/builds/v1/tool/cluebotng-review/builds)

Builds:
  add-dangling-edits-to-group(pending): id:cluebotng-review-buildpacks-pipelinerun-xm5v4 Not started yet
  add-edits-to-queue(pending): id:cluebotng-review-buildpacks-pipelinerun-hf42t Not started yet
  add-reported-edits(pending): id:cluebotng-review-buildpacks-pipelinerun-99b64 Not started yet
  add-reviews-from-huggle(pending): id:cluebotng-review-buildpacks-pipelinerun-4hn7k Not started yet
  add-reviews-from-report(skipped): id:cluebotng-review-buildpacks-pipelinerun-wmd5b Reusing existing build
  celery-worker(skipped): id:cluebotng-review-buildpacks-pipelinerun-xsj96 Reusing existing build
  cleanup-user-records(skipped): id:cluebotng-review-buildpacks-pipelinerun-b98vx Reusing existing build
  cluebotng-reviewer(skipped): id:cluebotng-review-buildpacks-pipelinerun-zdbbh Reusing existing build
  export-statistics(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish
  grafana-alloy(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish
  grant-review-access-from-wikipedia-rights(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish
  import-training-data(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish
  irc-relay(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish
  mark-edits-as-deleted(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish
  mark-edits-as-having-data(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish
  update-edit-classifications(failed): id:no-id-yet Got too many builds running (4 out of 4 max), cancel some or wait for them to finish

Runs:
  add-dangling-edits-to-group(skipped): Skipped due to previous failure
  add-edits-to-queue(skipped): Skipped due to previous failure
  add-reported-edits(skipped): Skipped due to previous failure
  add-reviews-from-huggle(skipped): Skipped due to previous failure
  add-reviews-from-report(skipped): Skipped due to previous failure
  celery-worker(skipped): Skipped due to previous failure
  cleanup-user-records(skipped): Skipped due to previous failure
  cluebotng-reviewer(skipped): Skipped due to previous failure
  export-statistics(skipped): Skipped due to previous failure
  grafana-alloy(skipped): Skipped due to previous failure
  grant-review-access-from-wikipedia-rights(skipped): Skipped due to previous failure
  import-training-data(skipped): Skipped due to previous failure
  irc-relay(skipped): Skipped due to previous failure
  mark-edits-as-deleted(skipped): Skipped due to previous failure
  mark-edits-as-having-data(skipped): Skipped due to previous failure
  update-edit-classifications(skipped): Skipped due to previous failure

14 of these jobs/runs/components share the same image/ref, so the image only needs to be built once (as it stands, it will be overwritten multiple times, which may or may not upset Harbor).

With the configuration (https://github.com/cluebotng/component-configs/blob/main/cluebotng-review.yaml) I would expect 3 images to be built:

And then the 16 'runs' applied.
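
To illustrate the expectation above, here is a minimal sketch, assuming each component boils down to a (repo, ref) pair. The component-to-repo mapping is illustrative rather than copied from the linked config, and `group_by_source` is a hypothetical helper, not part of components-api.

```python
from collections import defaultdict

# Illustrative, simplified component definitions: name -> (repo URL, ref).
# Assumption: only repo+ref decide whether two components can share a build.
components = {
    "celery-worker": ("https://github.com/cluebotng/reviewer.git", "v0.1.10"),
    "cluebotng-reviewer": ("https://github.com/cluebotng/reviewer.git", "v0.1.10"),
    "irc-relay": ("https://github.com/cluebotng/irc_relay.git", "v1.1.12"),
    "grafana-alloy": ("https://github.com/cluebotng/external-grafana-alloy.git", "v0.1.5"),
}


def group_by_source(components):
    """Map each distinct (repo, ref) to the components that could share its image."""
    groups = defaultdict(list)
    for name, source in components.items():
        groups[source].append(name)
    return dict(groups)


builds = group_by_source(components)
for (repo, ref), names in builds.items():
    print(f"build {repo}@{ref} once, then run: {', '.join(names)}")
```

With a grouping like this, the number of builds would follow the number of distinct sources rather than the number of components.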

For now I've reverted to building the 3 images with fabric and deploying via jobs.yaml (https://github.com/cluebotng/reviewer/blob/main/fabfile.py#L58).

Event Timeline

This is sort of related to T401388, in the context of the comment about which component maps to which ref. I'm actually not sure what would happen if I set different refs for the same repo here... I assume the last build wins and updates the latest tag (it doesn't matter in my use case, but that could be unexpectedly interesting).

Currently each component uses a different image name, so they don't share images. If a deployment reuses an image, it comes from a previous build of that same component. You can see that when listing the builds:

tools.cluebotng-review@tools-bastion-13:~$ toolforge build list
build_id                                       status    start_time            end_time              source_url                                               ref      envvars    use_latest_versions    destination_image
cluebotng-review-buildpacks-pipelinerun-kj2rf  ok        2025-08-13T20:40:25Z  2025-08-13T20:41:28Z  https://github.com/cluebotng/reviewer.git                v0.1.10  N/A        True                   tools-harbor.wmcloud.org/tool-cluebotng-review/reviewer:latest
cluebotng-review-buildpacks-pipelinerun-dk66j  ok        2025-08-13T20:39:15Z  2025-08-13T20:40:20Z  https://github.com/cluebotng/external-grafana-alloy.git  v0.1.5   N/A        True                   tools-harbor.wmcloud.org/tool-cluebotng-review/grafana-alloy:latest
cluebotng-review-buildpacks-pipelinerun-lgpjv  ok        2025-08-13T20:38:43Z  2025-08-13T20:39:11Z  https://github.com/cluebotng/irc_relay.git               v1.1.12  N/A        True                   tools-harbor.wmcloud.org/tool-cluebotng-review/irc-relay:latest
cluebotng-review-buildpacks-pipelinerun-g8zbm  ok        2025-08-13T20:33:30Z  2025-08-13T20:34:38Z  https://github.com/cluebotng/reviewer.git                v0.1.10  N/A        True                   tools-harbor.wmcloud.org/tool-cluebotng-review/reviewer:latest
cluebotng-review-buildpacks-pipelinerun-4jvgl  ok        2025-08-13T20:32:26Z  2025-08-13T20:33:27Z  https://github.com/cluebotng/external-grafana-alloy.git  v0.1.4   N/A        True                   tools-harbor.wmcloud.org/tool-cluebotng-review/grafana-alloy:latest
cluebotng-review-buildpacks-pipelinerun-c24f7  error     2025-08-13T17:18:14Z  2025-08-13T17:18:32Z  https://github.com/cluebotng/reviewer.git                v0.1.8   N/A        True                   tools-harbor.wmcloud.org/tool-cluebotng-review/reviewer:latest
cluebotng-review-buildpacks-pipelinerun-hqq5f  error     2025-08-13T17:17:53Z  2025-08-13T17:18:11Z  https://github.com/cluebotng/irc_relay.git               v1.1.12  N/A        True                   tools-harbor.wmcloud.org/tool-cluebotng-review/irc-relay:latest    <- this is actually the component name

Two ideas come right away to me:

  • Allow reusing images from other components (this forces the components to use the same build, that is, repo+ref)
  • Allow queuing builds.

Neither should be too hard to implement, and both are on the "roadmap" (that, combined with a user requesting it, is the magic combo :) )
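
As a rough illustration of the queuing idea, here is a minimal sketch that caps concurrency at the 4-build limit from the error message. `fake_build` is a hypothetical stand-in for starting a build and waiting for it; nothing here reflects how builds-api or components-api are actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

MAX_CONCURRENT_BUILDS = 4  # limit taken from the "4 out of 4 max" error

_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of builds observed running at once


def fake_build(name):
    """Hypothetical stand-in for starting a build and waiting for it to finish."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.01)  # pretend the build takes some time
    with _lock:
        _running -= 1
    return name


def run_queued(names, limit=MAX_CONCURRENT_BUILDS):
    # The pool never runs more than `limit` builds at once;
    # the remaining ones wait in its internal queue.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(fake_build, names))


results = run_queued([f"component-{i}" for i in range(16)])
print(f"{len(results)} builds finished, at most {peak_running} at a time")
```

A thread pool is just one convenient way to show the effect; a real implementation would more likely keep pending builds in a persistent queue.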

That makes sense - same as -i on the command line.

Agreed, either of those would resolve this. In this specific case the first (reusing images) would be better for resource consumption and deploy time, but I can definitely think of cases where you might hit the 4-build maximum and not be able to re-use the image, so the second (queuing) makes sense too.

dcaro triaged this task as High priority. Aug 21 2025, 3:31 PM

A simpler option is to do the queueing on the components-api side. That's probably easier right now, and it does not rule out the other solutions. I'll create a subtask for that.
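
One way caller-side queueing could look is to wait and retry whenever the builds service reports that the limit is reached. This is a minimal sketch under assumptions: `TooManyBuilds`, `start_with_retry` and `flaky_start` are hypothetical names, and the exception stands in for the 409 Conflict response shown in the description.

```python
import time


class TooManyBuilds(Exception):
    """Hypothetical error standing in for the 409 Conflict response."""


def start_with_retry(start_build, name, attempts=5, delay=0.0, sleep=time.sleep):
    """Call start_build(name), waiting and retrying while the build limit is hit."""
    for attempt in range(1, attempts + 1):
        try:
            return start_build(name)
        except TooManyBuilds:
            if attempt == attempts:
                raise  # give up and surface the conflict to the caller
            sleep(delay * attempt)  # linear backoff; the policy is an arbitrary choice


# Fake builds API for demonstration: rejects the first two calls, then accepts.
calls = {"count": 0}


def flaky_start(name):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TooManyBuilds("Got too many builds running (4 out of 4 max)")
    return f"{name}-build-id"


build_id = start_with_retry(flaky_start, "export-statistics")
print(build_id, "after", calls["count"], "attempts")
```

Retrying keeps the builds service unchanged, at the cost of the deployment taking longer while it waits for slots to free up.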