What: My CI pipeline should only succeed after Toolforge has completed the requested deployment. If Toolforge ran into some problem, I would like my CI pipeline to fail.
Problem: Currently, the API reports success as soon as it accepts a deployment request into its internal queue. For example, this pipeline run succeeded immediately after Toolforge received the deployment request, even though Toolforge had not actually deployed anything yet. This makes the GitLab pipeline status misleading: even if there is a nice green checkmark in GitLab, GitHub, etc., the Toolforge deployment might still have failed.
Proposal: In the Toolforge Components API server, expose an endpoint that allows clients to poll the status of their deployment. The HTTP status code of the response would be one of the following:
- 200 OK: Toolforge has completely finished the deployment. The container was built successfully, it was uploaded to the container registry, all jobs and webservices have been restarted, and any configured health checks have passed at least once.
- 429 Too Many Requests (or some other status code that curl recognizes as retryable): Toolforge is still working on the deployment.
- Another HTTP error (which curl does not consider retryable): the Toolforge deployment has failed for some reason.
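The mapping above could be sketched on the server side roughly as follows. This is only an illustration: the `DeploymentState` names are hypothetical and not taken from the Components API, and 409 Conflict is an arbitrary placeholder for "a failure code that curl does not retry". Note that 500 would not work for the failure case, because curl's `--retry` treats 408, 429, 500, 502, 503 and 504 as transient.

```python
from enum import Enum
from http import HTTPStatus


class DeploymentState(Enum):
    # Hypothetical state names; the real API may model deployments differently.
    PENDING = "pending"
    RUNNING = "running"
    SUCCESSFUL = "successful"
    FAILED = "failed"


def status_code_for(state: DeploymentState) -> HTTPStatus:
    """Map a deployment state to the HTTP status code proposed above."""
    if state is DeploymentState.SUCCESSFUL:
        return HTTPStatus.OK
    if state in (DeploymentState.PENDING, DeploymentState.RUNNING):
        # 429 is among the codes that curl's --retry treats as transient.
        return HTTPStatus.TOO_MANY_REQUESTS
    # Final failure: any code outside curl's retryable set would do;
    # 409 is an arbitrary choice for this sketch.
    return HTTPStatus.CONFLICT
```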
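Given this, users could run curl with --fail --retry <N> --retry-max-time 3600 to wait for up to 1 hour until the deployment has either finished or failed, using the exponential back-off built into curl. The --retry-max-time limit only applies when --retry is also given, and --fail is needed so that curl exits with a non-zero status (and the CI job fails) when the final response is an HTTP error.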
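For CI jobs that cannot rely on curl, the same polling behaviour could be approximated in a few lines of Python. This is a sketch under assumptions, not a reference client: the status URL is whatever the proposed endpoint would be, `wait_for_deployment` is a made-up helper name, and the back-off (start at 1 second, double each time, cap at 10 minutes) imitates curl's documented default rather than anything Toolforge prescribes.

```python
import time
import urllib.error
import urllib.request

# Status codes that curl's --retry treats as transient.
RETRYABLE = {408, 429, 500, 502, 503, 504}


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we are after.
        return err.code


def wait_for_deployment(url, max_time=3600.0, fetch=fetch_status,
                        sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll `url` until the deployment is final or `max_time` seconds pass.

    Returns True on 200, False on a non-retryable code or on timeout.
    """
    deadline = clock() + max_time
    delay = 1.0
    while True:
        code = fetch(url)
        if code == 200:
            return True
        if code not in RETRYABLE:
            return False
        if clock() + delay > deadline:
            return False  # the next retry would exceed the time budget
        sleep(delay)
        delay = min(delay * 2, 600.0)  # double the wait, capped at 10 minutes
```

A CI script would call `wait_for_deployment(status_url)` and exit non-zero when it returns False. The `fetch`, `sleep` and `clock` parameters exist only so the loop can be exercised without a network or real waiting.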
Optimization: To reduce polling traffic and make CI pipelines finish faster, the server could hold each status request open until either the deployment status is final or some timeout (perhaps 3 minutes or so) has expired. Of course, this is only realistic if the server is implemented in a language/framework that can handle many parallel long-lived requests; I wouldn't know if that is the case. But this is just an optimization.
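To make the long-polling idea concrete, here is a minimal asyncio sketch. It assumes an async server where each request is a cheap coroutine; the `Deployment` class and its methods are hypothetical and say nothing about how the Components API is actually implemented.

```python
import asyncio
from typing import Optional


class Deployment:
    """Hypothetical in-memory record of one deployment."""

    def __init__(self) -> None:
        self.final_status: Optional[int] = None
        self._done = asyncio.Event()

    def finish(self, status: int) -> None:
        """Called by the deployment worker once the outcome is final."""
        self.final_status = status
        self._done.set()

    async def wait_status(self, timeout: float = 180.0) -> int:
        """Wait until the deployment is final, or answer 429 after `timeout`."""
        try:
            await asyncio.wait_for(self._done.wait(), timeout)
        except asyncio.TimeoutError:
            return 429  # still in progress; the client should poll again
        return self.final_status
```

A request handler would await `wait_status()` and use the result as the response status code, so a client polling with retries gets its answer as soon as the deployment finishes instead of at the next back-off interval.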