Original document (Comment-only): https://docs.google.com/document/d/1uYjqGpfvb8Q27Nc0-xRZ-WrHNCnBGeA-c8_CKKpd5zU/edit?tab=t.0#heading=h.uhbe07clzg9n
DONE: Beta start email: https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/5D7NK7Z7KMWQPWQC23453YB7FV555Q5R/
Toolforge push-to-deploy beta plan
Other docs:
- Components API MVP features
- https://phabricator.wikimedia.org/T194332 - [Epic,builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push-to-deploy and build packs
- Design Proposal DRAFT Components API
- Hypothesis wording https://etherpad.wikimedia.org/p/6.3_new_hypothesis_wording
We want to gather feedback from users on the current direction and implementation of the push-to-deploy feature for Toolforge.
Timeline (1 FTE, ideally 2 FTEs with partial time for reviews/pairing) - ~10-month plan
[now - 27th June 2025] Announce the upcoming beta
- Prepare the technical side of things
- Missing features/bugfixes
- T394273: [components-api] add tool config version check
- T386829: [components-api] Rename the CRDs groups to be `components-api.toolforge.org`
- T389044: [components-api,builds-api] When building and deploying, if none of the settings changed, the jobs are not restarted
- T362072: [components-api] Add support for port/healthcheck for continuous jobs in tool config/deployment
- T394276: [components-api] Add basic prometheus metrics
- T395070: [components-api] add all the missing options for continuous components
- Show useful and actionable error messages instead of stack traces or just 'failed' (patch sent)
- When creating a configuration, show warning for config values that were not understood
- T394275: [components-api] Add alerts and runbooks for basic service health
- Deploy cli and api in tools
- Enable functional tests in tools
- Prepare the documentation side of things (howto, getting started, etc.)
- T394279: [components-api,components-cli] add user documentation page
- with a big note describing the beta process
- with an example of tool config and links on how to find all options
- T394280: [components-api] Add admin documentation page
- Links from toolforge user help page
- Prepare/define the feedback flows
[30th June - 31st October 2025] Start the beta
- Announce to the community including
- Documentation
- How to use the new features
- How to give feedback and how we are going to handle it
- Gather feedback from users
- Fix bugs that might arise
- [every 2 weeks] Check-point for direction/summarize feedback to date (2 weeks after start, 2 weeks cadence, at least 7 cycles - 3.75 months)
- Decide and implement a round of new features
- Iterate if needed
[1st November - 31st December 2025] Start stabilization phase (duration 2 months)
- Fix bugs + stabilization features
[5th Jan 2026] Release as stable
- Regular development cycle
Beta Scope and Limitations
For a detailed list of features, see Components API MVP. The following is a summary of the main features.
The minimal scope will include all of:
- Only continuous components support
- Only buildservice based components support (build from source code)
- Trigger deploy with deploy-token and cli
- Single deployment (no queues)
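As a rough illustration of the minimal scope, the sketch below checks a parsed tool config against the first two restrictions. It assumes the key names used in the minimal tool config example in this document (`components`, `type`, `build`, `repository`); the function name `beta_scope_problems` is hypothetical and not part of the components API.

```python
def beta_scope_problems(config: dict) -> list[str]:
    """Return human-readable problems for components outside the minimal beta scope."""
    problems = []
    for name, component in config.get("components", {}).items():
        # Minimal scope: only continuous components.
        if component.get("type") != "continuous":
            problems.append(f"{name}: only continuous components are supported")
        # Minimal scope: only components built from source code (no pre-built images).
        if "repository" not in component.get("build", {}):
            problems.append(f"{name}: only components built from source are supported")
    return problems


if __name__ == "__main__":
    example = {
        "config-version": "v0.1",
        "components": {
            "api": {
                "type": "continuous",
                "build": {
                    "repository": "https://gitlab.wikimedia.org/toolforge-repos/mytool",
                    "ref": "main",
                },
                "run": {"port": 5000, "command": "api"},
            },
        },
    }
    print(beta_scope_problems(example))
```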
Extended scope (some of these features might be included if there is time):
- T395065: [components-api] Add support for scheduled components
- T395071: [components-api] Add all missing options for scheduled components
- T395039: [components-api,components-cli] add `deploy cancel` feature
- T395077: [components-cli] bash autocomplete does not autocomplete file name when creating config
- T395076: [components-api] use the `build.params.image_name` to compare with the `component`
- Pre-built images
- Trigger deploy with repository polling
- Queue deployments
- T362077: [components-api] Add webservice support
Out of scope:
- Rollback support
- Use an external URL for the tool config yaml (instead of having to manually configure the first time)
- Component deployment ordering/dependency definition
Features notes
Minimal example of the supported tool config:
    config-version: v0.1
    components:
      api:
        description: A python-flask api
        type: continuous
        build:
          repository: https://gitlab.wikimedia.org/toolforge-repos/mytool
          ref: main
        run:
          port: 5000
          command: api
      celery-worker:
        description: Celery worker for long-running tasks
        type: continuous
        build:
          repository: https://gitlab.wikimedia.org/toolforge-repos/mytool
          ref: main
        run:
          command: celery-worker
Deployment Workflow
For the beta, the deployment process will look something like this:
Basic CLI flow:
- User uploads their tool configuration with `toolforge components config create`
- We should have clear documentation on the structure of a config object
- Document how to import from a file or similar external source (git url, etc.)
- User triggers a deployment for their tool with `toolforge components deployment create`
- System generates a unique deploy_id (e.g., datestamp-randint)
- System creates a ToolDeployment CRD instance with:
- The generated deploy_id
- The current timestamp as creation_time
- Initial status set to "PENDING"
- Empty builds object
- Empty runs object
- For each component in the ToolConfig:
- The system initiates a build process
- Adds an entry to the builds object in the ToolDeployment CRD with:
- A generated build_id
- Initial status set to "PENDING"
- System updates the ToolDeployment CRD status to "IN_PROGRESS"
- As each component build progresses:
- The system updates the corresponding build status in the ToolDeployment CRD
- When all the builds are complete, the system runs each component (e.g. creates a continuous job)
- The system updates the corresponding run status in the ToolDeployment CRD
- When all runs are complete:
- The system sets the overall ToolDeployment status to "FINISHED" if all builds and runs succeeded, or "FAILED" if any build or run failed.
- User can check the deployment status (per-build, per-run, general) and details with `toolforge components deployment show <deploy_id>`
- User can see all the latest deployments, and whether they failed or not, with `toolforge components deployment list`
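The status transitions in the CLI flow above can be sketched as follows. This is a minimal illustration, not the components API implementation: a plain dict stands in for the ToolDeployment CRD, the exact `deploy_id` and `build_id` formats are assumptions, and the per-build and per-run result value `SUCCESSFUL` is a hypothetical name (the flow only names the overall `PENDING`, `IN_PROGRESS`, `FINISHED` and `FAILED` statuses).

```python
import random
from datetime import datetime, timezone


def new_deploy_id(now: datetime) -> str:
    """Datestamp plus random integer (the exact format is an assumption)."""
    return f"{now:%Y%m%d-%H%M%S}-{random.randint(0, 9999):04d}"


def create_deployment(component_names, now=None):
    """Build the initial deployment record and register one build per component."""
    now = now or datetime.now(timezone.utc)
    deployment = {
        "deploy_id": new_deploy_id(now),
        "creation_time": now.isoformat(),
        "status": "PENDING",
        "builds": {},
        "runs": {},
    }
    # Each component gets a build entry that starts as PENDING.
    for name in component_names:
        deployment["builds"][name] = {
            "build_id": f"{name}-{random.randint(0, 9999):04d}",  # hypothetical id format
            "status": "PENDING",
        }
    # Once the builds have been initiated, the deployment is in progress.
    deployment["status"] = "IN_PROGRESS"
    return deployment


def finalize(deployment):
    """Set the overall status once every build and run has reported a result."""
    results = [build["status"] for build in deployment["builds"].values()]
    results += [run["status"] for run in deployment["runs"].values()]
    # FINISHED only if everything succeeded; any other result fails the deployment.
    all_ok = all(status == "SUCCESSFUL" for status in results)
    deployment["status"] = "FINISHED" if all_ok else "FAILED"
    return deployment


if __name__ == "__main__":
    deployment = create_deployment(["api", "celery-worker"])
    for name in deployment["builds"]:
        deployment["builds"][name]["status"] = "SUCCESSFUL"
        deployment["runs"][name] = {"status": "SUCCESSFUL"}
    print(finalize(deployment)["status"])
```

Deriving the overall status from the per-build and per-run entries keeps a single source of truth, which is what lets `deployment show` report per-build, per-run and general status from the same record.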
Basic Automated/webhook flow:
- User uploads their tool configuration with `toolforge components config create`
- User creates a deploy token with `toolforge components deploy-token create`
- User configures their CI scripts/system to call https://api.svc.beta.toolforge.org (see this for a full working example)
- When the user triggers the CI action (e.g. on push to the repository's main branch), a deployment is created that follows the same process as the basic CLI flow.
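As a rough idea of what the CI step in the automated flow could do, the sketch below builds (but does not send) the HTTP request that triggers a deployment. Only the base URL comes from the flow above; the request path, the `token` query parameter and the function name `build_deploy_request` are assumptions made for illustration, so check the user documentation for the real endpoint.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.svc.beta.toolforge.org"  # base URL mentioned in the flow above


def build_deploy_request(tool_name: str, deploy_token: str) -> urllib.request.Request:
    """Build a POST request asking the API to create a deployment for a tool."""
    # Hypothetical path and token parameter: assumptions, not the documented API.
    path = f"/components/v1/tool/{urllib.parse.quote(tool_name)}/deployment"
    query = urllib.parse.urlencode({"token": deploy_token})
    return urllib.request.Request(f"{API_BASE}{path}?{query}", method="POST")


if __name__ == "__main__":
    request = build_deploy_request("mytool", "example-token")
    print(request.get_method(), request.full_url)
```

In a real CI job the token would come from a secret variable rather than the script itself, and the request would be sent with `urllib.request.urlopen(request)`.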