
Decision request - What to use for toolforge components api task execution
Open, High, Public

Description

Problem

To manage pipelines in the components API (for example, build 1 + build 2 -> deploy component 1 -> deploy component 2), we need a way to execute them.
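The example pipeline above can be read as a list of stages. The steps within a stage are independent (the two builds), and each stage only starts once the previous one has succeeded. Here is a minimal Python sketch of that model. The step names and the `run_step` callable are hypothetical stand-ins for the real builds-api and deploy calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A pipeline is a list of stages: stages run strictly one after another,
# while the steps inside a single stage are independent and may run in parallel.
Pipeline = List[List[str]]

# The example from the problem statement (step names are illustrative).
EXAMPLE_PIPELINE: Pipeline = [
    ["build component 1", "build component 2"],
    ["deploy component 1"],
    ["deploy component 2"],
]


def run_pipeline(pipeline: Pipeline, run_step: Callable[[str], bool]) -> List[str]:
    """Run the stages in order and return the steps that succeeded.

    run_step is a hypothetical stand-in for the real work (calling the
    builds-api, deploying a component) and returns True on success.
    Execution stops after the first stage with a failing step, so a
    component is never deployed on top of a failed build.
    """
    completed: List[str] = []
    for stage in pipeline:
        if not stage:
            continue  # nothing to do for an empty stage
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            results = list(pool.map(run_step, stage))
        completed.extend(step for step, ok in zip(stage, results) if ok)
        if not all(results):
            break
    return completed
```

For example, `run_pipeline(EXAMPLE_PIPELINE, lambda step: step != "build component 2")` returns `["build component 1"]` and never reaches the deploy stages. If this is called directly inside the request handler, it is essentially what Option 1 below amounts to.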

Constraints and risks

  • The pipelines are not expected to be large; a few dozen components would be the upper bound

Decision record

In progress

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T362224_What_to_use_for_toolforge_components_api_task_execution

Options

Option 1

No asynchronous task processing

Pros:

  • Easiest to implement

Cons:

  • If the HTTP request is interrupted, the whole pipeline fails

Option 2

After-request asynchronous task processing in the same process (e.g. https://fastapi.tiangolo.com/tutorial/background-tasks/)

Pros:

  • Easy to implement and set up (no extra components/services/etc. needed)

Cons:

  • If the service gets restarted (e.g. OOM, moving to a different worker), the pipeline breaks
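This is a rough, framework-agnostic sketch of the Option 2 pattern, built around a hypothetical `PipelineRunner` class. The request handler submits the work to a thread pool inside the API process and returns an ID immediately. The status is kept only in memory, which is exactly the con above: if the process restarts, both the running work and its status disappear. With FastAPI, the linked `BackgroundTasks` feature would take the place of the explicit thread pool, but the status tracking would still be needed.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class PipelineRunner:
    """In-process background execution (hypothetical helper, not a real library)."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        # Status lives only in this process's memory: it is lost on restart/OOM.
        self._status: Dict[str, str] = {}

    def _set(self, run_id: str, status: str) -> None:
        with self._lock:
            self._status[run_id] = status

    def start(self, work: Callable[[], None]) -> str:
        """Called from the request handler; returns an ID without waiting."""
        run_id = uuid.uuid4().hex
        self._set(run_id, "pending")

        def wrapper() -> None:
            self._set(run_id, "running")
            try:
                work()
            except Exception:
                self._set(run_id, "failed")
            else:
                self._set(run_id, "succeeded")

        self._pool.submit(wrapper)
        return run_id

    def status(self, run_id: str) -> str:
        """Called from a polling endpoint; unknown IDs (e.g. after a restart) say so."""
        with self._lock:
            return self._status.get(run_id, "unknown")

    def shutdown(self) -> None:
        """Wait for the submitted work to finish and stop the pool."""
        self._pool.shutdown(wait=True)
```

After a restart, every previously returned ID would report `"unknown"`. This is why the later options move the execution and its state out of the API process.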

Option 3

Using Tekton Pipelines

Pros:

  • Already used for build service
  • Good pipeline support (made for it)

Cons:

  • Relatively complex to set up
  • Pipelines are written in Tekton YAML plus a custom image/shell script
  • Needs interfacing with Tekton (similar to what we do in the builds service)
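This gives an idea of the Tekton side. The same stage model maps onto a Tekton `Pipeline` resource, where ordering is expressed with `runAfter`. The Python sketch below only builds the manifest as a dict; it would still need to be serialized to YAML or submitted through the Kubernetes API. The `build-component`/`deploy-component` Task names and the `component` parameter are hypothetical. The sketch also assumes component names are already valid Kubernetes names.

```python
from typing import Any, Dict, List, Tuple

# Each step is (action, component); stages run in order, steps in a stage in parallel.
Stage = List[Tuple[str, str]]


def tekton_pipeline(name: str, stages: List[Stage]) -> Dict[str, Any]:
    """Build a Tekton v1 Pipeline manifest where each stage runs after the previous one."""
    tasks: List[Dict[str, Any]] = []
    previous: List[str] = []
    for stage in stages:
        if not stage:
            continue  # skip empty stages so ordering is kept between real ones
        current: List[str] = []
        for action, component in stage:
            task_name = f"{action}-{component}"
            task: Dict[str, Any] = {
                "name": task_name,
                # Hypothetical Task, e.g. "build-component" or "deploy-component".
                "taskRef": {"name": f"{action}-component"},
                "params": [{"name": "component", "value": component}],
            }
            if previous:
                # Wait for every task of the previous stage.
                task["runAfter"] = list(previous)
            tasks.append(task)
            current.append(task_name)
        previous = current
    return {
        "apiVersion": "tekton.dev/v1",
        "kind": "Pipeline",
        "metadata": {"name": name},
        "spec": {"tasks": tasks},
    }
```

For a tool with `web` and `worker` components, `deploy-web` gets `runAfter: [build-web, build-worker]`. Tekton then handles the ordering and retries, but each referenced Task still needs its own image/script, which is the second con listed above.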

Option 4

Using Celery with Redis

Pros:

  • Very common pattern
  • Good pipeline support (made for it)
  • Pipelines are written in the same codebase as the service

Cons:

  • Needs a dedicated Redis instance (I think we should not reuse the user-facing one, for security reasons)

Option N

Add your option here!

Event Timeline

I vote for Option 1, I think we should aim for the easiest solution in the first MVP of the component API.

I could be wrong, but I don't imagine having async processing is a requirement for most tools, see also my comment in T362075. Adding async processing could become a feature request that can be prioritized based on how many users are interested.

Would #1 mean having to keep the single HTTP request alive for the duration of the entire build + deploy? I'd prefer to avoid anything requiring a single HTTP request to stay alive for multiple minutes (or longer) :/

I would also avoid that, but I think Option 1 doesn't necessarily entail a long HTTP request; it could be a trigger+poll mechanism, for example. I'm not even sure whether in the first MVP we need an endpoint that builds all the components at once, or whether we could have a separate trigger for each component.

> I would also avoid that, but I think Option 1 doesn't necessarily entail a long HTTP request; it could be a trigger+poll mechanism, for example. I'm not even sure whether in the first MVP we need an endpoint that builds all the components at once, or whether we could have a separate trigger for each component.

It would mean one long HTTP request, yes; that's what I meant by synchronous.

By asynchronous (options 2, 3 and 4) I mean that the HTTP request starts a background task (in the same Python process for option 2, or in an asynchronous batch-processing system for options 3 and 4) and returns an ID; you then poll with that ID to check the status of the background task.
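From the client's side, the trigger + poll flow described above could look like the following sketch. `trigger` and `get_status` are hypothetical stand-ins for the HTTP calls: one starts the operation and returns an ID, the other reads its status. The status names are also assumptions. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Tuple

# Hypothetical final states reported by the status endpoint.
TERMINAL_STATES = {"succeeded", "failed"}


def trigger_and_wait(
    trigger: Callable[[], str],
    get_status: Callable[[str], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Start a background operation, then poll its status until it finishes.

    Every HTTP call stays short; only the client waits, not a single request.
    Returns (operation_id, final_status) or raises TimeoutError.
    """
    operation_id = trigger()
    deadline = clock() + timeout
    while True:
        status = get_status(operation_id)
        if status in TERMINAL_STATES:
            return operation_id, status
        if clock() >= deadline:
            raise TimeoutError(f"operation {operation_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Polling keeps each request short regardless of how long the build and deploy take. Options 2, 3 and 4 differ only in what runs behind `trigger` and where the status is read from.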

Option 1 for an MVP, then iterate on it as needed.

> It would mean one long HTTP request, yes; that's what I meant by synchronous.

I see, thanks for clarifying.

If we take the simple case of a tool with a single component that we need to build+deploy, is the idea that the component-api will call the builds-api, wait for the build to finish, then call the K8s API to deploy the new image?

There's indeed some complexity to keep track of the status of a given build. I don't like the idea of an API that takes minutes to respond to an HTTP call (not even in the MVP), so we need to store somewhere that a component-api operation is in progress, and be able to retrieve its status.
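One way to "store somewhere that a component-api operation is in progress" is a small table keyed by operation ID, shared by the code doing the work and the status endpoint. The sketch below uses SQLite from the Python standard library purely for illustration. The table, columns and status values are hypothetical, and the real backing store is part of what this task needs to decide.

```python
import sqlite3
import uuid
from typing import Optional


def init_store(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) operations table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS operations ("
        " id TEXT PRIMARY KEY,"
        " tool TEXT NOT NULL,"
        " status TEXT NOT NULL"
        ")"
    )
    conn.commit()


def create_operation(conn: sqlite3.Connection, tool: str) -> str:
    """Record a new operation as in progress and return its ID for polling."""
    op_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO operations (id, tool, status) VALUES (?, ?, ?)",
        (op_id, tool, "running"),
    )
    conn.commit()
    return op_id


def set_status(conn: sqlite3.Connection, op_id: str, status: str) -> None:
    """Update the status, e.g. to 'succeeded' or 'failed', when work finishes."""
    conn.execute("UPDATE operations SET status = ? WHERE id = ?", (status, op_id))
    conn.commit()


def get_status(conn: sqlite3.Connection, op_id: str) -> Optional[str]:
    """Return the stored status, or None for an unknown ID."""
    row = conn.execute("SELECT status FROM operations WHERE id = ?", (op_id,)).fetchone()
    return row[0] if row else None


def has_running_operation(conn: sqlite3.Connection, tool: str) -> bool:
    """True if the tool already has an operation in progress (e.g. to refuse a second deploy)."""
    row = conn.execute(
        "SELECT 1 FROM operations WHERE tool = ? AND status = 'running' LIMIT 1", (tool,)
    ).fetchone()
    return row is not None
```

Because the status lives outside the API process, it survives a restart of the service. Resuming the interrupted work itself would still need a separate mechanism, which is what Tekton or Celery would provide.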

I'm starting to think that Option 3 (using Tekton) might make sense.

This is also related to T362069: [components-api] Get a skeleton of API webservice and implement `/tool/<toolname>/deploy` with build-only features, where you discussed the ID+polling implementation for component-api.


> If we take the simple case of a tool with a single component that we need to build+deploy, is the idea that the component-api will call the builds-api, wait for the build to finish, then call the K8s API to deploy the new image?

Yep (probably using the jobs-api too).

> There's indeed some complexity to keep track of the status of a given build. I don't like the idea of an API that takes minutes to respond to an HTTP call (not even in the MVP), so we need to store somewhere that a component-api operation is in progress, and be able to retrieve its status.

> I'm starting to think that Option 3 (using Tekton) might make sense.

> This is also related to T362069: [components-api] Get a skeleton of API webservice and implement `/tool/<toolname>/deploy` with build-only features, where you discussed the ID+polling implementation for component-api.

Yep, thanks for pointing it out, it's relevant. In that task we only do a build, so we don't really need the "pipeline" behavior (if the build passes, then start the job) and can start with a one-off request. Depending on what we choose here, this task might change that "give ID, then poll" mechanism.