
Decision request - How to manage quotas in Toolforge Build Service
Closed, ResolvedPublic

Description

Problem

The build service will enable every user to trigger a build; without per-user quotas, one user could end up creating a denial of service for the rest.

Currently the build service will enforce quotas at the namespace level, but not at the user level.

Original design with one namespace and rationale: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_Buildpack_Implementation#Namespacing

Note that the scope of this decision is only the first iteration of the project; for the long term there are other setups being considered that will be decided later (after getting feedback from the beta).
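For context, the namespace-level quota mentioned above is just a Kubernetes ResourceQuota object in the shared build namespace. A minimal sketch of what that looks like with the kubernetes Python client (the quota name and the numbers are illustrative, not the real values):

from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Illustrative only: this caps the shared build namespace as a whole, so one
# user's builds can still exhaust the quota for everyone else.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="build-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "count/pipelineruns.tekton.dev": "10",
            "requests.cpu": "8",
            "requests.memory": "16Gi",
        }
    ),
)
core.create_namespaced_resource_quota(namespace="image-build", body=quota)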

Constraints and risks

  • The risk of not doing anything is that one user can block builds for everyone else.

Decision record

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_How_to_manage_quotas_in_Toolforge_Build_service

Direct responsible individual

@dcaro

Options

Option 1

Keep only the shared namespace quota.

Pros:

  • No extra work needed.

Cons:

  • Does not prevent one user from affecting the others.

Option 2

Create one namespace per-user/tool

Pros:

  • Each user then has their own namespace, with its own quotas.

Cons:

  • Not sure if it's possible with the current solution (tekton), or how many modifications it would require
  • Will add extra maintenance costs, as those namespaces will have to be updated every time we update the system (pipelines for now)
    • This will require changing maintain_kubeusers to create those namespaces, quotas and such
    • This also complicates development/testing

Option 3

Re-use user namespace

Pros:

  • Each user then has their own namespace, with its own quotas.

Cons:

  • Not sure if it's possible with the current solution (tekton), or how many modifications it would require
  • Will add extra maintenance costs, as those namespaces will have to be updated every time we update the system (pipelines for now)
  • This allows users to control everything in the pipeline, making it impossible to include any secrets (e.g. robot accounts to push to the repos) that are not user-owned.
    • That means changing the whole security model, and introducing quotas/tenants on the harbor side.

Option 4

Wrap it in a webservice.
Instead of relying on kubernetes to limit resources bound to a namespace, build our own admission controller (or similar) to limit (also, and first) on higher abstractions like 'build request'. This is something that tekton does not implement yet, but there are upstream talks about implementing it (in a more generic and nuanced way than we need). A rough sketch of what such a controller could look like follows the pros/cons below.

Pros:

  • Fine grained control of quotas/limits
  • Flexibility to change it in any direction we need in the future
  • Might be part of the next step for the toolforge build service (TBD, but mainly moving from a smart client to an API-based service, like the jobs framework)

Cons:

  • Some extra code to write, maintain and deploy.
  • Will take longer to implement (maybe better as a second step)
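For illustration, a minimal sketch of the core of such a controller as a validating admission webhook in Python/Flask. The limit value, the count_active_builds stub and deriving the tool from the request namespace are all assumptions, not the actual design:

# Hypothetical sketch: reject a new build if the tool already has one running.
from flask import Flask, request, jsonify

app = Flask(__name__)
MAX_CONCURRENT_BUILDS_PER_TOOL = 1  # illustrative limit


def count_active_builds(tool: str) -> int:
    # Stub for illustration: the real version would query the cluster for
    # PipelineRuns belonging to `tool` that have not finished yet.
    return 0


@app.route("/validate", methods=["POST"])
def validate():
    review = request.get_json()
    req = review["request"]
    # How to identify the tool (request namespace, a label, the requesting user)
    # is an open question; the namespace is used here only for illustration.
    tool = req.get("namespace", "unknown")

    allowed = count_active_builds(tool) < MAX_CONCURRENT_BUILDS_PER_TOOL
    return jsonify({
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": req["uid"],
            "allowed": allowed,
            "status": {"message": "" if allowed else f"{tool} already has a build running"},
        },
    })

The interesting part is that the quota is expressed in terms the user cares about ('one build at a time') rather than pod counts or CPU.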

Event Timeline

dcaro renamed this task from Decision request template - How to manage quotas in Toolforge Build Service to Decision request - How to manage quotas in Toolforge Build Service.Mar 17 2022, 11:09 AM

Question, I don't believe I understand the extra maintenance cost of updating a system. How would multiple namespaces take more time to update than one? Perhaps I'm thinking of the wrong kind of update, are we talking about a k8s upgrade?

Question, I don't believe I understand the extra maintenance cost of updating a system. How would multiple namespaces take more time to update than one? Perhaps I'm thinking of the wrong kind of update, are we talking about a k8s upgrade?

I'm thinking of any changes to the structure of the Task/Pipeline/rbacs (any resource defined in the user's namespace) on the toolforge build service side: those would need to be updated in all namespaces, so we would need some integration with the ldap infra (either through a custom script, be that maintain_kubeusers, or another deploy script).

I believe implementation of option 2 is just this patch:

diff --git a/maintain_kubeusers/k8s_api.py b/maintain_kubeusers/k8s_api.py
index ab9c43a..64d9a9b 100644
--- a/maintain_kubeusers/k8s_api.py
+++ b/maintain_kubeusers/k8s_api.py
@@ -259,6 +259,9 @@ class K8sAPI:
                         "count/cronjobs.batch": "50",
                         "count/jobs.batch": "15",
                         "count/deployments.apps": "3",
+                        "count/tasks.tekton.dev": "1",
+                        "count/taskruns.tekton.dev": "1",
+                        "count/pipelineruns.tekton.dev": "1",
                     }
                 ),
             ),

But we need to double check object names and such.

By default tekton listens for events in all namespaces, so no changes are required on the tekton side.

Several things can happen when users create the pipelinerun resource (i.e. something like this file), which is the thing actually creating load on the cluster:

  1. It will be limited to whatever is in the quota, in this example patch 1, meaning only one build at a time, which is exactly what we're looking for.
  2. If the pipelineRef can reference a pipeline defined in a different namespace (likely), then the cluster admins (us) create a shared pipeline resource (i.e. this file), deployed along with tekton itself (see the sketch after this comment).
  3. If pipelineRef can only reference a pipeline defined in the same namespace as the pipelinerun, then the user would also have to define the pipeline itself.

In both cases (2) and (3) it should not matter a lot, because we will hide all this in the command line interface anyway, no?

Please, take into account that I'm not familiar with tekton, and I may have been assuming a lot of stuff. This could be all very imaginative speculation on my side.
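To make scenario (2) concrete, a hedged sketch of the kind of object the CLI could submit on the user's behalf: a PipelineRun that only references an admin-maintained Pipeline. The pipeline name, parameter and namespace below are made up, and whether pipelineRef can resolve across namespaces is exactly the open question above:

from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

# Illustrative PipelineRun: the user-controlled object only points at a shared,
# admin-owned Pipeline ("buildpacks" is a made-up name).
pipeline_run = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "PipelineRun",
    "metadata": {"generateName": "mytool-build-"},
    "spec": {
        "pipelineRef": {"name": "buildpacks"},
        "params": [
            # Hypothetical parameter: where to fetch the tool's source from.
            {"name": "SOURCE_URL", "value": "https://example.org/mytool.git"},
        ],
    },
}
custom.create_namespaced_custom_object(
    group="tekton.dev",
    version="v1beta1",
    namespace="tool-mytool",  # illustrative tool namespace
    plural="pipelineruns",
    body=pipeline_run,
)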

That is not enough; we can't let users have full control of tasks in their own namespace, as that would give them the possibility of controlling containers running as root.
There's also the issue of pushing to the repo: users should not have access to that info, but if they can control the tasks they will be able to get it too.

Creating a separate namespace for each user where they don't have that control (that is, replicating the current image-build namespace once per user) gives us that control, but then you also have to maintain it, including adding the creation and updating/maintenance of the resources defined there to maintain_kubeusers or some other script.

PS: added the link to the original design so people can read the rationale there.

That is not enough; we can't let users have full control of tasks in their own namespace, as that would give them the possibility of controlling containers running as root.
There's also the issue of pushing to the repo: users should not have access to that info, but if they can control the tasks they will be able to get it too.

I was unable to find any relevant documentation on the behavior you are describing. Do you have some pointers?

In particular, I'm curious to know which mechanism is responsible for making a difference between namespaces, ie why:

  • user requests creation of a pipelinerun -> it gets created on a central namespace -> whatever workload is created in a central namespace

is any different from:

  • user requests creation of a pipelinerun -> it gets created in its own namespace -> whatever workload is created in its own namespace

If the namespace barrier is simply based on RBAC/PSP, then I don't understand what prevents us from having the same barrier on a per-namespace basis (like we already have for other stuff).

Anyway we already have https://github.com/toolforge/buildpack-admission-controller which as of this writing only validates PipelineRun objects, but it can be easily extended to verify whatever tekton-related object the user might be creating, greatly reducing (or simply removing) users' ability to inject arbitrary stuff into a container that will be executed later by tekton.
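For what it's worth, extending that coverage would mostly mean widening the resource rules the webhook is registered for. A hedged sketch using the kubernetes Python client; the webhook and configuration names are placeholders, the rules list is the only point being made:

from kubernetes import client, config

config.load_kube_config()
admission_api = client.AdmissionregistrationV1Api()

# Hypothetical patch widening the existing webhook from pipelineruns only to
# the other tekton objects a user could try to create in their namespace.
patch = {
    "webhooks": [
        {
            "name": "buildpack-admission.example.org",  # placeholder webhook name
            "rules": [
                {
                    "apiGroups": ["tekton.dev"],
                    "apiVersions": ["v1beta1"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["pipelineruns", "taskruns", "pipelines", "tasks"],
                }
            ],
        }
    ]
}
admission_api.patch_validating_webhook_configuration(
    name="buildpack-admission",  # placeholder configuration name
    body=patch,
)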

At this point also worth clarifying what is the entry point here. The code in https://gerrit.wikimedia.org/r/c/cloud/toolforge/toolforge-cli/+/764327 seems to suggest the entry point is simply a PipelineRun object. I.e, a pipelinerun object is the only thing under user control, thus the only thing requiring a quota and specific security controls. Is that right?

That is not enough; we can't let users have full control of tasks in their own namespace, as that would give them the possibility of controlling containers running as root.
There's also the issue of pushing to the repo: users should not have access to that info, but if they can control the tasks they will be able to get it too.

I was unable to find any relevant documentation on the behavior you are describing. Do you have some pointers?

In particular, I'm curious to know which mechanism is responsible for making a difference between namespaces, ie why:

I believe the issue is applying PodSecurityPolicies (or a future replacement of them) correctly, since the build process needs a different PSP than normal webservices or running a buildpack-based image does.

In T304060#7788313, @Majavah wrote:

That is not enough; we can't let users have full control of tasks in their own namespace, as that would give them the possibility of controlling containers running as root.
There's also the issue of pushing to the repo: users should not have access to that info, but if they can control the tasks they will be able to get it too.

I was unable to find any relevant documentation on the behavior you are describing. Do you have some pointers?

In particular, I'm curious to know which mechanism is responsible for making a difference between namespaces, ie why:

I believe the issue is applying PodSecurityPolicies (or a future replacement of them) correctly, since the build process needs a different PSP than normal webservices or running a buildpack-based image does.

Yep, exactly, that's why I say that if we go for a non-common namespace, we need a new one (that is, one new one per tool/user), the notes on that are:
https://wikitech.wikimedia.org/wiki/File:Toolforge-build-service2.png

From the handover.

As long as the users can have full control of the pods/deployment in their namespace, they can do whatever they want with the pods/deployments that the pipeline generates there.

The process that the runs take is also outlined there (user creates pipelinerun, then the controller picks it up, starts spawning pods for each task, etc.).

Anyway we already have https://github.com/toolforge/buildpack-admission-controller which as of this writing only validates PipelineRun objects, but it can be easily extended to verify whatever tekton-related object the user might be creating, greatly reducing (or simply removing) users' ability to inject arbitrary stuff into a container that will be executed later by tekton.

The problem is not the validation, it's the full control of the pods generated by the pipeline controller.

At this point also worth clarifying what is the entry point here. The code in https://gerrit.wikimedia.org/r/c/cloud/toolforge/toolforge-cli/+/764327 seems to suggest the entry point is simply a PipelineRun object. I.e, a pipelinerun object is the only thing under user control, thus the only thing requiring a quota and specific security controls. Is that right?

To some extent yes; there's the discussion of how many tasks a pipeline runs, how many pods they spawn and such, but as long as we retain control of the task and pipeline definitions, limiting the number of unfinished pipeline runs is enough.

And that's another point: the pipeline runs don't go away when finished, they stay. I have to double check if tekton rotates them (as in, only keeps the last N), but a simple "a user can only create N pipelineruns" might not be enough.

In T304060#7788313, @Majavah wrote:

I believe the issue is applying PodSecurityPolicies (or a future replacement of them) correctly, since the build process needs a different PSP than normal webservices or running a buildpack-based image does.

Yep, exactly, that's why I say that if we go for a non-common namespace, we need a new one (that is, one new one per tool/user), the notes on that are:
https://wikitech.wikimedia.org/wiki/File:Toolforge-build-service2.png

From the handover.

As long as the users can have full control of the pods/deployment in their namespace, they can do whatever they want with the pods/deployments that the pipeline generates there.

The process that the runs take is also outlined there (user creates pipelinerun, then the controller picks it up, starts spawning pods for each task, etc.).

Wait. The PipelineRun object (entry point created by the user) can live in a different namespace than the image build process. There is an explicit mention of this in the handover diagram:

image.png (252×952 px, 37 KB)

The picture I have in my head right now is:

  • The user creates a PipelineRun object in their own namespace.
  • The tekton controller (running in a central namespace) reacts to it.
  • The pipeline is triggered (defined by us, not by the user). The pipeline references a Task (defined by us, not by the user)
  • Both the pipeline and the task don't run in a user-controlled namespace, but in a centrally defined, admin-controlled namespace (image-build?). Some pods are created for each step.
  • the pipeline finishes, the container image is uploaded to the registry, task pods are deleted, etc.

How is the user able to inject anything in the resulting pods from the task?

In T304060#7788313, @Majavah wrote:

I believe the issue is applying PodSecurityPolicies (or a future replacement of them) correctly, since the build process needs a different PSP than normal webservices or running a buildpack-based image does.

Yep, exactly, that's why I say that if we go for a non-common namespace, we need a new one (that is, one new one per tool/user), the notes on that are:
https://wikitech.wikimedia.org/wiki/File:Toolforge-build-service2.png

From the handover.

As long as the users can have full control of the pods/deployment in their namespace, they can do whatever they want with the pods/deployments that the pipeline generates there.

The process that the runs take is also outlined there (user creates pipelinerun, then the controller picks it up, starts spawning pods for each task, etc.).

Wait. The PipelineRun object (entry point created by the user) can live in a different namespace than the image build process. There is an explicit mention of this in the handover diagram:

image.png (252×952 px, 37 KB)

No, that picture explicitly says that the PipelineRun and the build process are running in the same namespace; there's a question mark because, when asking Brooke, she was not able to confirm whether that was a must or not.

The picture I have in my head right now is:

  • The user creates a PipelineRun object in their own namespace.
  • The tekton controller (running in a central namespace) reacts to it.
  • The pipeline is triggered (defined by us, not by the user). The pipeline references a Task (defined by us, not by the user)
  • Both the pipeline and the task don't run in a user-controlled namespace, but in a centrally defined, admin-controlled namespace (image-build?). Some pods are created for each step.

There are 4 resources in play here:

  • Pipeline
  • Task
  • PipelineRun
  • TaskRun

The last two are in the same namespace, and the pods are created in that same namespace too (at least by default; I would have to check if there's a way to avoid it).

So if the user creates the PipelineRun on their own namespace, the TaskRun and the pods go there too. -> security issue

If, on the other hand, we are able to configure it so that the TaskRun and the pods are created in a different namespace than the PipelineRun (that's the question mark in the diagram), then we still have the issue that we can't fine-grain the concurrency, as any limits apply to the shared namespace (the limits have to go on the number/resources of Pods, not the number of PipelineRuns, as those can be running or not, and stay after finishing). -> non-granular quota

So we have the same issue as allowing the user to create the PipelineRun on the shared namespace (non-granular quota).

The only way I see for fine-grained quotas is creating one namespace per-user/tool and setting the limits there (where the user has limited permissions), or having a validator for PipelineRuns that checks how many pipelines that user is currently running (as in they are running, not just exist).

I think that the latter might be a more elegant option, though it does not really limit the resources used, just the parallel runs per-user, so might not solve the ultimate problem of avoiding resource abuse.

So as of now, duplicating the namespaces with resource restrictions and fine-grained access seems to be the only option that effectively avoids resource abuse.
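A hedged sketch of the 'currently running' check such a validator would need (essentially the count_active_builds stub from the Option 4 sketch in the description), using the kubernetes Python client: it only counts PipelineRuns whose Succeeded condition is missing or Unknown, i.e. still in progress. The namespace is illustrative:

from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()


def count_running_pipelineruns(namespace: str) -> int:
    # Count PipelineRuns that exist *and* are still running, ignoring finished ones.
    runs = custom.list_namespaced_custom_object(
        group="tekton.dev", version="v1beta1", namespace=namespace, plural="pipelineruns"
    )
    running = 0
    for run in runs.get("items", []):
        conditions = run.get("status", {}).get("conditions", [])
        succeeded = next((c for c in conditions if c.get("type") == "Succeeded"), None)
        # No Succeeded condition yet, or status "Unknown", means the run has not finished.
        if succeeded is None or succeeded.get("status") == "Unknown":
            running += 1
    return running


print(count_running_pipelineruns("tool-mytool"))  # illustrative namespace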

There's a way to reuse a task cluster-wide (so no need to copy on each namespace):
https://tekton.dev/docs/pipelines/tasks/#task-vs-clustertask
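For illustration, the only change needed on the pipeline side would be the taskRef kind (names are made up, shown as the dict you would feed to the kubernetes CustomObjectsApi):

# Hypothetical Pipeline fragment referencing a cluster-scoped task, so the task
# does not have to be copied into every tool namespace.
pipeline = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "Pipeline",
    "metadata": {"name": "buildpacks"},
    "spec": {
        "tasks": [
            {
                "name": "build",
                "taskRef": {"name": "buildpacks-build", "kind": "ClusterTask"},
            }
        ]
    },
}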

I'm not finding any clarity on whether it's possible to specify, from the Pipeline/PipelineRun or Task specs, the target namespace to create the TaskRuns in, nor anything in the docs themselves. I will spend some time checking closer, but as there's no way to specify it in those resources, I suspect it's not possible to have them in different namespaces.

Also, for later: I saw several reports of tekton lacking a feature to clean up old pipeline runs. Some folks upstream pointed to hand-crafted solutions like running a cronjob to purge completed runs every X period of time, something similar to this: https://gist.github.com/AnyISalIn/9d22430d42fac100442b4b3ef0e0451f

Interestingly, this was missing in kubernetes too for other resources like jobs themselves, until fairly recently when they introduced the TTLAfterFinished mechanism.
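A hedged sketch of the kind of periodic cleanup those upstream workarounds do, deleting every PipelineRun that already has a completionTime set; the namespace is illustrative and something like this would run from a cronjob:

from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

NAMESPACE = "image-build"  # illustrative

runs = custom.list_namespaced_custom_object(
    group="tekton.dev", version="v1beta1", namespace=NAMESPACE, plural="pipelineruns"
)
for run in runs.get("items", []):
    # A completionTime in the status means the run already finished (successfully
    # or not); those are the ones that pile up and can be garbage-collected.
    if run.get("status", {}).get("completionTime"):
        custom.delete_namespaced_custom_object(
            group="tekton.dev",
            version="v1beta1",
            namespace=NAMESPACE,
            plural="pipelineruns",
            name=run["metadata"]["name"],
        )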

We decided to go with option 1 for now, gather feedback from the Beta and then reconsider our options.