Decision request - How to provide a way to install system dependencies for buildpack-based images
Closed, Resolved, Public

Description

Problem

Sometimes tools need some system dependencies like:

  • ImageMagick
  • pstools

Currently, buildpacks do not allow installing any system packages.

Constraints and risks

  • Users who need system packages will not be able to use the buildservice, and will instead request that extra packages be added to the existing Toolforge Docker images
  • Users might not be able to migrate off the grid without a major refactor

Decision record

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_request_-_How_to_provide_a_way_to_install_system_dependencies_for_buildpack-based_images

Options

Option 1

Allow installing apt packages using the upstream apt buildpack (https://github.com/heroku/heroku-buildpack-apt).

This can be done by injecting that buildpack into the generated group.toml (after detection).
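A minimal Python sketch of that injection step, assuming the detected group is available as a list of buildpack entries and that the group file uses `[[group]]` tables with `id` and `version` keys. The apt buildpack ID and version are hypothetical placeholders, and putting the apt buildpack first is a design assumption so that system libraries exist before the language buildpacks run.

```python
from __future__ import annotations

# Hypothetical ID/version: the real values depend on which apt buildpack is used.
APT_BUILDPACK = {"id": "example/apt", "version": "0.0.0"}


def inject_apt(group: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return a copy of the detected group with the apt buildpack placed first.

    Running it first means the system libraries it installs are already
    present when language buildpacks (pip, composer, ...) run.
    """
    # Drop any existing apt entry so it is never listed twice.
    rest = [bp for bp in group if bp["id"] != APT_BUILDPACK["id"]]
    return [dict(APT_BUILDPACK)] + rest


def to_group_toml(group: list[dict[str, str]]) -> str:
    """Serialize a group as [[group]] tables.

    Assumes plain string values that contain no quotes or escapes.
    """
    lines: list[str] = []
    for bp in group:
        lines.append("[[group]]")
        for key, value in bp.items():
            lines.append(f'{key} = "{value}"')
        lines.append("")
    return "\n".join(lines)
```

For example, `to_group_toml(inject_apt([{"id": "heroku/python", "version": "0.1.0"}]))` (placeholder language buildpack) yields two `[[group]]` tables with the apt entry first.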

We should put this feature behind an allowlist, enabling it for specific projects on request.

Pros:

  • Unblocks any users that need extra dependencies
  • Still pushes users to use the recommended way to pull dependencies with buildpacks (pip/composer/bundler/...)

Cons:

  • Users who need it will have to file an extra request to enable it
  • Somewhat more complex code-wise (we have to implement some sort of allowlist, though it could be reused for other things, e.g. multistack/custom buildpacks)
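Option 1's allowlist could start as small as the Python sketch below, keyed by feature so the same mechanism could later gate multistack or custom buildpacks. The feature names, tool names and the use of an Aptfile as the trigger are illustrative assumptions, not the buildservice's actual configuration.

```python
from __future__ import annotations

# Hypothetical features and tool names, for illustration only.
FEATURE_ALLOWLISTS: dict[str, frozenset[str]] = {
    "apt": frozenset({"tool-a", "tool-b"}),
    "custom-buildpacks": frozenset({"tool-c"}),
}


def feature_enabled(feature: str, tool: str) -> bool:
    """True if `tool` was granted `feature` on request; unknown features are off."""
    return tool in FEATURE_ALLOWLISTS.get(feature, frozenset())


def should_inject_apt(tool: str, has_aptfile: bool) -> bool:
    # Inject the apt buildpack only when the tool asks for packages
    # (ships an Aptfile) and has been allowlisted for the feature.
    return has_aptfile and feature_enabled("apt", tool)
```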

Option 2

Allow installing apt packages using the upstream apt buildpack (https://github.com/heroku/heroku-buildpack-apt).

This can be done by injecting that buildpack into the generated group.toml (after detection).

Enabled for everyone.

Pros:

  • Unblocks any users that need extra dependencies
  • Users that need it have it right away
  • No need for allowlist implementation

Cons:

  • Everyone can install any package from anywhere, potentially allowing non-open-source code to run on Toolforge
  • Images will be bigger (we currently have a 1G per-tool limit set in Harbor, so they cannot grow beyond that)

Option 3

Allow only selected buildpacks that provide specific libraries, rather than the "apt" buildpack.

For example, this buildpack adds several geospatial libraries (GDAL, GEOS and PROJ): https://github.com/heroku/heroku-geo-buildpack

Many buildpacks can be found online, and we could create more ourselves.

Pros:

  • More control over what people can install in their images
  • Could be solved the same way as multistack/specific buildpacks

Cons:

  • We would need to add those buildpacks individually, as people request them
  • Some libraries might not be available as a buildpack, and creating a custom buildpack is possible but not easy
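Option 3 amounts to a curated lookup from requested system libraries to approved buildpacks. The hypothetical Python sketch below shows that shape and makes the second con concrete: anything missing from the catalogue is left uncovered. The catalogue contents are illustrative; only the geo entries mirror what heroku-geo-buildpack provides, and the buildpack ID string is a placeholder.

```python
from __future__ import annotations

# Hypothetical curated catalogue: system library -> approved buildpack ID.
CURATED: dict[str, str] = {
    "gdal": "heroku/heroku-geo-buildpack",
    "geos": "heroku/heroku-geo-buildpack",
    "proj": "heroku/heroku-geo-buildpack",
}


def plan_buildpacks(libraries: list[str]) -> tuple[list[str], list[str]]:
    """Split requested libraries into curated buildpacks to add and uncovered leftovers."""
    buildpacks: list[str] = []
    missing: list[str] = []
    for lib in libraries:
        bp = CURATED.get(lib)
        if bp is None:
            missing.append(lib)  # would need a new or custom buildpack
        elif bp not in buildpacks:
            buildpacks.append(bp)  # keep first-seen order, no duplicates
    return buildpacks, missing
```

For example, `plan_buildpacks(["gdal", "proj", "imagemagick"])` returns `(["heroku/heroku-geo-buildpack"], ["imagemagick"])`.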

Option N

Add your options here!

Event Timeline

dcaro renamed this task from "[tbs] Find out a solution for system-level library dependencies" to "Decision request - How to provide a way to install system dependencies for buildpack-based images". May 24 2023, 1:26 PM
dcaro updated the task description.

I would currently lean heavily towards option 2 as the preferred solution. We have several mechanisms to disable accounts which are in violation of Cloud VPS and Toolforge policies (re installation of non-free software). Image size can be manipulated in a number of ways and should be protected by quota rather than manual review.

These cons in option 2 would also apply to option 1, would they not? It is not clear to me how adding an allow list for which tools could use apt mitigates what can be done via apt afterwards.

Of the options listed, I agree with going with option 2. I would be curious to hear about any other concerns with enabling it unrestricted for anyone using buildpacks. I don't feel the mentioned issue of potential TOU violations warrants the extra work required for option 1. Any other concerns beyond image size?

> I would currently lean heavily towards option 2 as the preferred solution. We have several mechanisms to disable accounts which are in violation of Cloud VPS and Toolforge policies (re installation of non-free software).

Unfortunately, all of them require time and effort on our part, so enabling more ways to misbehave, which we will then have to monitor and track, is not free of cost for us.

> Image size can be manipulated in a number of ways and should be protected by quota rather than manual review.

And we do \o/

> These cons in option 2 would also apply to option 1, would they not? It is not clear to me how adding an allow list for which tools could use apt mitigates what can be done via apt afterwards.

Making it hard to do the wrong thing is one of the main deterrents we have against wrong behavior (e.g. quota bumps, Toolforge account requests, adding packages to any of our existing images, ...).

> I would be curious to hear about any other concerns with enabling it unrestricted for anyone using buildpacks. I don't feel the mentioned issue of potential TOU violations warrants the extra work required for option 1. Any other concerns beyond image size?

Not really. Image size is not the main concern there either; it's the TOU violation.
In my mind, that was the main reason we so zealously control what goes into the Toolforge images currently, and the only reason to have *public* git repos as the only way to put code into Toolforge, but I might not be reading those decisions the same way others do.

If that is not something we want to care much about, we might want to reconsider BYOC (bring your own container), as that was the main risk listed (though the rationale is not very clear on how much it influenced the decision).

In any case, I'm ok enabling this without restrictions and enforcing restrictions later if needed, so good to go for option 2 👍, I just wanted to make sure my concerns are registered for future decisions.

I would like to hear more about how much weight we give to enforcing open source on Toolforge, so feel free to reply and we can continue discussing :), but I'll close the decision.

> I would like to hear more about how much weight we give to enforcing open source on Toolforge, so feel free to reply and we can continue discussing :), but I'll close the decision.

Oh no, I will not close this decision, sorry xd. I got confused: this one has only been open for a short while, so I'll leave it open so we can continue discussing and anyone can add more ideas. Apologies.

I'm still ok going for option 2.

I think I'll play a bit with installing ImageMagick, for example, from buildpacks to see how it works, and maybe try a couple more; if we get a list of "good" buildpacks for the dependencies we want, that might make option 3 a nicer option.

+1 for Option 2 here as well. I don't think we should be policing TOUs before the fact, i.e. if someone breaks them let's address that, but I'm not in favor of preemptively bending over backwards to prevent abuse. But then as @dcaro mentioned, why not BYOC?

As for Option 3, I don't think a curated list of buildpacks would ever satisfy everyone's needs, and it would add an extra burden on us.

Option 2 seems like the right call to me. I'm curious about the non-free concern... would we be limiting install to particular repos, or would users also be able to inject non-free repos before installing packages?

The buildpack allows injecting repos too, so packages could come from anywhere (we could add some filtering on top of it to allow only the default repos, or a subset of repos; it should not be very hard).
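Such filtering could work on the Aptfile itself. The Python sketch below assumes the Aptfile syntax documented for heroku-buildpack-apt (one package name per line, direct `.deb` URLs, and `:repo:` lines that add apt sources) and keeps only plain Debian package names, so everything resolves from the image's default repositories. It only illustrates the idea and is not the proof-of-concept code.

```python
from __future__ import annotations

import re

# Debian policy: package names use lowercase letters, digits, '+', '-', '.',
# are at least two characters long and start with an alphanumeric character.
PACKAGE_NAME = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")


def filter_aptfile(text: str) -> tuple[list[str], list[str]]:
    """Split Aptfile lines into allowed package names and rejected lines.

    Rejects ':repo:' lines (extra apt sources) and direct .deb URLs, so only
    packages from the default repositories can be installed.
    """
    allowed: list[str] = []
    rejected: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if PACKAGE_NAME.match(line):
            allowed.append(line)
        else:
            rejected.append(line)  # :repo: lines, URLs, anything unexpected
    return allowed, rejected
```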

dcaro changed the task status from Open to In Progress. Jun 2 2023, 7:34 AM

Seeing that there are no new comments, I will decide on this by next Thursday (08/06/2023); if new comments with new ideas/proposals arise, I'll set up a meeting for the week after.

Option 2 seems the nicest to me in the context of the migration from the Toolforge grid to Kubernetes, i.e., seeking a smooth transition for those old tools.

I'm fine with option 2 but I think "limiting install to particular repos" (as suggested by @Andrew above) is something we should consider.

I have done a POC implementation that only allows packages from the system repos (though I had to use a different buildpack, as the Heroku one is quite outdated). You can find the code here:

https://gitlab.wikimedia.org/repos/cloud/toolforge/buildservice/-/merge_requests/4

Reviews are welcome.

Okok, I'll declare this decision taken :)

So: Option 2, limiting packages to the system repositories only (the attached merge request). I'll move this to a wiki page next. Thanks!

Mentioned in SAL (#wikimedia-cloud) [2023-06-20T11:59:48Z] <dcaro> deploy buildservice with aptfile support (T336669)