
Create conda .deb and docker image
Closed, Resolved, Public

Description

Data engineering is using conda-packed conda environments for job distribution in Hadoop, and we are trying to automate the generation of these conda env artifacts using gitlab CI.

GitLab CI runs in Docker containers. Because the CI runs in Cloud VPS, we can use external Docker images for now, but I'd prefer to use official images from docker-registry.wikimedia.org.

I've tried a few ways of automating conda installs (here, and here), all of which are a little hacky.

So, I'd like to make or import a conda .deb into our apt repo, and then create a simple gerrit repo that uses our Deployment Pipeline to generate docker images with conda installed (via that .deb).

I could skip the .deb step, but I think that having a conda .deb will be nice for other reasons, like T302819: Replace anaconda-wmf with smaller, non-stacked Conda environments.

Event Timeline

@MoritzMuehlenhoff, any advice? Can I import conda's official .deb into our apt repo, or would you prefer that I create entirely new Debian packaging for this? The Debian packaging would probably just use the Miniconda release archive to create a Miniconda env in a Debian tree and package that up, so it'd be pretty dumb. I'd prefer to use conda's .deb if that is okay with you.

Yeah, let's simply use the official conda deb here (we can import it with reprepro to a thirdparty/conda component).

Change 774481 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Add thirdparty/conda component to reprepro updates

https://gerrit.wikimedia.org/r/774481

Change 774481 merged by Ottomata:

[operations/puppet@production] Add thirdparty/conda component to reprepro updates

https://gerrit.wikimedia.org/r/774481

Change 774508 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] reprepro updates - set Name: thirdparty/conda

https://gerrit.wikimedia.org/r/774508

Change 774508 merged by Ottomata:

[operations/puppet@production] reprepro updates - set Name: thirdparty/conda

https://gerrit.wikimedia.org/r/774508

Change 774517 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] aptrepo updates - set conda Suite to stable

https://gerrit.wikimedia.org/r/774517

Change 774517 merged by Ottomata:

[operations/puppet@production] aptrepo updates - set conda Suite to stable

https://gerrit.wikimedia.org/r/774517

Change 774522 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] aprepro updates - conda - set Components: main>thirdparty/conda

https://gerrit.wikimedia.org/r/774522

Change 774522 merged by Ottomata:

[operations/puppet@production] aprepro updates - conda - set Components: main>thirdparty/conda

https://gerrit.wikimedia.org/r/774522

Alright, I seem to have got reprepro to pull the update:
https://apt.wikimedia.org/wikimedia/pool/thirdparty/conda/c/conda/

And now conda is listed in both the buster-wikimedia and bullseye-wikimedia Packages files:

cat /srv/wikimedia/dists/{buster,bullseye}-wikimedia/thirdparty/conda/binary-amd64/Packages

However, apt can't find the package on a node (after apt-get update), so I must be missing a step. @MoritzMuehlenhoff any ideas?

By default, only "main" and "thirdparty/hwraid" (for baremetal hosts) are added to our servers. That's by design, so that we have full control over what we add to which Puppet role. I suppose the old conda package was uploaded to "main". Adding the component is straightforward, though; if the following doesn't work, let me know.

apt::package_from_component { 'conda':
   component => 'thirdparty/conda',
}

If you need more than just the "conda" deb (the package name is assumed from the name of the resource by default), you can specify the packages explicitly via packages => ['foo', 'bar'].

apt::package_from_component() adds the component, runs apt-get update in the correct order, and installs the package(s), along with any pinning changes (which aren't needed here).
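As a minimal sketch of the explicit `packages` form described above: the resource title `conda_tools` is a hypothetical name chosen for illustration, and the only assumption about the define is what is stated in this thread, namely that `packages` overrides the default of using the resource name as the package name.

```puppet
# Sketch only: 'conda_tools' is a hypothetical resource title. Because
# packages is set explicitly, the title is not used as the package name.
apt::package_from_component { 'conda_tools':
    component => 'thirdparty/conda',
    # Further package names from the same component could be added to this array.
    packages  => ['conda'],
}
```

Setting `packages` explicitly lets the resource title describe the role's intent while the array lists exactly which debs are pulled from the component.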

Change 774886 had a related patch set uploaded (by Ottomata; author: Ottomata):

[conda-wmf@master] Initial commit

https://gerrit.wikimedia.org/r/774886

Change 774887 had a related patch set uploaded (by Ottomata; author: Ottomata):

[integration/config@master] Add conda-ci-publish pipeline

https://gerrit.wikimedia.org/r/774887

Change 774887 abandoned by Ottomata:

[integration/config@master] Add conda-ci-publish pipeline

Reason:

Going to investigate using gitlab-ci include snippets instead of common base docker image.

https://gerrit.wikimedia.org/r/774887

Change 774886 abandoned by Ottomata:

[conda-wmf@master] Initial commit

Reason:

https://gerrit.wikimedia.org/r/774886

Mostly done, but to finish we are blocked waiting for GitLab Docker images: T304845: gitlab: consider enabling docker container registry

I think that we can close this ticket now.
We're now using GitLab-CI to build conda-analytics here: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics
We currently build the conda-analytics .deb file using a GitLab runner in WMCS, but I think we can improve on this significantly by using the new trusted runners architecture and the Deployment Pipeline based on kokkuri.

We can also use this to publish the docker images, but we will need to convert our Dockerfile to a blubber.yaml file.

I'll make a separate ticket for this work and close this one.