
Remove unused thirdparty/conda repository
Closed, Resolved (Public)

Description

While working through T363000, I noticed that the conda repository is not available in Bookworm. Creating this ticket to:

  • publish conda repo for Bookworm
  • verify operation

Update

We have identified that the repository is unused, so we can remove it instead.

Event Timeline

Gehel triaged this task as Medium priority. May 10 2024, 8:19 AM
Gehel moved this task from Incoming to 2024.05.06 - 2024.05.26 on the Data-Platform-SRE board.

Thanks @bking, you're right, we don't publish conda-analytics for bookworm, yet.

However, I'm not sure that we need to at the moment either. We don't have any stats servers or hadoop workers or airflow instances running bookworm.

btullis@cumin1002:~$ sudo cumin --force --no-progress P:analytics::conda_analytics 'facter os.distro.codename'
127 hosts will be targeted:
an-airflow[1002,1004-1007].eqiad.wmnet,an-coord[1003-1004].eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-test-client1002.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-worker[1001-1003].eqiad.wmnet,an-worker[1078-1175].eqiad.wmnet,analytics[1070-1077].eqiad.wmnet,stat[1004-1011].eqiad.wmnet
FORCE mode enabled, continuing without confirmation
===== NODE GROUP =====
(6) an-launcher1002.eqiad.wmnet,stat[1004-1008].eqiad.wmnet
----- OUTPUT of 'facter os.distro.codename' -----
buster
===== NODE GROUP =====
(121) an-airflow[1002,1004-1007].eqiad.wmnet,an-coord[1003-1004].eqiad.wmnet,an-test-client1002.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-worker[1001-1003].eqiad.wmnet,an-worker[1078-1175].eqiad.wmnet,analytics[1070-1077].eqiad.wmnet,stat[1009-1011].eqiad.wmnet
----- OUTPUT of 'facter os.distro.codename' -----
bullseye
================
100.0% (127/127) success ratio (>= 100.0% threshold) for command: 'facter os.distro.codename'.
100.0% (127/127) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
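
As a rough illustration of how the cumin output above can be tallied, here is a minimal sketch that counts hosts per reported codename. The function name `hosts_per_output` and the abridged `SAMPLE` text are assumptions made for this example; the parsing only relies on the `(N) hostlist` and `----- OUTPUT of ... -----` layout visible in the paste, and has not been checked against other cumin output formats.

```python
import re
from typing import Dict, Optional

# Matches a node-group header line such as "(6) an-launcher1002...,stat[1004-1008]..."
GROUP_RE = re.compile(r"^\((\d+)\)\s+\S+")


def hosts_per_output(cumin_text: str) -> Dict[str, int]:
    """Map each distinct command output (here, a distro codename) to its host count."""
    counts: Dict[str, int] = {}
    pending: Optional[int] = None  # host count of the group whose output is still to come
    expect_output = False          # True right after an "----- OUTPUT of ... -----" line
    for raw in cumin_text.splitlines():
        line = raw.strip()
        match = GROUP_RE.match(line)
        if match:
            pending = int(match.group(1))
            expect_output = False
        elif line.startswith("----- OUTPUT of") and pending is not None:
            expect_output = True
        elif expect_output and line and pending is not None:
            # First non-empty line after the OUTPUT marker is taken as the output.
            counts[line] = counts.get(line, 0) + pending
            pending, expect_output = None, False
    return counts


# Abridged copy of the paste above (host lists shortened, counts unchanged).
SAMPLE = """\
===== NODE GROUP =====
(6) an-launcher1002.eqiad.wmnet,stat[1004-1008].eqiad.wmnet
----- OUTPUT of 'facter os.distro.codename' -----
buster
===== NODE GROUP =====
(121) an-airflow[1002,1004-1007].eqiad.wmnet,stat[1009-1011].eqiad.wmnet
----- OUTPUT of 'facter os.distro.codename' -----
bullseye
================
"""

if __name__ == "__main__":
    counts = hosts_per_output(SAMPLE)
    print(counts)
    print("bookworm hosts:", counts.get("bookworm", 0))
```

A result with no `bookworm` key corresponds to the observation that none of the targeted hosts run bookworm.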

When we do start working to upgrade the Hadoop cluster and stats clients to bookworm, conda-analytics will just be one of the components that we need to consider.
We definitely should try to avoid running conda inside containers, since that would add bloat.

So, if it were me, I would remove this ticket from the current milestone. I might even be tempted to decline it for now, because we haven't exactly decided that we 100% want to carry on with conda-analytics in its current form.
It depends a little on how much progress we make with Kubernetes based projects (JupyterLab, Kubeflow, Spark etc.) in the next couple of years.

Oh sorry @bking, I have misunderstood. You did mean a thirdparty/conda apt repository.
Not the conda-analytics package that I mentioned.

I'm tracing this back now. It was added in T304450: Create conda .deb and docker image, but I have a feeling that we're not actually using it anywhere, so we might want to remove it from the apt repo.

For example, the conda-analytics build pipeline downloads the official installer and runs it: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/blob/main/docker/Dockerfile?ref_type=heads#L40
So does the current airflow build pipeline: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/debian/Dockerfile?ref_type=heads#L77

It looks like @Ottomata added it whilst working out how best to build conda-analytics, but decided against using it in the end, so I think that we should probably remove it, if we can.

I don't really remember the history here. I think @Antoine_Quhen did some work on this. Maybe he knows?

Change #1047085 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove conda repository from reprepro configuration

https://gerrit.wikimedia.org/r/1047085

BTullis renamed this task from Publish conda repository for bookworm to Remove unused thirdparty/conda repository. Jun 18 2024, 1:45 PM
BTullis updated the task description.

Once "Remove conda repository from reprepro configuration" has been merged, we will still need to clean up reprepro by hand, as per: https://wikitech.wikimedia.org/wiki/Reprepro#Removing_a_component

Change #1047085 merged by Btullis:

[operations/puppet@production] Remove conda repository from reprepro configuration

https://gerrit.wikimedia.org/r/1047085

Joe raised the priority of this task from Medium to Unbreak Now!. Jun 19 2024, 6:04 AM
Joe subscribed.

Whatever was changed here has left reprepro broken:

Error: packages database contains unused 'bullseye-wikimedia|thirdparty/conda|amd64' database.
This usually means you removed some component, architecture or even
a whole distribution from conf/distributions.
In that case you most likely want to call reprepro clearvanished to get rid
of the databases belonging to those removed parts.
(Another reason to get this error is using conf/ and db/ directories
 belonging to different reprepro repositories).
To ignore use --ignore=undefinedtarget.
There have been errors!

Please, when you make changes to configurations, always verify that they are working as intended.
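
One way such a verification could be scripted is sketched below: it scans captured reprepro output for the "unused database" error quoted above and extracts the distribution, component and architecture. The names `VanishedTarget` and `find_vanished_targets` are hypothetical, and the regular expression is inferred only from the single message in this ticket, so treat it as an assumption rather than a description of reprepro's full error format.

```python
import re
from typing import List, NamedTuple


class VanishedTarget(NamedTuple):
    """One leftover database, identified as 'distribution|component|architecture'."""
    distribution: str
    component: str
    architecture: str


# Pattern inferred from the one error message quoted in this ticket (assumption).
UNUSED_DB_RE = re.compile(
    r"contains unused '([^'|]+)\|([^'|]+)\|([^'|]+)' database"
)


def find_vanished_targets(reprepro_output: str) -> List[VanishedTarget]:
    """Return every leftover database named in the given reprepro output."""
    return [VanishedTarget(*m.groups()) for m in UNUSED_DB_RE.finditer(reprepro_output)]


ERROR = (
    "Error: packages database contains unused "
    "'bullseye-wikimedia|thirdparty/conda|amd64' database."
)

if __name__ == "__main__":
    for target in find_vanished_targets(ERROR):
        print(target.distribution, target.component, target.architecture)
```

A non-empty result would indicate that the manual cleanup (for example the `reprepro clearvanished` call that the error message itself suggests) is still outstanding. How the output is captured on the apt host is deliberately left out of the sketch.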

Mentioned in SAL (#wikimedia-operations) [2024-06-19T06:05:55Z] <_joe_> deleting manually thirdparty/conda repositories from reprepro T364550

Joe claimed this task.

I fixed the situation myself, as we had an urgent need to upload packages.

Change #1047933 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Revert "Remove conda repository from reprepro configuration"

https://gerrit.wikimedia.org/r/1047933

Change #1047933 merged by Btullis:

[operations/puppet@production] Revert "Remove conda repository from reprepro configuration"

https://gerrit.wikimedia.org/r/1047933

It turns out that I was wrong. We do still use this.

Now that @brouberol has fixed T366878: Not all data engineering gitlab projects are indexed in Codesearch, we can see from codesearch that it is used in workflow-utils and also dumps.

I reverted my previous change, merged the patch, and executed the sync on apt1002 to pull down the changes.

btullis@apt1002:~$ sudo -i reprepro --verbose --component  thirdparty/conda update bullseye-wikimedia
aptmethod got 'http://haproxy.debian.net/dists/bullseye-backports-2.7/InRelease'
aptmethod got 'http://haproxy.debian.net/dists/bullseye-backports-2.8/InRelease'
aptmethod redirects 'http://pkg.jenkins-ci.org/debian-stable/binary/Release' to 'https://pkg.jenkins.io/debian-stable/binary/Release'
aptmethod got 'http://packages.confluent.io/deb/7.4/dists/stable/InRelease'
aptmethod got 'http://repo.radeon.com/rocm/apt/5.4/dists/focal/InRelease'
aptmethod got 'http://haproxy.debian.net/dists/bullseye-backports-2.6/InRelease'
aptmethod got 'http://hwraid.le-vert.net/debian/dists/stretch/Release'
aptmethod got 'https://deb.nodesource.com/node_14.x/dists/bullseye/InRelease'
aptmethod got 'https://deb.nodesource.com/node_16.x/dists/bullseye/InRelease'
aptmethod got 'https://packages.nlnetlabs.nl/linux/debian/dists/bullseye/Release'
aptmethod got 'https://download.ceph.com/debian-reef/dists/bookworm/InRelease'
aptmethod got 'https://download.ceph.com/debian-quincy/dists/bullseye/InRelease'
aptmethod got 'https://storage.googleapis.com/gvisor/releases/dists/release/InRelease'
aptmethod got 'https://download.docker.com/linux/debian/dists/bullseye/InRelease'
aptmethod got 'https://repo.anaconda.com/pkgs/misc/debrepo/conda/dists/stable/InRelease'
aptmethod got 'http://downloads.linux.hpe.com/SDR/repo/mcp/dists/bullseye/current/Release'
aptmethod redirects 'https://packages.gitlab.com/gitlab/gitlab-ce/debian/dists/bullseye/InRelease' to 'https://d20rj4el6vkp4c.cloudfront.net/7/8/debian/dists/bullseye/InRelease?t=1718878613_c4ba29565a7d0226621d138edaeed852b747dc48'
aptmethod redirects 'https://packages.gitlab.com/runner/gitlab-runner/debian/dists/buster/InRelease' to 'https://d20rj4el6vkp4c.cloudfront.net/8/56/debian/dists/buster/InRelease?t=1718878613_6b25ecba5ccf6213018e09e77b545c60f05d52ad'
aptmethod got 'https://mirrors.xtom.com/mariadb/repo/10.5/debian/dists/bullseye/InRelease'
aptmethod redirects 'http://pkg.jenkins-ci.org/debian-stable/binary/Release.gpg' to 'https://pkg.jenkins.io/debian-stable/binary/Release.gpg'
aptmethod got 'https://mirror.croit.io/debian-octopus/dists/bullseye/InRelease'
aptmethod got 'http://hwraid.le-vert.net/debian/dists/stretch/Release.gpg'
aptmethod got 'https://artifacts.elastic.co/packages/oss-7.x/apt/dists/stable/Release'
aptmethod got 'https://packages.elastic.co/curator/5/debian9/dists/stable/Release'
aptmethod got 'http://downloads.linux.hpe.com/SDR/repo/mcp/dists/bullseye/current/Release.gpg'
aptmethod got 'https://pkg.jenkins.io/debian-stable/binary/Release'
aptmethod got 'https://d20rj4el6vkp4c.cloudfront.net/8/56/debian/dists/buster/InRelease?t=1718878613_6b25ecba5ccf6213018e09e77b545c60f05d52ad'
aptmethod got 'https://d20rj4el6vkp4c.cloudfront.net/7/8/debian/dists/bullseye/InRelease?t=1718878613_c4ba29565a7d0226621d138edaeed852b747dc48'
aptmethod got 'https://pkg.jenkins.io/debian-stable/binary/Release.gpg'
aptmethod got 'https://packages.nlnetlabs.nl/linux/debian/dists/bullseye/Release.gpg'
aptmethod got 'https://artifacts.elastic.co/packages/oss-7.x/apt/dists/stable/Release.gpg'
aptmethod got 'https://packages.elastic.co/curator/5/debian9/dists/stable/Release.gpg'
aptmethod got 'https://repo.anaconda.com/pkgs/misc/debrepo/conda/dists/stable/main/binary-amd64/Packages.bz2'
Calculating packages to get...
Getting packages...
aptmethod got 'https://repo.anaconda.com/pkgs/misc/debrepo/conda/pool/main/c/conda/conda_24.5.0-0_amd64.deb'
Shutting down aptmethods...
Installing (and possibly deleting) packages...
Exporting indices...
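
To confirm from a log like the one above which packages were actually downloaded, a small sketch along these lines could be used. The helper name `fetched_packages` is hypothetical; it assumes the `aptmethod got '<url>'` line format shown in the paste and the usual Debian `name_version_arch.deb` file naming.

```python
import re
from typing import List, Tuple

# Matches "aptmethod got '<url>/name_version_arch.deb'" lines only;
# index files (InRelease, Release, Packages.bz2) are ignored.
DEB_RE = re.compile(r"^aptmethod got '[^']*/([^/_']+)_([^/_']+)_([^/_']+)\.deb'$")


def fetched_packages(update_log: str) -> List[Tuple[str, str, str]]:
    """Return (name, version, architecture) for each .deb fetched in the log."""
    packages: List[Tuple[str, str, str]] = []
    for line in update_log.splitlines():
        match = DEB_RE.match(line.strip())
        if match:
            name, version, arch = match.groups()
            packages.append((name, version, arch))
    return packages


# Three lines copied from the log above.
LOG = """\
aptmethod got 'https://repo.anaconda.com/pkgs/misc/debrepo/conda/dists/stable/InRelease'
aptmethod got 'https://repo.anaconda.com/pkgs/misc/debrepo/conda/dists/stable/main/binary-amd64/Packages.bz2'
aptmethod got 'https://repo.anaconda.com/pkgs/misc/debrepo/conda/pool/main/c/conda/conda_24.5.0-0_amd64.deb'
"""

if __name__ == "__main__":
    for name, version, arch in fetched_packages(LOG):
        print(f"{name} {version} ({arch})")
```

For the log in this comment, the only package line is the conda .deb, which matches the single download shown before "Shutting down aptmethods...".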

I'll reopen the ticket and port the repository to bookworm, as @bking had originally intended. Sorry for all of the inconvenience.