New conda-analytics clone installs packages from Anaconda repository
Closed, Resolved (Public)

Description

I just created a new conda environment on stat1011. The packages that were cloned all come from the channel conda-forge or pypi, but if I install a new package, it is installed from Anaconda's defaults channel.

Here's an example of a package being installed from defaults:

nshahquinn-wmf@stat1011:~$ conda install plotly
Retrieving notices: ...working... done
Channels:
 - defaults
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
    current version: 23.10.0
    latest version: 24.11.3

Please update conda by running

    $ conda update -n base -c conda-forge conda



## Package Plan ##

  environment location: /home/nshahquinn-wmf/.conda/envs/2025-01-08T23.37.26_nshahquinn-wmf

  added / updated specs:
    - plotly


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2024.12.31 |       h06a4308_0         128 KB  defaults
    certifi-2024.12.14         |  py310h06a4308_0         160 KB  defaults
    conda-24.11.3              |  py310h06a4308_0         930 KB  defaults
    distro-1.9.0               |  py310h06a4308_0          31 KB  defaults
    frozendict-2.4.2           |  py310h5eee18b_0          55 KB  defaults
    menuinst-2.2.0             |  py310h06a4308_0         226 KB  defaults
    plotly-5.24.1              |  py310h2f386ee_0         9.9 MB  defaults
    tenacity-9.0.0             |  py310h06a4308_0          54 KB  defaults
    ------------------------------------------------------------
                                           Total:        11.5 MB

The following NEW packages will be INSTALLED:

  distro             pkgs/main/linux-64::distro-1.9.0-py310h06a4308_0 
  frozendict         pkgs/main/linux-64::frozendict-2.4.2-py310h5eee18b_0 
  menuinst           pkgs/main/linux-64::menuinst-2.2.0-py310h06a4308_0 
  plotly             pkgs/main/linux-64::plotly-5.24.1-py310h2f386ee_0 
  tenacity           pkgs/main/linux-64::tenacity-9.0.0-py310h06a4308_0 

The following packages will be UPDATED:

  ca-certificates    conda-forge::ca-certificates-2024.8.3~ --> pkgs/main::ca-certificates-2024.12.31-h06a4308_0 
  certifi            conda-forge/noarch::certifi-2024.8.30~ --> pkgs/main/linux-64::certifi-2024.12.14-py310h06a4308_0 
  conda              conda-forge::conda-23.10.0-py310hff52~ --> pkgs/main::conda-24.11.3-py310h06a4308_0 



Downloading and Extracting Packages:
                                                                                                                        
Preparing transaction: done                                                                                             
Verifying transaction: done                                                                                             
Executing transaction: done

This is probably related to the following warning that displays when I run certain Conda commands e.g. conda env list:

/home/nshahquinn-wmf/.conda/envs/2025-01-08T23.37.26_nshahquinn-wmf/lib/python3.10/site-packages/conda/base/context.py:201: FutureWarning: Adding 'defaults' to channel list implicitly is deprecated and will be removed in 25.3. 

To remove this warning, please choose a default channel explicitly with conda's regular configuration system, e.g. by adding 'defaults' to the list of channels:

  conda config --add channels defaults

For more information see https://docs.conda.io/projects/conda/en/stable/user-guide/configuration/use-condarc.html

  deprecated.topic(

I don't have any custom configuration installed that modifies the channels:

nshahquinn-wmf@stat1011:~$ conda config --show-sources                                                                  
==> /home/nshahquinn-wmf/.conda/envs/2025-01-08T23.37.26_nshahquinn-wmf/condarc <==
envs_dirs:
  - ${HOME}/.conda/envs
pkgs_dirs:
  - ${HOME}/.conda/pkgs
  - /home/nshahquinn-wmf/.conda/envs/2025-01-08T23.37.26_nshahquinn-wmf/pkgs
show_channel_urls: True
always_yes: True
solver: libmamba

==> /home/nshahquinn-wmf/.conda/condarc <==
pkgs_dirs:
  - /home/nshahquinn-wmf/.conda/pkgs

For an expanded terminal log with more detail about my configuration and the issues, see P71888.

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Bring the versions up to date with the changelog | repos/data-engineering/conda-analytics!58 | btullis | fix_bumpversion | main
Restrict further the use of anaconda default channels | repos/data-engineering/conda-analytics!57 | btullis | deny_defaults | main

Event Timeline

nshahquinn-wmf changed the visibility from "Custom Policy" to "Public (No Login Required)".
nshahquinn-wmf changed the edit policy from "Custom Policy" to "All Users".
nshahquinn-wmf edited projects, added Data-Platform-SRE; removed Data-Engineering.

Conda-analytics was migrated to Miniforge in September, but either in the process or independently, there was a regression from the previous behavior of preferring the Conda-Forge channel (T302819#8309900).

It seems as though the default channel is conda-forge when using miniforge: https://stackoverflow.com/a/77246895

If that is indeed the case, I think we're ok.

@brouberol unfortunately, it seems like the "defaults" here is actually the Anaconda channel (although I could be missing something, since it is indeed very weird that Miniforge ends up pulling from Anaconda's channels).

Here are excerpts from conda config --show (from P71888):

channel_alias: https://conda.anaconda.org
[...]
channel_settings: []
channels:
  - defaults
[...]
custom_channels:
  pkgs/main: https://repo.anaconda.com
  pkgs/r: https://repo.anaconda.com
  pkgs/pro: https://repo.anaconda.com
custom_multichannels:
  defaults: 
    - https://repo.anaconda.com/pkgs/main
    - https://repo.anaconda.com/pkgs/r
  local: 
[...]
default_channels:
  - https://repo.anaconda.com/pkgs/main
  - https://repo.anaconda.com/pkgs/r

Also, if I look in ~/.conda/pkgs/urls.txt (which I know was created yesterday because I had to delete it in order to create the new environment because of T380477), I see about 50% Conda-Forge URLs and 50% Anaconda ones.
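
A rough way to quantify that split is to tally the URLs in urls.txt by host. The following is only a sketch: it assumes the file holds one package URL per line, and the function name count_hosts is hypothetical. Going by the config excerpt above, Conda-Forge packages would show up under conda.anaconda.org (the channel_alias) and Anaconda's defaults under repo.anaconda.com.

```python
from collections import Counter
from urllib.parse import urlparse


def count_hosts(lines):
    """Tally package URLs by host name, ignoring blank lines.

    Assumes one URL per line, as urls.txt appears to contain.
    """
    counts = Counter()
    for line in lines:
        url = line.strip()
        if not url:
            continue
        counts[urlparse(url).netloc] += 1
    return counts
```

For example, passing the lines of ~/.conda/pkgs/urls.txt (read with pathlib.Path.read_text().splitlines()) and printing counts.most_common() would show how many downloads came from each host.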

FYI, I was able to fix this for myself by adding a condarc file with:

channels:
    - conda-forge
    - nodefaults

Maybe the fix is just adding that to the Conda-Analytics condarc. It sounds like that may cause a problem with Miniforge, but there must be some way to fix that.

Also, it's probably a good idea to explicitly block the Anaconda channels using denylist_channels, to prevent folks from using them when following commands recommended online:

denylist_channels:
  - defaults
  - anaconda
  - https://repo.anaconda.com/pkgs/main
  - https://repo.anaconda.com/pkgs/r
  - https://repo.anaconda.com/pkgs/msys2

Thanks @nshahquinn-wmf - I am building a new version of conda-analytics to try to fix this issue, based on your suggestions.
You can track that here: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/57

I found this blog post from Anaconda, which mentions significant changes in version 24.9.0 of conda.

This led me to read the release notes for conda version 24.9.0.

We can see that:

  • the denylist_channels option was introduced in this version
  • the default channels are no longer hardcoded into conda from this version onwards

This statement is highly relevant:

Following feedback from conda users about the pre-configuration of the conda code base to favor channels from Anaconda Inc, we've started the process to deprecate hardcoding Anaconda's channels as the default set of channels in the conda source code, which is a remnant of conda's incubation at the company.

The change that I am currently working on upgrades conda (and the miniforge installer) from version 23.10.0 to 24.11.2, so I think it will pick up the changed behaviour around the default channel without my needing to include the nodefaults option. I will still add the denylist_channels setting, though, as you suggested.

I have tested version 24.11.2 of miniforge locally, and I can confirm that it no longer lists the defaults channel in the output of conda config --show, unlike what you found here.

(base) btullis@marlin:~/wmf/conda-analytics$ conda info | grep 'conda version'
          conda version : 24.11.2

(base) btullis@marlin:~/wmf/conda-analytics$ conda config --show | grep -A2 ^channels:
channels:
  - conda-forge
client_ssl_cert: None

I'll look to see if I can test this new version on an-test-client1002 today, then if it works I'll try to push it out to the production cluster next week.

I have done some testing by building a test package and installing it on an-test-client1002.

btullis@an-test-client1002:~$ wget https://gitlab.wikimedia.org/api/v4/projects/359/packages/generic/conda-analytics/0.0.37/conda-analytics-0.0.37_amd64.deb
--2025-01-17 15:15:50--  https://gitlab.wikimedia.org/api/v4/projects/359/packages/generic/conda-analytics/0.0.37/conda-analytics-0.0.37_amd64.deb
Resolving gitlab.wikimedia.org (gitlab.wikimedia.org)... 2620:0:860:1:208:80:153:8, 208.80.153.8
Connecting to gitlab.wikimedia.org (gitlab.wikimedia.org)|2620:0:860:1:208:80:153:8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1044249260 (996M) [application/octet-stream]
Saving to: ‘conda-analytics-0.0.37_amd64.deb’

conda-analytics-0.0.37_amd64.deb                     100%[=====================================================================================================================>] 995.87M   110MB/s    in 9.7s    

2025-01-17 15:16:00 (103 MB/s) - ‘conda-analytics-0.0.37_amd64.deb’ saved [1044249260/1044249260]

btullis@an-test-client1002:~$ sudo dpkg -i conda-analytics-0.0.37_amd64.deb 
(Reading database ... 267906 files and directories currently installed.)
Preparing to unpack conda-analytics-0.0.37_amd64.deb ...
Unpacking conda-analytics (0.0.37) over (0.0.36) ...
Setting up conda-analytics (0.0.37) ...
Post install script.
  Running /opt/conda-analytics/bin/python /opt/conda-analytics/bin/conda-unpack...

I checked that it contained the updated condarc file.

btullis@an-test-client1002:~$ head -n 14 /opt/conda-analytics/condarc 
# We no longer include the default channel. This was still hard-coded
# to include anaconda.com URLs until version 24.9.0 of miniforge. See #T383284
channels:
 - conda-forge

# We wish expressly to deny that any of these channels be used, even by using a
# command-line override option.
denylist_channels:
  - defaults
  - anaconda
  - https://repo.anaconda.com/pkgs/main
  - https://repo.anaconda.com/pkgs/r
  - https://repo.anaconda.com/pkgs/msys2

I then enabled the proxy and created a test environment:

btullis@an-test-client1002:~$ set_proxy
Proxy set
btullis@an-test-client1002:~$ conda-analytics-clone T383284
btullis@an-test-client1002:~$ source conda-analytics-activate T383284

I activated the environment and installed plotly:

btullis@an-test-client1002:~$ source conda-analytics-activate T383284
(T383284) btullis@an-test-client1002:~$ conda install plotly
<snip>
The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    plotly-5.24.1              |     pyhd8ed1ab_1         7.7 MB  conda-forge
    tenacity-9.0.0             |     pyhd8ed1ab_1          24 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         7.7 MB

The following NEW packages will be INSTALLED:

  plotly             conda-forge/noarch::plotly-5.24.1-pyhd8ed1ab_1 
  tenacity           conda-forge/noarch::tenacity-9.0.0-pyhd8ed1ab_1

I verified that the ~/.conda/pkgs/urls.txt file only contained conda-forge URLs.

In addition, I also checked that jupyterhub worked.

btullis@an-test-client1002:~$ sudo systemctl restart jupyterhub-conda.service

I then logged in and created a new environment, which tests conda-analytics-clone without the proxy being set.

So I'm happy that this is likely functional.

Thanks for the update, @BTullis!

In case you didn't see my comment on the other task (T380477#10472865): when you are able to make the release, is it possible for you to include the approved-but-unmerged PR updating the Wmfdata repo? It's not urgent, but it has been waiting for 3 months now.

Thanks @nshahquinn-wmf - I have included that patch in version 0.0.38 of conda-analytics and deployed this across the cluster.

I'll tentatively mark this issue as resolved, but please do feel free to reopen it if you find any more packages being installed from the Anaconda repos.

We saw an issue that caused the jupyterhub-conda service to go into a restart loop on stat1010. Details in this Slack thread.

Essentially, users had a ~/.condarc file that referenced either the defaults channel or repo.anaconda.com specifically. This caused the conda info command to exit with status 1, which in turn prevented the user's hub server from starting.

Renaming the .condarc files fixed the issue.
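
To find affected files before they cause a restart loop, something like the following could flag condarc lines that mention the defaults channel or repo.anaconda.com. This is a minimal sketch and not what was actually run on stat1010: the function name find_anaconda_refs is hypothetical, and the match is a line-based heuristic rather than a real YAML parse, so it may flag entries (for example under denylist_channels) that are harmless.

```python
import re

# Matches a YAML list item naming the "defaults" channel, or any mention
# of repo.anaconda.com. Heuristic only: this does not parse YAML.
_PATTERN = re.compile(
    r"^\s*-\s*['\"]?defaults['\"]?\s*(#.*)?$|repo\.anaconda\.com"
)


def find_anaconda_refs(text):
    """Return (line_number, line) pairs that reference Anaconda channels."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip full-line comments
        if _PATTERN.search(line):
            hits.append((number, line.rstrip()))
    return hits
```

Running find_anaconda_refs over the contents of a user's ~/.condarc and getting back an empty list would suggest the file is unlikely to trigger this particular failure.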