
Package versions in Conda-Analytics are not pinned
Open, Medium, Public

Description

It might seem that versions of crucial packages in Conda-Analytics are pinned in conda-environment.yml (which in turn adds them to conda-environment.lock.yml).

However, those files just specify what versions should be installed in the new environment to start; Conda happily ignores them in all future transactions. This doesn't just mean that package A will be updated if the user runs conda update A. If the user runs conda install B and B lists A as a dependency, Conda will automatically upgrade A to the latest version (even if B's requirement is already satisfied by the existing version).

As you can imagine, this is a huge source of environment problems!

It should be easy to fix this by actually pinning versions where necessary, which means adding the specifications to a file named pinned in the environment's conda-meta directory (see the Conda docs).

Here's an example pinned file:

jupyter_core ==5.5.0
jupyter_server ==1.24.0
jupyter_telemetry ==0.1.0
jupyterhub ==1.5.0
jupyterhub-ldapauthenticator ==1.3.2
jupyterhub-singleuser ==1.5.0
jupyterhub-systemdspawner ==0.15.0
jupyterlab ==3.4.8
jupyterlab_pygments ==0.2.2
jupyterlab_server ==2.25.0
# https://phabricator.wikimedia.org/T356230
numpy <1.24
# https://phabricator.wikimedia.org/T356230
pandas <2.2
pyspark ==3.1.2
python ==3.10.*
sqlalchemy <2
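To show where such a file lives, here is a minimal Python sketch that writes a list of match specs, one per line, to <env>/conda-meta/pinned. The helper names render_pinned and write_pinned are hypothetical, and the choice to overwrite any existing pinned file is an assumption made for illustration. This is not part of conda-analytics.

```python
import tempfile
from pathlib import Path


def render_pinned(specs: list[str]) -> str:
    """Return pinned-file text: one match spec per line."""
    return "".join(f"{spec}\n" for spec in specs)


def write_pinned(env_prefix: Path, specs: list[str]) -> Path:
    """Write specs to <env_prefix>/conda-meta/pinned, replacing any existing file."""
    pinned = Path(env_prefix) / "conda-meta" / "pinned"
    # A real environment already has conda-meta; creating it here lets the
    # sketch also run against an empty directory.
    pinned.parent.mkdir(parents=True, exist_ok=True)
    pinned.write_text(render_pinned(specs))
    return pinned


if __name__ == "__main__":
    # Demo against a throwaway prefix rather than a real environment.
    demo_prefix = Path(tempfile.mkdtemp())
    path = write_pinned(demo_prefix, ["numpy <1.24", "pandas <2.2", "python ==3.10.*"])
    print(path.read_text(), end="")
```

Once the file exists, later conda install and conda update runs in that environment should respect the listed constraints instead of silently upgrading those packages.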

Details

Title | Reference | Author | Source Branch | Dest Branch
Pin essential conda-analytics packages | repos/data-engineering/conda-analytics!43 | stevemunene | pin_essential_conda_analytics_packages | main
Draft: Pin essential conda-analytics packages | repos/data-engineering/conda-analytics!42 | stevemunene | pin_essential_conda_analytics_packages | main

Event Timeline

Gehel triaged this task as Medium priority. (Feb 9 2024, 1:29 PM)
Gehel moved this task from Incoming to 2024.02.12 - 2024.03.03 on the Data-Platform-SRE board.

We have introduced a conda-analytics pinned file with pandas and numpy versions for starters, and built the dev deb package, which we are going to test on an-test-client1002.

Mentioned in SAL (#wikimedia-analytics) [2024-04-03T11:46:02Z] <stevemunene> disable puppet on an-test-client1002 to test new conda-analytics version T356231

The new package introduces a pinned file for the base environment:

stevemunene@an-test-client1002:~$ cat /opt/conda-analytics/conda-meta/pinned 
# https://phabricator.wikimedia.org/T356230
numpy <1.24
# https://phabricator.wikimedia.org/T356230
pandas <2.2
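The format is simple: one match spec per line, with lines starting with # treated as comments (as with the Phabricator links above). As an illustrative sketch, the helper below (read_pins, a hypothetical name) splits such a file into package names and constraints, which can help when checking what an environment actually pins. It assumes plain "name constraint" specs and does not handle channel-prefixed specs such as conda-forge::numpy.

```python
import re

# Package name, then whatever constraint follows (possibly empty).
_SPEC_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*?)\s*$")


def read_pins(text: str) -> dict[str, str]:
    """Map each pinned package name to its version constraint string."""
    pins: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines such as ticket links.
        if not stripped or stripped.startswith("#"):
            continue
        match = _SPEC_RE.match(stripped)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins
```

For the file above, this yields {"numpy": "<1.24", "pandas": "<2.2"}.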

The current pinned file is a base model, and we might need a more standardised production pinned file. cc @nshahquinn-wmf

@Stevemunene I just created a new cloned Conda environment on an-test-client1002 using the Jupyter GUI. However, it doesn't have a pinned file:

nshahquinn-wmf@an-test-client1002:~/.conda/envs/2024-04-03T21.34.11_nshahquinn-wmf/conda-meta$ cat pinned
cat: pinned: No such file or directory
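Since user clones live under ~/.conda/envs/ (as in the path above), one quick way to see which of them lack a pinned file is to look for conda-meta/pinned in each one. This is a minimal sketch assuming that directory layout; envs_without_pinned is a hypothetical helper, not an existing conda-analytics tool.

```python
from pathlib import Path


def envs_without_pinned(envs_dir: Path) -> list[str]:
    """Return names of environments under envs_dir with no conda-meta/pinned file."""
    missing: list[str] = []
    if not envs_dir.is_dir():
        return missing
    for env in sorted(envs_dir.iterdir()):
        # Only consider directories that look like Conda environments.
        if not (env / "conda-meta").is_dir():
            continue
        if not (env / "conda-meta" / "pinned").is_file():
            missing.append(env.name)
    return missing


if __name__ == "__main__":
    for name in envs_without_pinned(Path.home() / ".conda" / "envs"):
        print(f"{name}: no pinned file")
```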

> The current pinned file is a base model and we might need a more standardised production pinned file cc @nshahquinn-wmf

The "base model" pinned file does work pretty well: I manually added it to my environment, tried installing/updating a fairly long list of packages an analyst would likely use, and none of the non-pinned crucial packages (e.g. Python, Pyspark, Jupyter packages) got touched since they weren't in the dependency tree.

However, I still think it would be better to have a larger pinned file. In the past (before I introduced my own pinned files), I have broken things quite severely by trying to update Python or JupyterLab, or by running conda update --all, and it's better to just make those operations safe even if most people aren't likely to try them 😁

Thanks @nshahquinn-wmf. At the moment, the pinned file is only included in a clone if the user explicitly asks for it; there is not yet a way to include it by default, which is not the best UX.
To include the pinned file, pass the --pinned flag when cloning, as shown below:

stevemunene@an-test-client1002:~$ conda-analytics-clone test-pinned --pinned
Creating new cloned conda env test-pinned...
Source:      /opt/conda-analytics
Destination: /home/stevemunene/.conda/envs/test-pinned
.
.
.
Alternatively, you can use the conda-analytic helper script:
  source conda-analytics-activate test-pinned

Checking for the pinned file

stevemunene@an-test-client1002:~$ cat /opt/conda-analytics/conda-meta/pinned 
# https://phabricator.wikimedia.org/T356230
numpy <1.24
# https://phabricator.wikimedia.org/T356230
pandas <2.2

We are looking for a way around this so that the pinned file is available by default in all cloned environments.
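One possible shape for a default is a post-clone step that copies the base environment's pinned file into the new environment. The sketch below is hypothetical: it is not how conda-analytics-clone implements --pinned, and the default base path /opt/conda-analytics is simply taken from the output above.

```python
import shutil
from pathlib import Path

BASE_PREFIX = Path("/opt/conda-analytics")  # base environment shown above


def copy_pinned(clone_prefix: Path, base_prefix: Path = BASE_PREFIX) -> bool:
    """Copy <base>/conda-meta/pinned into <clone>/conda-meta/pinned.

    Returns False and copies nothing if the base has no pinned file or the
    clone already has one, so pins a user edited are never overwritten.
    """
    src = base_prefix / "conda-meta" / "pinned"
    dst = clone_prefix / "conda-meta" / "pinned"
    if not src.is_file() or dst.exists():
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return True
```

Skipping clones that already have a pinned file is a design choice that keeps user-maintained pins intact; merging the two files would be an alternative.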

We hit a bit of a delay with this while building the new conda package. So far, we have updated the conda-analytics-clone command to include the --pinned flag so that the file is available to everyone using it, as described in Creating_a_new_environment.
However, we ran into a Debian-related problem: the buster-backports repository is no longer available, so apt-get update fails for any image that includes this repository and we cannot build those images. This is being tracked in T362518, and we plan to unblock progress by rebuilding our container on bullseye (T362648).

This has been fixed by deploying v0.0.29 of conda-analytics, and the task is back in progress.

We introduced a script that generates the conda pinned file from conda-environment.yml, the same file we use to generate the lock file. This provides a central place to manage the pinned versions.
https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/43
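As an illustration of the idea (not the script from the merge request), here is a minimal sketch that assumes the YAML has already been loaded into a dict, for example with PyYAML's yaml.safe_load. It also assumes that every conda dependency carrying a version constraint should be pinned, copied verbatim because the pinned file takes match-spec strings. pinned_from_environment and this selection rule are hypothetical.

```python
import re

# Characters that indicate a dependency carries a version constraint.
_CONSTRAINT_RE = re.compile(r"[=<>!]")


def pinned_from_environment(env: dict) -> str:
    """Build pinned-file text from a parsed conda-environment.yml mapping.

    Keeps every conda dependency string that has a version constraint;
    skips unconstrained names and the nested {"pip": [...]} section.
    """
    lines: list[str] = []
    for dep in env.get("dependencies", []):
        # A {"pip": [...]} entry lists pip packages, which conda pins don't cover.
        if not isinstance(dep, str):
            continue
        spec = dep.strip()
        if _CONSTRAINT_RE.search(spec):
            lines.append(spec)
    return "".join(f"{line}\n" for line in lines)
```

One limitation of this approach is that YAML loaders discard comments, so annotations like the T356230 links in the example pinned file would have to be carried some other way.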