Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation
Closed, ResolvedPublic

Description

Hello everybody,

while working on the DSE hackathon Neural Mashup track in T292306 there was a problem with conda stacked envs and the python package magenta. If you create a regular conda stacked env (following https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda) and try to pip install magenta, this is the error that you get:

[..]
    src/rtmidi/RtMidi.cpp:1540:10: fatal error: alsa/asoundlib.h: No such file or directory
     #include <alsa/asoundlib.h>
              ^~~~~~~~~~~~~~~~~~

The magenta upstream docs suggest running apt-get install build-essential libasound2-dev libjack-dev, but that doesn't work with our current conda setup. I also tried the following, but got the same error:

conda install -c conda-forge jack alsa-lib

The CPPFLAGS set for me are:

CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /usr/lib/anaconda-wmf/include
DEBUG_CPPFLAGS=-D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem /usr/lib/anaconda-wmf/include

IIUC the -isystem flag adds /usr/lib/anaconda-wmf/include to the C++ compiler's header search path, so with these flags only the base env's headers are added: neither the system headers (installed via apt) nor the conda-installed ones are picked up (the latter get deployed, afaics, into /home/$(whoami)/.conda/envs/YOUR-STACKED-ENV-NAME/include). The following hack works though:

export CPPFLAGS="-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /usr/lib/anaconda-wmf/include -isystem /home/$(whoami)/.conda/envs/YOUR-STACKED-ENV-NAME/include" (replace YOUR-STACKED-ENV-NAME)
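To check which of the directories named in CPPFLAGS actually contains a given header (alsa/asoundlib.h in this case), something like the following may help. It is only a minimal sketch: the function names cppflags_include_dirs and find_header are hypothetical and not part of anaconda-wmf or conda, only -isystem and -I flags are parsed, and paths containing spaces are not handled.

```shell
# Hypothetical helper (not part of anaconda-wmf): print the directories named
# by -isystem / -I flags in a flags string, one per line.
cppflags_include_dirs() {
    # $1: flags string, e.g. "$CPPFLAGS" (paths with spaces are not supported)
    _want_dir=0
    for _tok in $1; do
        if [ "$_want_dir" -eq 1 ]; then
            printf '%s\n' "$_tok"
            _want_dir=0
            continue
        fi
        case "$_tok" in
            -isystem|-I) _want_dir=1 ;;            # directory is the next token
            -I?*) printf '%s\n' "${_tok#-I}" ;;    # -I/some/dir form
        esac
    done
}

# Hypothetical helper: print the full path of every copy of a header that is
# reachable through the include directories listed in a flags string.
find_header() {
    # $1: header path relative to an include dir, $2: flags string
    cppflags_include_dirs "$2" | while IFS= read -r _dir; do
        [ -f "$_dir/$1" ] && printf '%s\n' "$_dir/$1"
    done
    return 0
}
```

Running find_header alsa/asoundlib.h "$CPPFLAGS" before and after the export above should print nothing in the first case and the path inside the stacked env in the second, assuming alsa-lib was installed into that env.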

This is surely a corner case, since magenta requires python-rtmidi, which in turn requires asoundlib.h to compile some C++ files, but I am wondering if we could do something about it in anaconda-wmf. If there is a simpler way, apologies for this long task :)

Event Timeline

Yes, exactly: afaics the CPPFLAGS are set when I activate my stacked conda env. We could try to add:

export CPPFLAGS="${CPPFLAGS} -isystem ${CONDA_PREFIX}/include"

Not sure if too hacky or not, but from my local tests it works fine.
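One caveat with a plain append is that sourcing the script more than once keeps growing CPPFLAGS. A minimal sketch of an idempotent variant follows; the function names append_isystem and extend_cppflags_for_active_env are hypothetical and are not what the anaconda-wmf patch uses, and the only assumption about conda is that CONDA_PREFIX is set while an env is active.

```shell
# Hypothetical helper: return the flags string with "-isystem <dir>" appended,
# unless that exact pair is already present.
append_isystem() {
    # $1: current flags (may be empty), $2: include directory
    case " $1 " in
        *" -isystem $2 "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1 }-isystem $2" ;;
    esac
}

# Hypothetical wrapper: extend CPPFLAGS with the active env's include dir.
# Does nothing when no conda env is active (CONDA_PREFIX unset or empty).
extend_cppflags_for_active_env() {
    [ -n "${CONDA_PREFIX:-}" ] || return 0
    CPPFLAGS=$(append_isystem "${CPPFLAGS:-}" "${CONDA_PREFIX}/include")
    export CPPFLAGS
}
```

Calling extend_cppflags_for_active_env twice in a row leaves CPPFLAGS the same as calling it once, which makes it safer to use from a script that may be sourced repeatedly.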

I think the active env path will be available as the CONDA_PREFIX env var.

Yep, way better!

Change 727352 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/debs/anaconda-wmf@debian] Add extra include search path to {CPP,C,CXX,FORTRAN}FLAGS

https://gerrit.wikimedia.org/r/727352

Change 728557 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/debs/anaconda-wmf@debian] Release 2020.02~wmf6

https://gerrit.wikimedia.org/r/728557

Change 727352 merged by Elukey:

[operations/debs/anaconda-wmf@debian] Add extra include search path to {CPP,C,CXX,FORTRAN}FLAGS

https://gerrit.wikimedia.org/r/727352

Change 728557 merged by Elukey:

[operations/debs/anaconda-wmf@debian] Release 2020.02~wmf6

https://gerrit.wikimedia.org/r/728557

Merged the patches, next step is to build the new debian package and install it across our nodes. I can take care of it or leave it to Data Engineering, let me know what you prefer!

We're doing 'offsite' this week so I don't think we'll get to it soon. Please proceed if you need it!

No real need, I think it is fine to wait if anybody wants to get experience with Debian packaging etc.

I was wondering if we could also follow up with upstream about this issue. Getting it fixed there would be nice, but it doesn't seem super easy to find where to apply the fix :D

odimitrijevic moved this task from Incoming to Data Exploration Tools on the Analytics board.

Once this is installed on the servers, will it automatically take effect within users' environments? Or will users have to do something like restart their Jupyter servers or create new Conda environments?

I believe restarting Jupyter servers will be necessary, but that should be all.

Mentioned in SAL (#wikimedia-analytics) [2022-01-19T14:00:45Z] <ottomata> installing anaconda-wmf_2020.02~wmf6_amd64.deb on stat1004 - T292699

Mentioned in SAL (#wikimedia-analytics) [2022-01-19T15:44:51Z] <ottomata> installing anaconda-wmf_2020.02~wmf6_amd64.deb on all analytics cluster nodes. - T292699

Installing on all analytics cluster nodes:

sudo cumin -b 10 -m async  'C:profile::analytics::cluster::packages::common' 'sudo apt-get update && sudo apt-get install -y anaconda-wmf'

This didn't quite work!

After activating a stacked env now, CPPFLAGS are:

-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /usr/lib/anaconda-wmf/include -isystem /usr/lib/anaconda-wmf/include

We had expected that last -isystem to point at the activated conda env, not the base env. I believe this is because the activate.d/env_vars.sh is in anaconda-wmf, not in the user's conda env, so I guess at the time when that script is sourced, $CONDA_PREFIX is /usr/lib/anaconda-wmf.

I guess we need to install an activate.d/env_vars.sh file in the user's conda env that does this instead.
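A rough sketch of what installing such a per-env file could look like is below. It assumes conda's standard etc/conda/activate.d hook directory inside the env prefix; the function name install_env_activate_hook is hypothetical, and this is not necessarily what the eventual anaconda-wmf fix does.

```shell
# Hypothetical helper: write an activate.d hook into a user's conda env so that
# the include path is computed from that env's own prefix at activation time.
install_env_activate_hook() {
    # $1: prefix of the user's conda env, e.g. ~/.conda/envs/YOUR-STACKED-ENV-NAME
    _hook_dir="$1/etc/conda/activate.d"
    mkdir -p "$_hook_dir" || return 1
    # Quoted heredoc delimiter: the variables are expanded when the hook is
    # sourced on activation, not when this function writes the file.
    cat > "$_hook_dir/env_vars.sh" <<'EOF'
# Sourced by conda when this env is activated; CONDA_PREFIX then names this env.
export CPPFLAGS="${CPPFLAGS:+$CPPFLAGS }-isystem ${CONDA_PREFIX}/include"
EOF
}
```

Because the hook lives inside the user's env, $CONDA_PREFIX should point at that env rather than at /usr/lib/anaconda-wmf by the time the file is sourced.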

Along the way, I noticed that something else is wacky: there is a bug in the deactivate.d/env_vars.sh where the extra env vars anaconda-wmf is setting are not actually being unset. Will add a fix for that too.

Also, from the original patch:

  # IMPORTANT NOTE:
  # The following variables are already added by Conda,
  # we just override them to expand its scope.
  # Due to the above, these variables don't need to be listed
  # in the deactivate.d's script.

In my tests, these vars are not being unset. They are unset if I don't use a stacked env, but they are left set if I use conda-deactivate-stacked. Either we need to include them in _CONDA_ACTIVATED_ENV_VARS, or we need to figure out why they aren't being unset when using the stacked env.
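For illustration, one generic way to make an override reversible is to remember the previous value on activation and put it back (or unset the variable) on deactivation. The sketch below is not how _CONDA_ACTIVATED_ENV_VARS works and not what the patch does; the function names and the _WMF_SAVED_ / _WMF_HAD_ variable prefixes are hypothetical.

```shell
# Hypothetical activate-side helper: remember whether the variable was set and
# what it held, then override it and export the new value.
save_and_set() {
    # $1: variable name, $2: new value
    eval "_wmf_isset=\${$1+x}"
    if [ -n "$_wmf_isset" ]; then
        eval "_WMF_SAVED_$1=\$$1"
        eval "_WMF_HAD_$1=1"
    else
        eval "_WMF_HAD_$1=0"
    fi
    eval "$1=\$2; export $1"
}

# Hypothetical deactivate-side helper: restore the remembered value, or unset
# the variable if it did not exist before activation.
restore_or_unset() {
    # $1: variable name
    eval "_wmf_had=\${_WMF_HAD_$1:-0}"
    if [ "$_wmf_had" = 1 ]; then
        eval "$1=\$_WMF_SAVED_$1; export $1"
    else
        unset "$1"
    fi
    unset "_WMF_SAVED_$1" "_WMF_HAD_$1"
}
```

With this shape, an activate script would call save_and_set CPPFLAGS "..." and the matching deactivate script would call restore_or_unset CPPFLAGS, so nothing is left behind regardless of whether the variable was set beforehand.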

Change 758983 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/debs/anaconda-wmf@debian] Actually unset env vars that are activated by conda/activate.d/env_vars.sh

https://gerrit.wikimedia.org/r/758983

Change 762514 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/debs/anaconda-wmf@debian] 2020.02~wmf7 - Improve env vars setting during activate and deactivate

https://gerrit.wikimedia.org/r/762514

Change 758983 merged by Ottomata:

[operations/debs/anaconda-wmf@debian] Actually unset env vars that are activated by conda/activate.d/env_vars.sh

https://gerrit.wikimedia.org/r/758983

Change 762514 merged by Ottomata:

[operations/debs/anaconda-wmf@debian] 2020.02~wmf7 - Improve env vars setting during activate and deactivate

https://gerrit.wikimedia.org/r/762514

Mentioned in SAL (#wikimedia-analytics) [2022-02-15T17:20:23Z] <ottomata> split anaconda-wmf into 2 packages: anaconda-wmf-base and anaconda-wmf. anaconda-wmf-base is installed on workers, anaconda-wmf on clients. The size of the package on workers is now much smaller. Installing throughout the cluster. relevant: T292699

Okay, I just deployed a new version of anaconda-wmf. Let's see if this fixes it.