
Conda-Analytics has package conflict when trying to install R with key packages (R-Arrow and R-Stringi)
Open, Low, Public

Description

Steps to reproduce

  1. Clone and activate a new Conda-Analytics environment
  2. Run conda install r-base r-arrow r-stringi (you can omit r-base, but the result is the same since the other packages both require it)
  3. The installation will fail with a LibMambaUnsatisfiableError

I also tried the install command with --solver classic, but that failed too, with the message "Found conflicts! Looking for incompatible packages". It was still looking after about 2 hours, at which point I killed it.
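For anyone reproducing this, here is a minimal Python sketch that builds the same install command for both solvers and runs it under a time limit, so the classic solver can't spin for hours. The helper names and the 600-second limit are made up for illustration. It also adds --dry-run, which I'm assuming still runs the solver without changing the environment; drop it to match the steps above exactly.

```python
import subprocess

# Packages from the steps to reproduce.
PACKAGES = ["r-base", "r-arrow", "r-stringi"]


def build_install_command(packages, solver=None, dry_run=True):
    """Build the conda install argument list (hypothetical helper)."""
    cmd = ["conda", "install"]
    if dry_run:
        # Assumption: a dry run still solves, but installs nothing.
        cmd.append("--dry-run")
    if solver is not None:
        cmd += ["--solver", solver]
    return cmd + list(packages)


def run_with_time_limit(cmd, limit_seconds):
    """Run cmd; return (exit code, combined output), or (None, "") on timeout."""
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=limit_seconds
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return done.returncode, done.stdout + done.stderr


def reproduce(limit_seconds=600):
    """Try the default solver, then the classic one, each with a time limit."""
    for solver in (None, "classic"):
        cmd = build_install_command(PACKAGES, solver=solver)
        code, _output = run_with_time_limit(cmd, limit_seconds)
        status = "timed out" if code is None else f"exit code {code}"
        print(" ".join(cmd), "->", status)
```

Calling reproduce() inside the activated environment should print one line per solver. It raises FileNotFoundError if conda is not on the PATH.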

Error details

The full error message is:

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package r-arrow-10.0.1-r41hcb278e6_0 requires r-base >=4.1,<4.2.0a0, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
[extremely long tree of options omitted]

Pins seem to be involved in the conflict. Currently pinned specs:
 - sqlalchemy[version='<2.0']
 - pyspark=3.1.2
 - pyarrow=9.0.0
 - pandas[version='<2.0.0']
 - numpy[version='<1.24.0']
 - jupyterlab_server=2.25
 - jupyterlab=3.4.8
 - jupyterhub-systemdspawner=0.15.0
 - jupyterhub-ldapauthenticator=1.3.2
 - jupyterhub=1.5.0
 - jupyter_core=5.5
 - python=3.10

My full shell output, including the long option tree, is in P74369. However, I recommend trying to reproduce it yourself, since the terminal output color-codes the dependency tree according to what conflicts and what doesn't, which makes it a bit easier to understand.
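To make the r-base constraint in the error message concrete, here is a small Python sketch that checks which final-release r-base versions fall inside >=4.1,<4.2.0a0. The function names are hypothetical and the parsing is deliberately simplified: it handles only plain dotted versions, not pre-release suffixes, and it is not conda's own version-matching logic. It only illustrates that this r-arrow build accepts R 4.1.x and nothing newer; it doesn't show which pin is the root cause.

```python
def parse_release(version):
    """Turn '4.1.3' into (4, 1, 3); pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies_r_arrow_constraint(r_base_version):
    """Check a final-release r-base version against >=4.1,<4.2.0a0."""
    release = parse_release(r_base_version)
    # Simplification: for final releases, '<4.2.0a0' behaves like '<4.2',
    # because 4.2.0a0 is a pre-release that sorts before 4.2.0.
    return (4, 1) <= release < (4, 2)


for candidate in ["4.0.5", "4.1.3", "4.2.0", "4.3.1"]:
    print(candidate, satisfies_r_arrow_constraint(candidate))
```

Of the sample versions, only 4.1.3 passes.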

Workaround

In place of Arrow, Nanoparquet (conda install r-nanoparquet) can be used to load Parquet files. It isn't quite as heavy-duty, but it's likely to work for most use cases. This makes it possible to continue using Stringi and all the packages that depend on it.

As a result, while this is annoying, it has only moderate impact.

Event Timeline

nshahquinn-wmf renamed this task from Conda-Analytics has package conflict when trying to install key packages (R-Arrow and R-Stringi) to Conda-Analytics has package conflict when trying to install R with key packages (R-Arrow and R-Stringi). Apr 15 2025, 2:36 AM
nshahquinn-wmf updated the task description.

@Ahoelzl what clarification are you looking for?

What I'm hoping is that an SRE can investigate and figure out the root cause of the conflict, which I'd guess is one of the outdated package versions we have pinned (like PyArrow 9, when the latest version is 19). Then we could hopefully prioritize upgrading that package eventually.

I've actually found a better workaround (the Nanoparquet package—details in the description), so this is now a lot lower impact.

Gehel subscribed.

DPE SRE are watching at the moment. Data Engineering should ping us when they need support to deploy a change.

Moving back to incoming so @Ahoelzl can take another look and see if clarification is still needed.

Milimetric moved this task from Incoming (new tickets) to Backlog on the Data-Engineering board.
Milimetric subscribed.

Data Engineering is assuming the workaround will work for now, and support for R is kind of undefined at the moment.