Upgrade to Numpy ≥ 1.24 in Conda-Analytics
Open, Medium, Public

Description

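Numpy 1.24 removes the deprecated np.bool alias, so any access to it raises an AttributeError. Pyspark accesses np.bool when collecting a boolean field to a Pandas dataframe, so that operation fails with Numpy ≥ 1.24.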

This is fixed in Pyspark 3.4, according to a StackOverflow answer.

Once Spark is updated to 3.4 or later, we will need to:

  • Remove or update the Numpy version pin in Conda-Analytics
  • Remove or update the Numpy version specification in Wmfdata
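
The condition for lifting the pin could be expressed as a small version check. This is only a sketch: the function names and constants are hypothetical and are not part of Conda-Analytics or Wmfdata, and the 3.4 threshold is taken from the description above rather than verified independently.

```python
import re
from typing import Tuple

# Assumption from the task description: Pyspark 3.4 is the first release
# that no longer accesses the removed np.bool alias.
FIRST_FIXED_PYSPARK = (3, 4)
# Numpy release that removed the np.bool alias.
FIRST_BREAKING_NUMPY = (1, 24)


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.4.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def numpy_pin_needed(pyspark_version: str) -> bool:
    """Return True while Pyspark is older than the assumed fixed release."""
    return major_minor(pyspark_version) < FIRST_FIXED_PYSPARK


def combination_is_safe(pyspark_version: str, numpy_version: str) -> bool:
    """Check a Pyspark/Numpy pair against the np.bool issue only."""
    if not numpy_pin_needed(pyspark_version):
        return True
    return major_minor(numpy_version) < FIRST_BREAKING_NUMPY
```

For example, `numpy_pin_needed("3.3.2")` returns True and `numpy_pin_needed("3.4.0")` returns False. The check covers only the np.bool removal; other incompatibilities between specific Pyspark and Numpy releases are outside its scope.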