NumPy 1.24 removes the deprecated np.bool alias, which PySpark accesses when collecting a boolean field to a pandas DataFrame. With that combination of versions, the collection fails with an AttributeError.
This is fixed in PySpark 3.4, according to this Stack Overflow answer.
Once Spark is updated to 3.4 or later, we will need to:
- Remove or update the NumPy version pin in Conda-Analytics
- Remove or update the NumPy version specification in Wmfdata
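
To make it easier to tell when these follow-ups become safe, a small check along the following lines could be used. This is only a sketch: the function names and the PYSPARK_FIX constant are hypothetical, the 3.4 threshold is taken from the note above, and the parser assumes plain numeric major.minor version strings.

```python
from typing import Tuple

# Assumption from the note above: PySpark 3.4 is the first release
# that no longer accesses np.bool.
PYSPARK_FIX: Tuple[int, int] = (3, 4)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '3.4.1'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def numpy_pin_needed(pyspark_version: str) -> bool:
    """True while the installed PySpark predates the fix, so the NumPy pin should stay."""
    return parse_major_minor(pyspark_version) < PYSPARK_FIX


if __name__ == "__main__":
    for version in ("3.3.2", "3.4.0"):
        print(version, "pin needed:", numpy_pin_needed(version))
```

In an environment where PySpark is installed, the string passed in would presumably come from pyspark.__version__. Comparing tuples of integers rather than raw strings avoids ordering "3.10" before "3.4".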