
Fix the link recommendation training pipeline
Closed, Resolved (Public)

Description

The Research team told us that the setup of the link recommendation training pipeline was failing, and shared the screenshots below:

link recommendation training pipeline setup error - Screenshot from 2023-11-30 12-54-40.png (360×1 px, 63 KB)

link recommendation training pipeline setup error -Screenshot from 2023-11-30 12-51-31.png (71×932 px, 17 KB)

We are going to investigate the cause of this and provide a solution.

Event Timeline

From a quick check: the link recommendation training pipeline uses conda-analytics. The last time this training pipeline was set up, conda-analytics used Python 3.10.8 and supported smart-open==2.2.0 and wmfdata==2.0.0. conda-analytics now uses Python 3.10.12 and supports smart-open==6.4.0 and wmfdata==2.0.1.

I dug a little more and found that the setup is indeed failing because the version numbers of some packages are far behind, specifically these packages:

jupyter-core
packaging
requests
attrs
smart-open
wmfdata
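A check along the following lines could surface this kind of drift before the setup fails. It is only a sketch, assuming the stale versions are `name==version` pins in a requirements file; the helper names (`parse_pins`, `installed_version`, `find_mismatches`) are hypothetical and not part of mwaddlink. It uses only the standard library's `importlib.metadata` and compares pins by exact string match, so it does not handle version ranges or environment markers.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # skip blanks, comments, and non-exact specifiers
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Look up the installed version, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, installed)} wherever the two differ."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    # The two older versions quoted in this task, used as illustrative pins.
    old_pins = parse_pins("smart-open==2.2.0\nwmfdata==2.0.0\n")
    for name, (pinned, found) in find_mismatches(old_pins).items():
        print(f"{name}: pinned {pinned}, environment has {found}")
```

The `lookup` parameter exists so the comparison can be run against a plain dictionary of versions instead of the live environment.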

Change 978663 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[research/mwaddlink@main] requirements: update packages for model training pipeline setup

https://gerrit.wikimedia.org/r/978663

Change 978663 merged by jenkins-bot:

[research/mwaddlink@main] requirements: update packages for model training pipeline setup

https://gerrit.wikimedia.org/r/978663

The instructions for setting up virtual environments have a caveat that encourages users to install pyicu with OS-specific packages. However, we found that when setting up virtual environments for the training pipeline, pyicu==2.9 in requirements.txt installs without any issues.

@MGerlach and I considered removing these instructions since they cause confusion. Before deciding, we checked whether pyicu is used elsewhere. It turns out that it is used in a blubber.yaml file to build a Docker image. Although we don't use this image for the training pipeline, it is used to serve the Flask app that provides an API for querying link-recommendation models: https://api.wikimedia.org/service/linkrecommendation/apidocs/

We have decided to leave these instructions in place, as they may help people building and using this image.

To fix the link recommendation model training pipeline, I followed the steps below:

  • updated the requirements that were failing
  • set up both the python3.10 conda env and the python3.7 env
  • ran the training pipeline with a small wiki, xhwiki
  • the pipeline ran and completed successfully
  • the spark jobs, wikipedia2vec, and sqlitedict ran without any issues
  • pushed a patch with the updated packages
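
Before launching a full pipeline run in a freshly built env, a quick import check can confirm that the key dependencies resolved. This is a minimal sketch, not the procedure used above: `missing_modules` is a hypothetical helper, and the import names are assumptions (`pyspark` for the spark jobs, `icu` for pyicu).

```python
from importlib import util


def missing_modules(module_names):
    """Return the names that cannot be found on the current interpreter's path."""
    missing = []
    for name in module_names:
        try:
            spec = util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent or malformed names.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the dependencies mentioned in this task.
    required = ["pyspark", "wikipedia2vec", "sqlitedict", "icu"]
    absent = missing_modules(required)
    print("missing:", ", ".join(absent) if absent else "none")
```

Note that `find_spec` only locates a module without importing it, so a package that is present but broken at import time would still pass this check.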
calbon set Final Story Points to 1.
calbon moved this task from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.