
Improve training and inference pipeline for multilingual link recommendation model
Open, Needs Triage, Public

Description

In T354659, we did exploratory work showing that it is feasible to build a single model that works for many (>50) or even all languages.
However, the current architecture does not make such a model easy to maintain. For example, the training pipeline is driven by a bash script, which is not suited to automating the creation and updating of multilingual models.

In this task, we want to improve the training and inference pipelines. This includes:

  • developing an end-to-end training pipeline for the multilingual model, ideally with Airflow;
  • using the pipeline to train one model or a few models (>=5) covering all languages (i.e., figuring out the best way to group languages);
  • revising the inference pipeline for the new model (keeping in mind the requirements for serving it via LiftWing later on).

Event Timeline

Update for the week of 8-14 April 2024:

  • Went over the Airflow and research dataset repos.
  • Sketched an overview of the workflow of our current codebase and of a few research Airflow repos.

Update for the week of 15-21 April 2024:

  • Discussed how to make our code Airflow-friendly.
  • Identified the changes and decisions needed for the add-a-link repo.

Update for the week of 22-28 April 2024:

  • Refactored the code and added CLI arguments.
  • Set up development Airflow instances.
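To illustrate why CLI arguments help with the move away from a bash script, here is a minimal sketch of a Python entry point for a single training step that an Airflow task could call with explicit parameters. The argument names (--wiki-ids, --model-id, --output-dir), the main() function, and the .pkl output path are all hypothetical and are not taken from the add-a-link repo; the actual training call is deliberately left out.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser for a hypothetical multilingual training step."""
    parser = argparse.ArgumentParser(
        description="Train a multilingual link recommendation model (illustrative sketch)."
    )
    # Hypothetical: one or more wikis whose data are pooled into one model,
    # so the same step can train a single-language or a grouped model.
    parser.add_argument("--wiki-ids", nargs="+", required=True,
                        help="Wikis to include, e.g. enwiki dewiki")
    # Hypothetical: a name for the resulting model (e.g. one per language group).
    parser.add_argument("--model-id", required=True,
                        help="Identifier of the model to produce")
    parser.add_argument("--output-dir", type=Path, default=Path("output"),
                        help="Directory where the model artifact is written")
    return parser


def main(argv=None) -> dict:
    """Parse arguments and return the plan for this step.

    A real implementation would run training here; this sketch only
    resolves the inputs and the (hypothetical) output path.
    """
    args = build_parser().parse_args(argv)
    return {
        "model_id": args.model_id,
        "wiki_ids": args.wiki_ids,
        "model_path": args.output_dir / f"{args.model_id}.pkl",
    }
```

For example, main(["--wiki-ids", "enwiki", "dewiki", "--model-id", "multi"]) resolves to a plan for one model trained on both wikis. Because every input is an explicit argument instead of a variable hard-coded in a script, an orchestrator can run the same step for each language group by changing only the arguments.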