
Improve training and inference pipeline for multilingual link recommendation model
Closed, Resolved · Public

Description

In T354659, we did exploratory work showing that it is feasible to build a single model that works for many (>50) or even all languages.
However, the current architecture of the model is not easy to maintain; for example, the training pipeline, which is driven by a bash script, is not suitable for automating the creation and updating of multilingual models.

In this task, we want to improve the pipelines for training and inference.

  • Develop an end-to-end training pipeline for the multilingual model, ideally with Airflow.
  • Use the pipeline to train one or a few (>=5) models covering all languages (i.e., figure out the best way to group languages).
  • Revise the inference pipeline for the new model, keeping in mind the requirements to serve it via LiftWing later on (this will be sketched out in a separate task).
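The end-to-end pipeline can be pictured as a small dependency graph of stages, in the spirit of an Airflow DAG. A minimal sketch in plain Python — the stage names here are illustrative assumptions, not the actual task IDs in the airflow-dags repo:

```python
# Hypothetical sketch of the training pipeline as a dependency graph.
# Each stage maps to the set of stages it depends on; Airflow would
# express the same structure with task operators and >> edges.
from graphlib import TopologicalSorter

PIPELINE = {
    "extract_wikitext": set(),
    "generate_anchor_dictionary": {"extract_wikitext"},
    "compute_embeddings": {"extract_wikitext"},
    "generate_training_data": {"generate_anchor_dictionary", "compute_embeddings"},
    "train_model": {"generate_training_data"},
    "backtest_evaluation": {"train_model"},
}

def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order respecting all dependencies."""
    return list(TopologicalSorter(dag).static_order())
```

Expressing the stages declaratively like this is what makes the pipeline automatable, in contrast to the linear bash script.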

Event Timeline

Update week 8 to 14 April 2024:

  • Went over the airflow and research datasets repos.
  • Sketched an overview of our current code-base workflow and of a few research airflow repos.

Update week 15 to 21 April 2024:

  • Discussed how to change our code into an Airflow-friendly version.
  • Identified changes and decisions to be made w.r.t. the add-a-link repo.

Update week of 22-28 April, 2024:

  • Refactored code and added CLI arguments.
  • Set up dev airflow instances.

Update week of 29 April - 5 May, 2024:

  • Set up code in research_datasets and created a test file.
  • Chatted with Fabian about how to connect the code to the airflow_dags repo.
  • Set up the airflow-dags repo properly and resolved issues setting up a test DAG.

Update week of 6 - 12 May, 2024:

  • Successfully ran a test airflow DAG, including a main file and fsspec for testing.
  • Created all required functions in the main file.
  • Started creating a DAG for the entire pipeline.

Update week of 13 - 19 May, 2024:

  • Rewrote the code for Airflow.
  • Started testing the code.

Update week of 20 - 26 May, 2024:

  • Debugged and fixed the code to run the airflow DAG.

Update week of 27 May - 02 June, 2024:

  • Finished debugging; fixed the DAG and research-datasets code and pushed the updated code to both repos.
  • Fixed moving the embeddings to HDFS.
  • Fixed the ICU import.
  • Pushed the MR.

Update week of 03 June - 09 June, 2024:

  • Fixed pre-commit errors.
  • Worked on finding the bug that causes a drop in performance.

Update week of 10 June - 16 June, 2024:

  • Found and fixed the bug causing precision loss.
  • Updated the DAG to follow guidelines, fixed formatting, and ran tests.
  • Added shards in the DAG (to be used later).
  • Sent the airflow-dags MR.
  • Worked on fixing memory issues; converted a script to use Spark instead of pure Python.
  • Rebased the MR and made changes in the DAG to incorporate new changes in research-datasets (the new changes are not used yet).

Update week of 17 June - 23 June, 2024:

  • Changed read/write methods to fix memory issues. Fixed for small and medium wikis; additional errors remain for larger wikis (e.g. enwiki, jawiki).
  • Refactored the generate_anchor_dictionary script for better modularity.
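For context, an anchor dictionary maps anchor texts to candidate link targets with their observed frequencies, aggregated over all links in a wiki. A toy sketch of that aggregation — illustrative only, assuming a simple (anchor, target) pair format; the real generate_anchor_dictionary script processes a full wikitext dump, now partly in Spark to keep memory usage in check:

```python
# Toy sketch (not the actual generate_anchor_dictionary code):
# count how often each anchor text links to each target page.
from collections import Counter, defaultdict

def build_anchor_dictionary(links):
    """links: iterable of (anchor_text, target_page) pairs."""
    anchors = defaultdict(Counter)
    for anchor, target in links:
        # Normalize anchors so surface-form variants aggregate together.
        anchors[anchor.lower()][target] += 1
    return anchors

links = [
    ("Python", "Python (programming language)"),
    ("python", "Python (programming language)"),
    ("python", "Pythonidae"),
]
anchor_dict = build_anchor_dictionary(links)
```

In Spark the same aggregation becomes a groupBy/count over the link pairs, which is what avoids holding the whole dictionary for a large wiki in one Python process.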

Update week of 24 June - 30 June, 2024:

Moving this to the next quarter (FY2024-25-Research-July-September) as the work is not yet fully completed:

  • The MR for the pipeline in airflow has been submitted.
  • It needs code review by research-engineering before it can be merged; this should be completed by mid-July (see T361929#9935541).
  • We expect to resolve the task in the next 1-2 weeks.
  • The training pipeline with airflow has been merged (MR).
  • We will run some tests of the pipeline in the next week(s).
  • The inference pipeline will need a separate task, as it requires additional discussion on how best to approach it; see some context in Fabian's recent presentation (slidedeck).
Isaac triaged this task as Medium priority. Jan 13 2025, 11:00 PM
MGerlach added subscribers: MunizaA, fkaelin.

Validated the airflow-dag for the training pipeline.

  • Successfully ran the pipeline to train individual models for 6 different languages.
  • Backtesting evaluation is similar to the results from the previous models:
| wiki_db    | threshold | N_test | Precision (prev) | Recall (prev) |
|------------|-----------|--------|------------------|---------------|
| arwiki     | 0.5       | 1000   | 0.996 (0.75)     | 0.394 (0.34)  |
| bnwiki     | 0.5       | 1000   | 0.666 (0.75)     | 0.257 (0.30)  |
| cswiki     | 0.5       | 1000   | 0.873 (0.78)     | 0.447 (0.44)  |
| viwiki     | 0.5       | 1000   | 0.943 (0.89)     | 0.806 (0.65)  |
| simplewiki | 0.5       | 1000   | 0.802 (0.79)     | 0.406 (0.45)  |
| enwiki     | 0.5       | 1000   | 0.846 (0.81)     | 0.410 (0.45)  |
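For reference, the precision and recall columns follow the standard definitions, applied after binarizing model scores at the fixed threshold of 0.5. A minimal sketch — the function name and the example scores/labels are hypothetical, not taken from the evaluation code:

```python
# Hypothetical sketch of the backtesting metrics: binarize link
# scores at a threshold, then compare against gold labels.
def precision_recall(scores, labels, threshold=0.5):
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: 4 candidate links, 3 of which are true links.
p, r = precision_recall([0.9, 0.8, 0.4, 0.6], [True, False, True, True])
```

Raising the threshold generally trades recall for precision, which is why the table fixes it at 0.5 to compare against the previous models.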

Updated documentation: https://meta.wikimedia.org/wiki/Research:Improving_multilingual_support_for_link_recommendation_model_for_add-a-link_task#Building_the_Pipeline

Notes:

  • Marking this task as completed. However, this completes the work only for the training pipeline, which can be used to train either individual models for each language or aggregate models for several (or all) languages.
  • Additional work is needed on how to integrate this new training pipeline into the deployed add-a-link service; there are several ways one could do inference once the models are trained.
  • Aligning the training pipeline with the inference service will be captured in a separate task.

Many thanks to @MunizaA and @fkaelin for their incredible support in moving the pipeline to an airflow-dag.