
Improve training and inference pipeline for multilingual link recommendation model
Closed, Resolved · Public

Description

In T354659, we did exploratory work showing that it is feasible to build a single model that works for many (>50) or even all languages.
However, the current architecture of the model is not easy to maintain; for example, the training pipeline, which is driven by a bash script, is not suitable for automating the creation and updating of multilingual models.

In this task, we want to improve the pipelines for training and inference.

  • Develop an end-to-end training pipeline for the multilingual model, ideally with Airflow.
  • Use the pipeline to train one or a few (>=5) models covering all languages (i.e., figure out the best way to group languages).
  • Revise the inference pipeline for the new model, keeping in mind the requirements to serve it via LiftWing later on (this will be sketched out in a separate task).
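The end-to-end pipeline can be pictured as a small dependency graph of stages, in the spirit of an Airflow DAG. A minimal sketch in plain Python — the stage names here are illustrative assumptions, not the actual task IDs in the airflow-dags repo:

```python
# Hypothetical sketch of the training pipeline as a dependency graph.
# Each stage maps to the set of stages it depends on; Airflow would
# express the same structure with task operators and >> edges.
from graphlib import TopologicalSorter

PIPELINE = {
    "extract_wikitext": set(),
    "generate_anchor_dictionary": {"extract_wikitext"},
    "compute_embeddings": {"extract_wikitext"},
    "generate_training_data": {"generate_anchor_dictionary", "compute_embeddings"},
    "train_model": {"generate_training_data"},
    "backtest_evaluation": {"train_model"},
}

def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order respecting all dependencies."""
    return list(TopologicalSorter(dag).static_order())
```

Expressing the stages declaratively like this is what makes the pipeline automatable, in contrast to the linear bash script.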

Event Timeline

Update week 8 to 14 April 2024:

  • Went over the airflow and research datasets repos.
  • Sketched an overview of our current code-base workflow and of a few research airflow repos.

Update week 15 to 21 April 2024:

  • Discussed how to change our code into an Airflow-friendly version.
  • Identified changes and decisions to be made w.r.t. the add-a-link repo.

Update week of 22-28 April, 2024:

  • Refactored code and added CLI arguments.
  • Set up dev airflow instances.

Update week of 29 April - 5 May, 2024:

  • Set up code in research_datasets and created a test file.
  • Chatted with Fabian about how to connect the code to the airflow_dags repo.
  • Set up the airflow-dags repo properly and resolved issues setting up a test DAG.

Update week of 6 - 12 May, 2024:

  • Successfully ran a test airflow DAG, including a main file and fsspec for testing.
  • Created all required functions in the main file.
  • Started creating a DAG for the entire pipeline.

Update week of 13 - 19 May, 2024:

  • Rewrote the code for Airflow.
  • Started testing the code.

Update week of 20 - 26 May, 2024:

  • Debugged and fixed the code to run the airflow DAG.

Update week of 27 May - 02 June, 2024:

  • Finished debugging; fixed the DAG and research-datasets code and pushed the updated code to both repos.
  • Fixed moving the embeddings to HDFS.
  • Fixed the ICU import.
  • Pushed the MR.

Update week of 03 June - 09 June, 2024:

  • Fixed pre-commit errors.
  • Worked on finding the bug that causes a drop in performance.

Update week of 10 June - 16 June, 2024:

  • Found and fixed the bug causing precision loss.
  • Updated the DAG to follow guidelines, fixed formatting, and ran tests.
  • Added shards in the DAG (to be used later).
  • Sent the airflow-dags MR.
  • Worked on fixing memory issues; converted a script to use Spark instead of pure Python.
  • Rebased the MR and made changes in the DAG to incorporate new changes in research-datasets (the new changes are not used yet).

Update week of 17 June - 23 June, 2024:

  • Changed read/write methods to fix memory issues. Fixed for small and medium wikis; additional errors remain for larger wikis (e.g. enwiki, jawiki).
  • Refactored the generate_anchor_dictionary script for better modularity.
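For context, an anchor dictionary maps anchor texts to candidate link targets with their observed frequencies, aggregated over all links in a wiki. A toy sketch of that aggregation — illustrative only, assuming a simple (anchor, target) pair format; the real generate_anchor_dictionary script processes a full wikitext dump, now partly in Spark to keep memory usage in check:

```python
# Toy sketch (not the actual generate_anchor_dictionary code):
# count how often each anchor text links to each target page.
from collections import Counter, defaultdict

def build_anchor_dictionary(links):
    """links: iterable of (anchor_text, target_page) pairs."""
    anchors = defaultdict(Counter)
    for anchor, target in links:
        # Normalize anchors so surface-form variants aggregate together.
        anchors[anchor.lower()][target] += 1
    return anchors

links = [
    ("Python", "Python (programming language)"),
    ("python", "Python (programming language)"),
    ("python", "Pythonidae"),
]
anchor_dict = build_anchor_dictionary(links)
```

In Spark the same aggregation becomes a groupBy/count over the link pairs, which is what avoids holding the whole dictionary for a large wiki in one Python process.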

Update week of 24 June - 30 June, 2024:

Moving this to the next quarter (FY2024-25-Research-July-September) as the work is not yet fully completed:

  • The MR for the pipeline in airflow has been submitted.
  • It needs code review by research-engineering before it can be merged; this should be completed by mid-July (see T361929#9935541).
  • We expect to resolve the task in the next 1-2 weeks.
  • The training pipeline with airflow has been merged (MR).
  • We will run some tests of the pipeline in the next week(s).
  • The inference pipeline will need a separate task, as it requires additional discussion on how best to approach it; see some context in Fabian's recent presentation (slidedeck).
Isaac triaged this task as Medium priority. Jan 13 2025, 11:00 PM
MGerlach added subscribers: MunizaA, fkaelin.

Validated the airflow-dag for the training pipeline.

  • Successfully ran the pipeline to train individual models for 6 different languages.
  • Backtesting evaluation is similar to the results from the previous models:
| wiki_db    | threshold | N_test | Precision (prev) | Recall (prev) |
|------------|-----------|--------|------------------|---------------|
| arwiki     | 0.5       | 1000   | 0.996 (0.75)     | 0.394 (0.34)  |
| bnwiki     | 0.5       | 1000   | 0.666 (0.75)     | 0.257 (0.30)  |
| cswiki     | 0.5       | 1000   | 0.873 (0.78)     | 0.447 (0.44)  |
| viwiki     | 0.5       | 1000   | 0.943 (0.89)     | 0.806 (0.65)  |
| simplewiki | 0.5       | 1000   | 0.802 (0.79)     | 0.406 (0.45)  |
| enwiki     | 0.5       | 1000   | 0.846 (0.81)     | 0.410 (0.45)  |
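For reference, the precision and recall columns follow the standard definitions, applied after binarizing model scores at the fixed threshold of 0.5. A minimal sketch — the function name and the example scores/labels are hypothetical, not taken from the evaluation code:

```python
# Hypothetical sketch of the backtesting metrics: binarize link
# scores at a threshold, then compare against gold labels.
def precision_recall(scores, labels, threshold=0.5):
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: 4 candidate links, 3 of which are true links.
p, r = precision_recall([0.9, 0.8, 0.4, 0.6], [True, False, True, True])
```

Raising the threshold generally trades recall for precision, which is why the table fixes it at 0.5 to compare against the previous models.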

Updated documentation: https://meta.wikimedia.org/wiki/Research:Improving_multilingual_support_for_link_recommendation_model_for_add-a-link_task#Building_the_Pipeline

Notes:

  • Marking this task as completed. However, this completes the work only for the training pipeline, which can be used to train either individual models for each language or aggregate models for several (or all) languages.
  • Additional work is needed on how to integrate this new training pipeline into the deployed add-a-link service; there are several ways one could do inference once the models are trained.
  • Aligning the training pipeline with the inference service will be captured in a separate task.

Many thanks to @MunizaA and @fkaelin for their incredible support in moving the pipeline to an airflow-dag.