
Training pipeline for Revert Risk Language Agnostic (RRLA) model
Closed, ResolvedPublic

Description

Goal

Create a standardized training pipeline for the Revert Risk Language Agnostic (RRLA) model.

Tasks

  • Generate a training dataset: Adapt the existing code for Revert Risk Multilingual for generating training data for RRLA.
  • Automate the dataset generation process: Create an Airflow process that generates a new dataset every month, covering the last 6 months of data.
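
As a rough illustration of the monthly schedule described above, the sketch below computes the date range a given run might cover. It assumes "the last 6 months" means the six full calendar months before the month in which the job runs; the task does not specify this, and the function names `month_start` and `training_window` are hypothetical. In an Airflow DAG the run date would presumably come from the scheduler rather than being passed by hand.

```python
from datetime import date


def month_start(d: date, offset_months: int = 0) -> date:
    """Return the first day of the month `offset_months` away from `d`."""
    # Work with a zero-based month index so year rollover is plain integer math.
    index = d.year * 12 + (d.month - 1) + offset_months
    return date(index // 12, index % 12 + 1, 1)


def training_window(run_date: date, months: int = 6) -> tuple[date, date]:
    """Half-open window [start, end) of full calendar months before the run month.

    Assumption: the window excludes the (incomplete) month of `run_date`.
    """
    end = month_start(run_date)
    start = month_start(run_date, -months)
    return start, end


if __name__ == "__main__":
    start, end = training_window(date(2024, 2, 15))
    print(f"Dataset window: {start} (inclusive) to {end} (exclusive)")
```

A half-open window keeps consecutive monthly runs aligned on month boundaries, so each new dataset drops the oldest month and adds the most recent complete one.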

Context

The Revert Risk Language Agnostic model is currently a dependency for several projects and teams: Automoderator (T345092), Wikimedia Enterprise (T345931), and the ORES deprecation.

Our research shows that keeping the models trained with recent data improves their performance significantly.

A standardized training pipeline would also make it easier to introduce model improvements that address some known issues.

Requested but not currently prioritized

  • Train model: Based on this research notebook, write code to retrain the RRLA model using the data generated in the previous step.

Rather than prioritizing the above request on its own, we would like to plan and prioritize a generalized version of it that addresses the needs of multiple people on the team. See T351009.

Details

Due Date
Apr 30 2024, 11:00 PM

Event Timeline

fkaelin renamed this task from Create an standardized training pipeline for Revert Risk Language Agnostic (RRLA) model to [Requesting Engineering Support] Training pipeline for Revert Risk Language Agnostic (RRLA) model. Oct 26 2023, 2:42 PM

Weekly Updates

  • I have requested engineering support for creating a training pipeline (T349755).
leila renamed this task from [Requesting Engineering Support] Training pipeline for Revert Risk Language Agnostic (RRLA) model to Training pipeline for Revert Risk Language Agnostic (RRLA) model. Nov 10 2023, 6:14 PM
leila triaged this task as High priority.
leila updated the task description.
leila set Due Date to Dec 21 2023, 12:00 AM.
leila moved this task from Backlog to Staged on the Research board.
leila added a project: Knowledge-Integrity.

Pending training pipeline readiness, this model will be trained as the first test case for the pipeline.

XiaoXiao-WMF changed Due Date from Dec 21 2023, 12:00 AM to Apr 30 2024, 11:00 PM.

@fkaelin: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

This is resolved, including the training of the model. Code: pipeline / dag