User Details
- User Since
- Jan 6 2025, 12:21 PM (56 w, 6 d)
- Availability
- Available
- IRC Nick
- georgekyz
- LDAP User
- Gkyziridis
- MediaWiki User
- GKyziridis-WMF
Fri, Feb 6
Update
Since task T406217 is finished, we have a first version of the end-to-end pipeline including all the basic steps of an ML lifecycle: Data Generation -> Model Training -> Export model to S3 bucket.
More info can be found here: https://phabricator.wikimedia.org/T398970
Generate Data (SparkSubmitOperator) -> Train/Validation/Test split (SparkSubmitOperator) -> Copy from HDFS to a PVC (WMFKubernetesPodOperator) -> Train model on GPU pod (WMFKubernetesPodOperator) -> Copy retrained model to S3 (PythonOperator)
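The five-step flow above can be sketched as an ordered task chain. This is only an illustrative sketch: the task ids are hypothetical, while the operator class names are the ones listed in the pipeline description.

```python
# Illustrative sketch of the retraining DAG's task order. Task ids are
# hypothetical; operator names come from the pipeline description above.
PIPELINE = [
    ("generate_data", "SparkSubmitOperator"),
    ("train_validation_test_split", "SparkSubmitOperator"),
    ("copy_hdfs_to_pvc", "WMFKubernetesPodOperator"),
    ("train_model_on_gpu", "WMFKubernetesPodOperator"),
    ("copy_model_to_s3", "PythonOperator"),
]

def chain(tasks):
    """Render an Airflow-style 'a >> b >> c' dependency string."""
    return " >> ".join(name for name, _ in tasks)

print(chain(PIPELINE))
```

Each task depends only on the previous one, so the whole retraining run is a single linear chain.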
Hey, I am working on this. I think I have finished the implementation for publishing the predictions as events, and I am now testing it locally.
Based on this: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Streams I think there are these steps:
- Implementation on inference-services side (this is what I am testing).
- Test it and deploy the new model server versions.
- Configure Changeprop.
- Configure the new changes in the mediawiki-config repo.
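For the first step (the inference-services side), a prediction event could be built roughly like this. This is a hedged sketch only: the schema URI, stream name, and field layout here are assumptions for illustration, not the actual Lift Wing event schema.

```python
import json
import time
import uuid

# Hypothetical sketch of building a prediction event envelope.
# The schema URI, stream name, and field layout are assumptions,
# NOT the actual Lift Wing / EventGate schema.
def build_prediction_event(model_name, model_version, rev_id, prediction):
    return {
        "$schema": "/mediawiki/revision/score/1.0.0",   # assumed schema URI
        "meta": {
            "stream": f"{model_name}.predictions",      # assumed stream name
            "id": str(uuid.uuid4()),
            "dt": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        },
        "model_name": model_name,
        "model_version": model_version,
        "rev_id": rev_id,
        "prediction": prediction,
    }

event = build_prediction_event("tone-check", "v2", 12345,
                               {"label": "neutral", "score": 0.93})
print(json.dumps(event, indent=2))
```

The model server would emit such an envelope to the event intake after each successful prediction; Changeprop and the mediawiki-config changes then consume/route the stream.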
Tue, Feb 3
Fri, Jan 30
Thu, Jan 29
Hey @Isaac, this ticket is assigned to @klausman but he is currently on his sabbatical. He will start working on this when he is back, which I think is around next month, though I am not certain of the exact date.
I am tagging @DPogorzelski-WMF here for visibility, maybe he has something more to add.
Update
The end-to-end tone-check retraining pipeline succeeded; we solved the Multi-Attach PVC issues.
The new version of the retrained tone-check model is successfully copied to the dedicated S3 bucket under: s3://wmf-ml-models/retrained-models/tone-check/, here are the logs of the export step:
Here is the content of the S3 bucket:
$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls -H s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/
2026-01-28 22:24    865   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/config.json
2026-01-28 22:24   678M   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/model.safetensors
2026-01-28 22:24  1357M   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/optimizer.pt
2026-01-28 22:24    13K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/rng_state.pth
2026-01-28 22:24   1064   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/scheduler.pt
2026-01-28 22:24    695   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/special_tokens_map.json
2026-01-28 22:24     2M   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/tokenizer.json
2026-01-28 22:24   1330   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/tokenizer_config.json
2026-01-28 22:24     9K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/trainer_state.json
2026-01-28 22:24     5K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/training_args.bin
2026-01-28 22:24   972K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-21530/vocab.txt
Wed, Jan 28
We currently do not store the predictions from the rr-multilingual model anywhere, so we cannot export them the same way we do for the rr-language-agnostic one.
If this is needed, I can open a new Phabricator task to start developing the first step of saving the slice of rr-multilingual predictions into the event stream; then we can add them to the refinery and export them into event_sanitized as we do for rr-language-agnostic.
Tue, Jan 27
I also checked the PVC using kubectl and I see that its access mode is "RWO" (ReadWriteOnce). I am not sure whether this causes the problem:
$ kube_env airflow-ml-deploy dse-k8s-eqiad
$ kubectl get pvc airflow-ml-model-training -n airflow-dev
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
airflow-ml-model-training   Bound    pvc-8a6a2920-8d7e-4616-8ab6-a6a70b26d116   20Gi       RWO            ceph-rbd-ssd   151d
Wed, Jan 21
$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls -H --recursive s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/
2026-01-20 13:33    865   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/config.json
2026-01-20 13:33   678M   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/model.safetensors
2026-01-20 13:33  1357M   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/optimizer.pt
2026-01-20 13:33    13K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/rng_state.pth
2026-01-20 13:33   1064   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/scheduler.pt
2026-01-20 13:33    695   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/special_tokens_map.json
2026-01-20 13:33     2M   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/tokenizer.json
2026-01-20 13:33   1330   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/tokenizer_config.json
2026-01-20 13:33    24K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/trainer_state.json
2026-01-20 13:33     5K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/training_args.bin
2026-01-20 13:33   972K   s3://wmf-ml-models/retrained-models/tone-check/checkpoint-63618/vocab.txt
Tue, Jan 20
Thu, Jan 15
Mon, Jan 12
Jan 9 2026
curl -s -X POST "https://inference.svc.eqiad.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" \
  -d '{"rev_id": 2, "lang": "test"}' \
  -H "Host: revertrisk-language-agnostic.revertrisk.wikimedia.org"
Jan 6 2026
Things we need to keep in mind:
- Testwiki is not a canonical/normal wiki, so it is excluded from the canonical_wikis list.
- Testwiki is not a supported wiki for the revertrisk model, so predictions will be completely inaccurate.
- We treat testwiki as enwiki on the fly so that the revert-risk model server accepts API requests posting {"lang": "test"}.
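The "treat testwiki as enwiki on the fly" shim could look roughly like this. The function name and the surrounding model-server code are hypothetical; only the test->en mapping is from the notes above.

```python
# Hedged sketch of mapping testwiki's "test" language code to "en" so the
# revert-risk model server accepts the request. Function name and structure
# are hypothetical; predictions for testwiki remain unreliable either way.
def normalize_lang(lang: str) -> str:
    """Rewrite 'test' (testwiki) to 'en' before calling the model."""
    return "en" if lang == "test" else lang

# Supported languages pass through untouched:
print(normalize_lang("de"))    # -> de
print(normalize_lang("test"))  # -> en
```

This keeps the workaround in one place, so removing it later (if testwiki support is ever dropped) is a one-line change.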
Dec 18 2025
Dec 17 2025
Testwiki is not a canonical Wikipedia; it is a testing environment where articles can be written in any language, and it was not part of the RR model's training data, so we excluded it from the list of canonical Wikipedias. As a result, the RR model does not support testwiki.
We can easily see this in the following two requests, to enwiki and testwiki respectively:
Dec 15 2025
Dec 11 2025
Dec 10 2025
Dec 9 2025
I built the image using: docker build --network=host -t torch_rocm3 .
Dec 3 2025
Nov 27 2025
When we start the actual deployment:
Because we have a huge number of wikis that need to be deployed, I suggest doing it in batches. Right now the patch above only sets the thresholds for each wiki, which means that if it is merged and deployed nothing will change. In the next iterations, when we start deploying the wikis, we need to enable the ORES model and enable the UI as well; only then will the thresholds configured in the patch become functional. So, I suggest enabling the ORES model in batches, e.g. 4-5 wikis per batch. This will take some time to finish all the batches, but we can easily handle any issues that occur during the backport deployments.
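The batching suggestion above amounts to splitting the full wiki list into fixed-size chunks and running one backport deployment per chunk. A minimal sketch, with placeholder wiki names:

```python
# Illustrative sketch of splitting the wiki list into deployment batches
# of 5, as suggested above. Wiki names here are placeholders.
def batches(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

wikis = [f"wiki{i}" for i in range(1, 14)]  # 13 placeholder wikis
for n, batch in enumerate(batches(wikis, 5), start=1):
    print(f"backport batch {n}: {batch}")
```

With 13 wikis and a batch size of 5 this yields three batches (5, 5, 3), i.e. three backport windows.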
Update
I configured all the rr thresholds for all the wikis and enabled the model for all of them in this patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1212086 .
I excluded thwiki from the above patch since you are using it for the MVP.
I also did not run composer manage-dblist add {wiki_name} ores for the wikis, which means that whenever we deploy these wikis we need to run that composer command for each of them.
Nov 26 2025
I think there is one more step that needs to be done, which is to run: composer manage-dblist add {wiki_name} ores. I do not see thwiki added under the "dblists/ores.dblist" file in this patch -> https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1207932
Nov 25 2025
Nov 24 2025
Progress update on the hypothesis for the week, including whether anything has shipped:
Hey @kevinbazira, thanks very much for running the load tests for Revert-Risk wikidata.
I think we should change the configuration a bit in order to simulate a more realistic scenario.
We also need to run heavier tests, spawning more users, in order to check our API's capacity and its ability to handle maximum RPS.
I ran three different locust tests with heavier configuration, you can see the results in the following phab paste:
Nov 20 2025
For dewiki we spotted an issue, described here: T407155#11311194, regarding the many English samples used for training the model on dewiki. To overcome this, I used translation only where English samples exist inside the dewiki dataset.
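The selective-translation step described above could be sketched as follows. Both helpers are placeholders: a real pipeline would use a proper language-ID model and an actual MT system, neither of which is named in the notes.

```python
# Hedged sketch of translating ONLY the English samples inside the dewiki
# training set. is_english and translate_to_german are placeholders for
# whatever language-ID and translation tooling the pipeline actually uses.
def is_english(text: str) -> bool:
    # Placeholder heuristic; a real pipeline would use a language-ID model.
    return all(ord(c) < 128 for c in text) and " the " in f" {text} "

def translate_to_german(text: str) -> str:
    return f"<de>{text}</de>"  # stand-in for a real machine-translation call

def fix_dewiki_samples(samples):
    """Translate English samples; leave German samples untouched."""
    return [translate_to_german(s) if is_english(s) else s for s in samples]
```

The key property is that German samples pass through unmodified, so only the contaminated English slice of the dataset is altered.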
Nov 17 2025
You can try tweaking the filters in the notebook, such as loosening the diff size conditions, expanding the revert time periods, or asking the community for more signals if possible.
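Loosening the filters amounts to widening the cutoff thresholds so more candidate revisions survive. In this sketch the field names (diff_size, revert_minutes) and the threshold values are hypothetical, not the notebook's actual columns:

```python
# Hedged sketch of loosening notebook filters. Column names and default
# thresholds are hypothetical illustrations, not the notebook's real ones.
def keep_sample(row, max_diff_size=2000, max_revert_minutes=60 * 24):
    """Keep a revision if its diff size and revert delay fall under the cutoffs;
    widening either threshold admits more candidate revisions."""
    return (row["diff_size"] <= max_diff_size
            and row["revert_minutes"] <= max_revert_minutes)

rows = [
    {"diff_size": 500,  "revert_minutes": 30},
    {"diff_size": 5000, "revert_minutes": 30},
]
print([keep_sample(r) for r in rows])                        # -> [True, False]
print([keep_sample(r, max_diff_size=10000) for r in rows])   # -> [True, True]
```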
Nov 14 2025
Nov 12 2025
Update
The issue I am facing in reproducing the error is that we log the incoming request only when it is successful (status code 200); we do not log it when it is not.
We need to somehow log it immediately after we receive it in order to reproduce the error.
I will open a ticket for upgrading the logging on the model server side: https://phabricator.wikimedia.org/T409931
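The idea behind the logging upgrade can be sketched like this: log the raw request body before any parsing or validation, so that requests which later fail (non-200) can still be reproduced. The handler and payload shape here are hypothetical, not the actual model-server code.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-server")

# Hedged sketch: log the raw body FIRST, before json.loads can raise,
# so malformed/failing requests are captured too. Handler name and
# response shape are hypothetical.
def handle_predict(raw_body: bytes):
    logger.info("incoming request: %s", raw_body.decode("utf-8", "replace"))
    payload = json.loads(raw_body)  # may raise -> would surface as a 4xx
    return {"rev_id": payload["rev_id"], "prediction": 0.5}
```

Because the log line is emitted before parsing, even a request that crashes the handler leaves behind the exact payload needed to replay it.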
Nov 10 2025
# Request
$ curl -i -X POST localhost:8080/v1/models/revertrisk-multilingual:predict \
  -d '{"lang": "ru", "rev_id": 149673768}'
