
Timeout for be-x-old and no support for be-tarask in revertrisk-language-agnostic
Closed, Resolved, Public

Description

Hello, where can I find a list of all languages supported by the revertrisk-language-agnostic model? For example, "be-tarask" and "be_tarask" both return an "Unsupported lang" error, while "be-x-old" just times out (!).

Repro:

curl https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict -X POST -d '{"rev_id": 2409774, "lang": "be-x-old"}'
{"httpReason":"upstream request timeout","httpCode":504}

curl https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict -X POST -d '{"rev_id": 2409774, "lang": "be-tarask"}'
{"error":"Unsupported lang: be-tarask."}
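The two repro calls differ only in the `lang` field, so a small wrapper makes it easy to try other language codes against the same endpoint. This is a sketch: `build_payload` and `revertrisk_predict` are hypothetical helper names, and the `Content-Type` header and the 30-second `--max-time` limit are assumptions that do not appear in the original commands.

```shell
#!/bin/sh
# Lift Wing endpoint used in the repro commands above.
LW_URL="https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict"

# Hypothetical helper: build the JSON request body from a revision ID and a language code.
build_payload() {
  printf '{"rev_id": %d, "lang": "%s"}' "$1" "$2"
}

# Hypothetical helper: POST the payload and print the raw JSON response.
# --max-time (assumed 30 s) stops the call from waiting indefinitely on a gateway timeout.
revertrisk_predict() {
  curl -sS --max-time 30 -X POST \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "$2")" \
    "$LW_URL"
  echo
}
```

Usage: `revertrisk_predict 2409774 be-x-old` sends the same request as the first repro command.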

According to https://en.wikipedia.org/wiki/List_of_Wikipedias#Redirects:

  • be-x-old redirects to be-tarask

The Wikipedia site for be-x-old is https://be-tarask.wikipedia.org, which makes it the only Wikipedia project whose URL does not follow the pattern https://{lang}.wikipedia.org. This may be what causes the timeout: Lift Wing (LW) cannot connect to the correct MediaWiki API.
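The host resolution described above can be sketched as a lookup that falls back to the `{lang}.wikipedia.org` pattern and overrides the single exception. `wiki_host` and `wiki_api_url` are hypothetical names; this illustrates the idea and is not the code from the actual fix.

```shell
#!/bin/sh
# Hypothetical helper: return the Wikipedia host for a language code.
# Most projects follow {lang}.wikipedia.org; be-x-old is the exception
# because its site lives at be-tarask.wikipedia.org.
wiki_host() {
  case "$1" in
    be-x-old) echo "be-tarask.wikipedia.org" ;;
    *)        echo "$1.wikipedia.org" ;;
  esac
}

# Hypothetical helper: build the MediaWiki Action API URL for a language code.
wiki_api_url() {
  echo "https://$(wiki_host "$1")/w/api.php"
}
```

Keeping the exception in an explicit override, rather than deriving every host from the language code, means a naive `https://be-x-old.wikipedia.org` request is never built.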

Also, we should add be-tarask to the list of supported languages in the language-agnostic model.

Event Timeline

I checked all the Wikipedia projects listed in https://en.wikipedia.org/w/api.php?action=sitematrix&formatversion=2, which is a helpful source of metadata about Wikimedia projects. I initially thought other languages might have the same redirect issue, but it turns out that only be-x-old is affected.
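A check along these lines can be scripted against the same sitematrix API. This is a sketch, assuming `curl` and `jq` are installed and that, with `format=json&formatversion=2`, the numeric keys of `.sitematrix` are the language groups (each with a `code` and a `site` array) while `count` and `specials` are skipped. `fetch_wikipedia_hosts` and `find_host_mismatches` are hypothetical helper names, not the method used in this task.

```shell
#!/bin/sh
SITEMATRIX_URL="https://en.wikipedia.org/w/api.php?action=sitematrix&format=json&formatversion=2"

# Print one "code url" line per Wikipedia (site code "wiki") in the sitematrix.
# Requires curl and jq; only numeric keys (language groups) are read.
fetch_wikipedia_hosts() {
  curl -sS "$SITEMATRIX_URL" |
    jq -r '.sitematrix | to_entries[]
           | select(.key | test("^[0-9]+$")) | .value as $g
           | ($g.site // [])[] | select(.code == "wiki")
           | "\($g.code) \(.url)"'
}

# Read "code url" pairs on stdin and print those whose URL
# is not https://{code}.wikipedia.org.
find_host_mismatches() {
  while read -r code url; do
    [ -z "$code" ] && continue
    if [ "$url" != "https://$code.wikipedia.org" ]; then
      printf '%s -> %s\n' "$code" "$url"
    fi
  done
}
```

Usage: `fetch_wikipedia_hosts | find_host_mismatches`. Per the investigation above, be-x-old was the only mismatch at the time; the result may change as projects are renamed.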

Meanwhile, I have added the list of currently supported languages to the model card: https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_revert_risk#Motivation

The list was copied from the model binary: https://phabricator.wikimedia.org/P49491
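With the list published, a client could check a language code against a local copy before calling the API, avoiding an "Unsupported lang" round trip. This is a sketch: the file name `supported_langs.txt` and its one-code-per-line layout are assumptions, and `is_supported_lang` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the language code ($1) appears as a whole line
# in the list file ($2). -x matches whole lines, -F treats the code literally.
is_supported_lang() {
  grep -qxF -- "$1" "$2"
}
```

Usage: `is_supported_lang be-tarask supported_langs.txt || echo "be-tarask is not supported"`.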

Change 934544 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: fix language be-x-old's host

https://gerrit.wikimedia.org/r/934544

Change 934544 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: fix language be-x-old's host

https://gerrit.wikimedia.org/r/934544

The timeout issue for be-x-old has been fixed :)

aikochou@wmf3132 ~ % curl https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict -X POST -d '{"rev_id": 2409774, "lang": "be-x-old"}'

{"model_name":"revertrisk-language-agnostic","model_version":"1","wiki_db":"be-x-oldwiki","revision_id":2409774,"output":{"prediction":false,"probabilities":{"true":0.2778533697128296,"false":0.7221466302871704}}}
elukey claimed this task.

Great work Aiko!