
Fix translatewiki-reverted and frwikisource-articlequality isvcs
Closed, Resolved, Public

Description

After some checks for the API-GW made by Tobias, two problems came up:

  • translatewiki-reverted is configured with translate.wikipedia.org as its Host header, but that host doesn't exist, so the queries fail.
  • frwikisource-articlequality returns this error message in the kserve-container's logs:
[E 221205 16:22:49 extractor_utils:165] An error has occurred while fetching feature values from the MW API: Cannot connect to host incubator.wikimedia.org:443 ssl:default [Connection reset by peer]

We don't explicitly call incubator.wikimedia.org when creating the MWApiCache. Maybe this is a weird or old behavior that needs to be fixed in revscoring?
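To make the first problem concrete, here is a minimal sketch assuming the model server reaches the MW API through a shared internal endpoint and selects the target wiki purely via the Host header. The endpoint `https://api-ro.discovery.wmnet/w/api.php` and the helper `build_mwapi_request` are assumptions for illustration, not the actual Lift Wing code. The request is only built, never sent.

```python
import urllib.parse
import urllib.request

# Assumed shared MW API endpoint; the real Lift Wing setting may differ.
MWAPI_ENDPOINT = "https://api-ro.discovery.wmnet/w/api.php"


def build_mwapi_request(host_header: str, rev_id: int) -> urllib.request.Request:
    """Build (but do not send) an MW API query that targets a wiki via its Host header."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "format": "json",
    })
    req = urllib.request.Request(f"{MWAPI_ENDPOINT}?{params}")
    # The shared endpoint routes on Host, so a wrong value here
    # (e.g. translate.wikipedia.org) points every query at a wiki that doesn't exist.
    req.add_header("Host", host_header)
    return req


req = build_mwapi_request("translatewiki.net", 12345)
print(req.get_header("Host"))  # translatewiki.net
```

With this setup, fixing the isvc is purely a configuration change: only the Host value differs between wikis.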

Event Timeline

The problem is that their hosts are not set correctly.

translatewiki-reverted

According to the commit that added the translatewiki-reverted model to the editquality repo (https://github.com/wikimedia/editquality/commit/50655df996a8fdc153389693e0171042cfaca6af), the host it uses is https://translatewiki.net.

Interestingly, I don't see this model deployed on ORES at all (see https://ores.wikimedia.org/v3/scores/ for the available models).
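One quick way to run this check: the ORES `/v3/scores/` listing maps each wiki context to the models it serves. The sketch below assumes the response shape `{context: {"models": {model_name: {...}}}}`; `has_model` is a hypothetical helper, and the `sample` payload is a made-up illustration, not real ORES output.

```python
import json


def has_model(listing: dict, context: str, model: str) -> bool:
    """Return True if the ORES /v3/scores/ listing serves `model` for `context`.

    Assumes the listing looks like {context: {"models": {name: {...}}}}.
    """
    return model in listing.get(context, {}).get("models", {})


# Illustrative payload only; fetch the real one from https://ores.wikimedia.org/v3/scores/
sample = json.loads("""
{
  "enwiki": {"models": {"damaging": {}}},
  "frwikisource": {"models": {"pagelevel": {}}}
}
""")

print(has_model(sample, "translatewiki", "reverted"))       # False
print(has_model(sample, "frwikisource", "articlequality"))  # False
print(has_model(sample, "frwikisource", "pagelevel"))       # True
```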

frwikisource-articlequality

The host for frwikisource is fr.wikisource.org, but it is not set in the values.yaml in the deployment-charts repo. Why does it connect to incubator.wikimedia.org? I guess mwapi has some kind of redirect mechanism that sends requests for unknown Wikipedias to the Incubator (see the FAQ for the Incubator).
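A minimal sketch of deriving the Host from the wiki database name, assuming the usual `<lang><family>` naming. `host_for_wiki` and its lookup tables are hypothetical and only cover the wikis mentioned in this task (language codes with hyphens, for example, are not handled). Raising on unknown names, instead of guessing, makes a missing host fail loudly rather than hitting a wiki that doesn't exist.

```python
# Hypothetical mapping from DB-name suffix to domain; not exhaustive.
FAMILY_DOMAINS = {
    "wikisource": "wikisource.org",
    "wiki": "wikipedia.org",
}

# Wikis whose host doesn't follow the <lang><family> pattern.
SPECIAL_HOSTS = {
    "wikidatawiki": "www.wikidata.org",
    "translatewiki": "translatewiki.net",
}


def host_for_wiki(dbname: str) -> str:
    """Map a wiki DB name like 'frwikisource' to its host, e.g. 'fr.wikisource.org'."""
    if dbname in SPECIAL_HOSTS:
        return SPECIAL_HOSTS[dbname]
    # Try longer suffixes first so 'wikisource' wins over 'wiki'.
    for suffix in sorted(FAMILY_DOMAINS, key=len, reverse=True):
        if dbname.endswith(suffix) and len(dbname) > len(suffix):
            lang = dbname[: -len(suffix)]
            return f"{lang}.{FAMILY_DOMAINS[suffix]}"
    raise ValueError(f"Don't know the host for {dbname!r}")


print(host_for_wiki("frwikisource"))  # fr.wikisource.org
```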

Moreover, according to https://ores.wikimedia.org/v3/scores/frwikisource, frwikisource doesn't have an articlequality model, only a pagelevel model. We should consider adding it to the inference-services repo and values.yaml (the same situation applies to wikidatawiki, which has two specially named models: itemquality and itemtopic).
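The wikidatawiki case can be expressed as a small per-wiki override table. A sketch with hypothetical names (`MODEL_NAME_OVERRIDES`, `resolve_model_name`), not the actual inference-services or deployment-charts structure:

```python
# Hypothetical per-wiki overrides: wikidatawiki uses item* model names
# instead of the generic article* ones.
MODEL_NAME_OVERRIDES = {
    ("wikidatawiki", "articlequality"): "itemquality",
    ("wikidatawiki", "articletopic"): "itemtopic",
}


def resolve_model_name(wiki: str, model_type: str) -> str:
    """Return the model name a wiki actually uses for a generic model type."""
    return MODEL_NAME_OVERRIDES.get((wiki, model_type), model_type)


print(resolve_model_name("wikidatawiki", "articlequality"))  # itemquality
print(resolve_model_name("enwiki", "articlequality"))        # articlequality
```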

Just a suggestion for future deployments: we should always test that the model works after deployment, e.g. make a simple request to the model server and check that the prediction matches what we get from ORES, at least until we build some automated testing workflows. :)
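A minimal sketch of that post-deployment comparison, assuming both services return a revscoring-style score object of the form `{"prediction": ..., "probability": {label: p}}`. `scores_match` and its `tol` default are hypothetical choices, and the two payloads are placeholders; in practice one would come from ORES and one from the Lift Wing isvc for the same revision.

```python
import math


def scores_match(ores_score: dict, liftwing_score: dict, tol: float = 1e-6) -> bool:
    """Compare two revscoring-style scores: same prediction, probabilities within tol."""
    if ores_score["prediction"] != liftwing_score["prediction"]:
        return False
    ores_probs = ores_score["probability"]
    lw_probs = liftwing_score["probability"]
    if ores_probs.keys() != lw_probs.keys():
        return False
    return all(
        math.isclose(ores_probs[label], lw_probs[label], abs_tol=tol)
        for label in ores_probs
    )


# Placeholder scores for the same revision.
ores = {"prediction": False, "probability": {"true": 0.12, "false": 0.88}}
liftwing = {"prediction": False, "probability": {"true": 0.12, "false": 0.88}}
print(scores_match(ores, liftwing))  # True
```

A small absolute tolerance allows for harmless float differences between the two runtimes while still flagging a different model or broken features.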

Great summary, thanks for working on this!


I see https://github.com/wikimedia/mediawiki-services-ores-deploy/commit/ad160b0405bd35d10aa525f3ff78fd0c23e2b10b, which confirms what you wrote above, so let's remove translatewiki from Lift Wing :) If we need it in the future we can work on it, but for now it seems out of scope.


Over the past month I don't see any trace of frwikisource in ORES traffic (see https://w.wiki/66vV; it is sampled traffic, but still). I would personally drop the model from Lift Wing and re-add it if the community wants or needs it.

+1 to always testing the model after deployment.

Change 867601 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: remove translatewiki and frwikisource isvcs

https://gerrit.wikimedia.org/r/867601

@calbon We discussed this during the team meeting, and we'd like to remove the above models from Lift Wing. One is not supported by ORES either, and the other doesn't receive any traffic (judging from the ORES git commits, it seems it was more of a test than something stable). Let us know if we have the green light to proceed (we can always add a model back if we realize it is needed).

Yeah let's remove them for now. My guess is that these were the start of models that never made it to production.

Change 867601 merged by Elukey:

[operations/deployment-charts@master] ml-services: remove translatewiki and frwikisource isvcs

https://gerrit.wikimedia.org/r/867601

@achou there is one problem with the model rename to "itemquality":

root@deploy1002:/srv/deployment-charts/helmfile.d/ml-services/revscoring-articlequality# kubectl logs wikidatawiki-itemquality-predictor-default-mjjp2-deploymen27wmg -n revscoring-articlequality storage-initializer 
[I 221219 08:37:46 storage-initializer-entrypoint:13] Initializing, args: src_uri [s3://wmf-ml-models/itemquality/wikidatawiki/20220509074653/] dest_path[ [/mnt/models]
[I 221219 08:37:46 storage:54] Copying contents of s3://wmf-ml-models/itemquality/wikidatawiki/20220509074653/ to local
[I 221219 08:37:46 credentials:1111] Found credentials in environment variables.
Traceback (most recent call last):
  File "/usr/bin/storage-initializer-entrypoint", line 14, in <module>
    kserve.Storage.download(src_uri, dest_path)
  File "/usr/local/lib/python3.7/dist-packages/kserve/storage.py", line 74, in download
    Storage._download_s3(uri, out_dir)
  File "/usr/local/lib/python3.7/dist-packages/kserve/storage.py", line 150, in _download_s3
    "Failed to fetch model. No model found in %s." % bucket_path)
RuntimeError: Failed to fetch model. No model found in itemquality/wikidatawiki/20220509074653/.
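The traceback appears to come from kserve's S3 download, which lists the objects under the storage URI's prefix and raises when it finds none; here the model still lived under the articlequality/ prefix. A sketch of the same check run against a plain list of keys, e.g. before rolling out a renamed isvc. `split_s3_uri`, `has_objects_under`, and the `keys` listing are hypothetical; the key is taken from the source path of the move below.

```python
from typing import List, Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {uri}")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix


def has_objects_under(uri: str, keys: List[str]) -> bool:
    """Return True if any listed key falls under the URI's prefix."""
    _, prefix = split_s3_uri(uri)
    return any(key.startswith(prefix) for key in keys)


# Keys as they were before the move (model stored under articlequality/).
keys = ["articlequality/wikidatawiki/20220509074653/model.bin"]
print(has_objects_under("s3://wmf-ml-models/itemquality/wikidatawiki/20220509074653/", keys))  # False
```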

So I did this:

elukey@stat1004:~$ sudo s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg mv s3://wmf-ml-models/articlequality/wikidatawiki/20220509074653/model.bin s3://wmf-ml-models/itemquality/wikidatawiki/20220509074653/model.bin
move: 's3://wmf-ml-models/articlequality/wikidatawiki/20220509074653/model.bin' -> 's3://wmf-ml-models/itemquality/wikidatawiki/20220509074653/model.bin'

And the storage initializer worked. Lemme know if it is ok or if we should do something different! :)

elukey@stat1004:~$ sudo s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg mv s3://wmf-ml-models/articletopic/wikidatawiki/20220720074925/model.bin s3://wmf-ml-models/itemtopic/wikidatawiki/20220720074925/model.bin
move: 's3://wmf-ml-models/articletopic/wikidatawiki/20220720074925/model.bin' -> 's3://wmf-ml-models/itemtopic/wikidatawiki/20220720074925/model.bin'
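Both moves follow the same pattern: swap the model-name segment of the key and keep the wiki and version. A sketch that computes the destination URI for such an `s3cmd mv`, with `renamed_model_uri` as a hypothetical helper that assumes the `s3://<bucket>/<model>/<wiki>/<version>/model.bin` layout seen above:

```python
def renamed_model_uri(src: str, new_model: str) -> str:
    """Swap the model segment of s3://<bucket>/<model>/<wiki>/<version>/model.bin."""
    scheme = "s3://"
    if not src.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {src}")
    parts = src[len(scheme):].split("/")
    if len(parts) != 5:
        raise ValueError(f"Unexpected layout: {src}")
    bucket, _old_model, wiki, version, filename = parts
    return f"{scheme}{bucket}/{new_model}/{wiki}/{version}/{filename}"


print(renamed_model_uri(
    "s3://wmf-ml-models/articletopic/wikidatawiki/20220720074925/model.bin",
    "itemtopic",
))
# s3://wmf-ml-models/itemtopic/wikidatawiki/20220720074925/model.bin
```

The result can be used as the destination argument of the `s3cmd mv` commands shown above.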

The change has been rolled out :)

@elukey Changing the model path is good, thanks for your help! :)