
Optimize revertrisk-wikidata inference service to achieve ~500ms latency target
Closed, Resolved (Public)

Description

In task T406179, the ML team deployed the revertrisk-wikidata inference service in LiftWing. Subsequently, in task T409388, the Enterprise team conducted load tests to simulate their traffic and shared the following results:

Run 1
• Duration: ~67.3 mins
• Total Requests: 87,595
• Success: 77,205 (88.14%)
• Failures: 10,390 (11.86%)
• Actual RPS: 21.7
• Requests/hour: 78,109
• Target achievement: 52.07% of 150K/hour
• p90 latency (first 200 successes): ~5.7s

Run 2
• Duration: ~67.1 mins
• Total Requests: 75,885
• Success: 64,292 (84.72%)
• Failures: 11,593 (15.28%)
• Actual RPS: 18.85
• Requests/hour: 67,866
• Target achievement: 45.24% of 150K/hour

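The Enterprise team's results show a p90 latency of ~5.7s for the first 200 successes in Run 1, roughly 11 times the ~500ms target. Both runs also reached only about half of the 150K requests/hour throughput target, with failure rates of roughly 12% and 15%.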
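As a quick cross-check of the numbers above: the 150K requests/hour target works out to about 41.7 requests per second, and the "Target achievement" figures are the measured requests/hour divided by that target. A small sketch of the arithmetic (the function names are illustrative only):

```python
# Sketch: reproduce the throughput figures reported for the two load-test runs.
TARGET_PER_HOUR = 150_000  # Enterprise throughput target quoted in the results


def target_rps(target_per_hour: int = TARGET_PER_HOUR) -> float:
    """Convert an hourly request target into requests per second."""
    return target_per_hour / 3600


def target_achievement(requests_per_hour: float,
                       target_per_hour: int = TARGET_PER_HOUR) -> float:
    """Return the achieved share of the hourly target, as a percentage."""
    return 100 * requests_per_hour / target_per_hour


if __name__ == "__main__":
    print(f"target RPS: {target_rps():.2f}")
    for run, per_hour in (("Run 1", 78_109), ("Run 2", 67_866)):
        print(f"{run}: {target_achievement(per_hour):.2f}% of target")
```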

Below are the options we are going to explore when optimizing the revertrisk-wikidata isvc to meet the latency target:

  • Enable multi-worker processing in KServe
  • Parallelize asynchronous calls to Wikidata API
  • Improve error handling and retry logic for Wikidata API requests
  • Cache Wikidata API responses to reduce redundant calls
  • Enable GPU inference if CPU inference is a bottleneck
  • Adjust deployment configurations to improve resource allocation and k8s autoscaling

Details

Related Changes in Gerrit:
Repo                                         Branch  Lines +/-
operations/deployment-charts                 master  +2 -2
operations/deployment-charts                 master  +1 -1
operations/deployment-charts                 master  +2 -2
operations/deployment-charts                 master  +2 -2
operations/deployment-charts                 master  +3 -4
operations/deployment-charts                 master  +1 -1
machinelearning/liftwing/inference-services  main    +0 -1
operations/deployment-charts                 master  +7 -4
machinelearning/liftwing/inference-services  main    +37 -4
machinelearning/liftwing/inference-services  main    +54 -33
machinelearning/liftwing/inference-services  main    +2 -2
operations/deployment-charts                 master  +1 -1
machinelearning/liftwing/inference-services  main    +1 -1
operations/deployment-charts                 master  +1 -1
machinelearning/liftwing/inference-services  main    +52 -10
operations/deployment-charts                 master  +1 -1
machinelearning/liftwing/inference-services  main    +123 -87
machinelearning/liftwing/inference-services  main    +23 -5
machinelearning/liftwing/inference-services  main    +7 -4
operations/deployment-charts                 master  +7 -6
operations/deployment-charts                 master  +23 -2
operations/deployment-charts                 master  +5 -5
operations/deployment-charts                 master  +1 -2
operations/deployment-charts                 master  +9 -7
operations/deployment-charts                 master  +2 -2
operations/deployment-charts                 master  +7 -5
machinelearning/liftwing/inference-services  main    +27 -4

Event Timeline

Change #1228426 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: rr-wikidata horizontal scaling

https://gerrit.wikimedia.org/r/1228426

Change #1228527 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: rr-wikidata update replicas

https://gerrit.wikimedia.org/r/1228527

Change #1228527 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: rr-wikidata update replicas

https://gerrit.wikimedia.org/r/1228527

Change #1228548 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: rr-wikidata reduce memory usage

https://gerrit.wikimedia.org/r/1228548

Change #1228548 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: rr-wikidata reduce memory usage

https://gerrit.wikimedia.org/r/1228548

Change #1228997 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: bump rr-wikidata limitranges and resourcequota

https://gerrit.wikimedia.org/r/1228997

Change #1228997 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: bump rr-wikidata limitranges and resourcequota

https://gerrit.wikimedia.org/r/1228997

Change #1229071 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: rr-wikidata horizontal scaling

https://gerrit.wikimedia.org/r/1229071

Change #1229071 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: rr-wikidata horizontal scaling

https://gerrit.wikimedia.org/r/1229071

After increasing the k8s resource budget, we successfully horizontally scaled the rr-wikidata model-server as calculated in T414060#11515839 and T414060#11528208:

kevinbazira@deploy2002:~$ kube_env revertrisk ml-serve-eqiad
kevinbazira@deploy2002:~$ kubectl describe resourcequota -n revertrisk
Name:            quota-compute-resources
Namespace:       revertrisk
Resource         Used      Hard
--------         ----      ----
limits.cpu       140       140
limits.memory    189068Mi  200Gi
requests.cpu     105800m   140
requests.memory  172940Mi  200Gi

$ kube_env revertrisk ml-serve-eqiad 
$ kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
revertrisk-wikidata-predictor-00005-deployment-597c58cbb7-84wmq   3/3     Running   0          112s
revertrisk-wikidata-predictor-00005-deployment-597c58cbb7-hcvl6   3/3     Running   0          113s
revertrisk-wikidata-predictor-00005-deployment-597c58cbb7-mftph   3/3     Running   0          112s
revertrisk-wikidata-predictor-00005-deployment-597c58cbb7-tc2lt   3/3     Running   0          112s
revertrisk-wikidata-predictor-00005-deployment-597c58cbb7-wpgmq   3/3     Running   0          113s
revertrisk-wikidata-predictor-00005-deployment-597c58cbb7-wv7dp   3/3     Running   0          114s
revertrisk-wikidata-predictor-00005-deployment-597c58cbb7-zfpdh   3/3     Running   0          113s

$ kube_env revertrisk ml-serve-codfw 
$ kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
revertrisk-wikidata-predictor-00005-deployment-7b5d9dbd49-fb9lj   3/3     Running   0          5m22s
revertrisk-wikidata-predictor-00005-deployment-7b5d9dbd49-jbtz7   3/3     Running   0          5m22s
revertrisk-wikidata-predictor-00005-deployment-7b5d9dbd49-l6f7w   3/3     Running   0          5m22s
revertrisk-wikidata-predictor-00005-deployment-7b5d9dbd49-lnklf   3/3     Running   0          5m22s
revertrisk-wikidata-predictor-00005-deployment-7b5d9dbd49-tzhkl   3/3     Running   0          5m22s
revertrisk-wikidata-predictor-00005-deployment-7b5d9dbd49-vp4m4   3/3     Running   0          5m22s
revertrisk-wikidata-predictor-00005-deployment-7b5d9dbd49-ww6xv   3/3     Running   0          5m22s

Following the vertical scaling implemented in T414060#11515839 and the horizontal scaling implemented in T414060#11536858, the rr-wikidata inference service has been able to serve ~52.5 RPS with a median latency of ~510ms, as detailed in the locust load test results in P86969#353245. This should be able to reliably handle the target of ~42 RPS with a latency of ~500ms.

Change #1236103 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: Cap maxReplicas at 7

https://gerrit.wikimedia.org/r/1236103

In T409388#11560918, the WME team reported encountering the error: {"error":"TypeError : unhashable type: 'dict'"} during their load tests on the rr-wikidata isvc.

I looked in Logstash and found the error here: https://logstash.wikimedia.org/goto/94338806f1c0bf1316ac9d18cd3ecbb0. It occurs in the process_sentence() function.

I tried to reproduce the error using the rev_id values found in the logs, but all requests succeeded without issues:

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1082751557,"output":{"prediction":true,"probabilities":{"true":0.7224097398999795,"false":0.2775902601000205}}}

HTTP/1.1 200 OK
date: Tue, 03 Feb 2026 10:20:24 GMT
server: uvicorn
content-length: 182
content-type: application/json

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1346051047,"output":{"prediction":true,"probabilities":{"true":0.7800348182415658,"false":0.21996518175843416}}}

HTTP/1.1 200 OK
date: Tue, 03 Feb 2026 10:20:25 GMT
server: uvicorn
content-length: 183
content-type: application/json

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1036870343,"output":{"prediction":false,"probabilities":{"true":0.14990464958001956,"false":0.8500953504199804}}}

HTTP/1.1 200 OK
date: Tue, 03 Feb 2026 10:20:25 GMT
server: uvicorn
content-length: 184
content-type: application/json

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1224452472,"output":{"prediction":false,"probabilities":{"true":0.044590421060711594,"false":0.9554095789392885}}}

HTTP/1.1 200 OK
date: Tue, 03 Feb 2026 10:20:26 GMT
server: uvicorn
content-length: 183
content-type: application/json

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":2448950487,"output":{"prediction":false,"probabilities":{"true":0.02987688965611249,"false":0.9701231103438875}}}

HTTP/1.1 200 OK
date: Tue, 03 Feb 2026 10:20:26 GMT
server: uvicorn
content-length: 182
content-type: application/json

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":2446093759,"output":{"prediction":false,"probabilities":{"true":0.0350715967740667,"false":0.9649284032259333}}}

HTTP/1.1 200 OK
date: Tue, 03 Feb 2026 10:20:27 GMT
server: uvicorn
content-length: 182
content-type: application/json

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":988287590,"output":{"prediction":false,"probabilities":{"true":0.34038634859359024,"false":0.6596136514064097}}}

To address this issue, I will push a fix that ensures the process_sentence() function gracefully handles unexpected dictionaries returned by the Wikidata API.
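For illustration only (this is not the code from the patch): "unhashable type: 'dict'" is what Python raises when a dict ends up where a hashable value is expected, for example as a set member or a dictionary key. A minimal sketch of a defensive approach, assuming the affected values are normally strings and only occasionally arrive as nested dicts; the helper names to_hashable_text() and unique_tokens() are hypothetical and do not come from the inference-services repository:

```python
import json
from typing import Any, Iterable, Set


def to_hashable_text(value: Any) -> str:
    """Return a string stand-in for `value` that is safe to hash."""
    if isinstance(value, str):
        return value
    # Dicts and lists are unhashable, so serialise them deterministically.
    return json.dumps(value, sort_keys=True, default=str)


def unique_tokens(values: Iterable[Any]) -> Set[str]:
    """Deduplicate values without raising on nested dicts."""
    return {to_hashable_text(v) for v in values}


if __name__ == "__main__":
    mixed = ["instance of", {"id": "Q5"}, {"id": "Q5"}, "instance of"]
    # set(mixed) would raise: TypeError: unhashable type: 'dict'
    print(sorted(unique_tokens(mixed)))
```

Whether an unexpected dict should be serialised, as here, or simply skipped depends on what the downstream features expect; the sketch only shows how to avoid the exception.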

Change #1236255 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: handle unexpected dicts

https://gerrit.wikimedia.org/r/1236255

Change #1236280 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: add retry logic for Wikidata requests

https://gerrit.wikimedia.org/r/1236280

Change #1236255 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: handle unexpected dicts

https://gerrit.wikimedia.org/r/1236255

Change #1236280 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: add retry logic for Wikidata requests

https://gerrit.wikimedia.org/r/1236280

Change #1236618 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: parallelize async calls that fetch metadata features

https://gerrit.wikimedia.org/r/1236618

Change #1236618 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: parallelize async calls that fetch metadata features

https://gerrit.wikimedia.org/r/1236618

Change #1236713 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: add localized cache for Wikidata API responses

https://gerrit.wikimedia.org/r/1236713

Change #1236713 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: add localized cache for Wikidata API responses

https://gerrit.wikimedia.org/r/1236713

Change #1236103 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: Cap maxReplicas at 7

https://gerrit.wikimedia.org/r/1236103

Change #1236723 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image

https://gerrit.wikimedia.org/r/1236723

Change #1236723 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image

https://gerrit.wikimedia.org/r/1236723

In the logs referenced in T414060#11578047, we identified edge cases that occur under heavy load. When the rr-wikidata isvc experiences high traffic, it sends a proportional number of requests to the Wikidata API to retrieve revisions for scoring.

During the WME load tests, the LiftWing logs revealed transient network errors and Wikidata API overload failures under such conditions. These issues caused some requests to fail prematurely, impacting the service's stability during high traffic scenarios.

To address these challenges, we implemented several optimizations to reduce the load on the Wikidata API and improve service resilience:

  • Gracefully handled unexpected dictionary structures returned by the Wikidata API.
  • Introduced retry logic for Wikidata API requests to mitigate transient failures.
  • Parallelized asynchronous calls to fetch metadata features from Wikidata, improving efficiency.
  • Capped maxReplicas at 7 to prevent autoscaling from triggering the KubernetesDeploymentUnavailableReplicas alert.
  • Implemented a localized disk-based cache for Wikidata API responses, shared across workers within the same pod.

As a result of these improvements, the rr-wikidata inference service's internal endpoint shows it can handle ~61.2 RPS with a median latency of ~200ms, as detailed in the locust load test results in P86969#356872.
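The disk-based cache from the list above can be sketched with the standard library. This is an assumption-laden illustration rather than the merged implementation: it uses an SQLite file as the store (the patch may use something else), and the DiskCache class, its TTL, and the key format are all hypothetical. The point is that a file on the pod's local disk can be read and written by several worker processes.

```python
import json
import os
import sqlite3
import tempfile
import time
from contextlib import closing
from typing import Any, List, Optional, Tuple


class DiskCache:
    """Tiny TTL cache backed by an SQLite file (hypothetical sketch)."""

    def __init__(self, path: str, ttl_seconds: float = 300.0) -> None:
        self.path = path
        self.ttl = ttl_seconds
        self._execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL, expires REAL NOT NULL)"
        )

    def _execute(self, sql: str, params: Tuple = ()) -> List[tuple]:
        # A short-lived connection per call keeps the class process-safe.
        with closing(sqlite3.connect(self.path, timeout=5.0)) as conn:
            with conn:  # commit on success, roll back on error
                return conn.execute(sql, params).fetchall()

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""
        rows = self._execute(
            "SELECT value, expires FROM cache WHERE key = ?", (key,)
        )
        if not rows or rows[0][1] < time.time():
            return None
        return json.loads(rows[0][0])

    def set(self, key: str, value: Any) -> None:
        """Store a JSON-serialisable value under key."""
        self._execute(
            "INSERT OR REPLACE INTO cache (key, value, expires) VALUES (?, ?, ?)",
            (key, json.dumps(value), time.time() + self.ttl),
        )


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wikidata_cache.sqlite")
    cache = DiskCache(path)
    cache.set("entity:Q42", {"label": "example"})
    # A second instance (e.g. another worker) sees the same entry.
    print(DiskCache(path).get("entity:Q42"))
```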

NOTE: In T409388#11560986, we noticed that WME uses the external endpoint and recommended using the internal endpoint as it's much faster. WME let us know that they use the external endpoint by design (T409388#11561038). All load tests run by the ML team prior to this used the internal endpoint, so moving forward, we have to start using the external endpoint to get numbers that reflect what WME experiences in production.

The request below was returning an error:

$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1644343545}' -i -H "Content-type: application/json"

{"error":"RequestError : no-such-entity: Could not find an entity with the ID \"P00000140635\". -- See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."}

and the model-server logs of extract_entity_ids() showed:

INFO:root:Extracting entity IDs from diffs: ['{"root[\'claims\'][\'P705\']": [{\'mainsnak\': {\'snaktype\': \'value\', \'property\': \'P705\', \'hash\': \'6c518f960fb3d0d27baacb7f04a8c3e6e09d911f\', \'datavalue\': {\'value\': \'ENSDARP00000140635\', \'type\': \'string\'}}, \'type\': \'statement\', \'id\': \'Q29836414$359BB86F-C2CC-4E15-AB6E-36CA8DA367C8\', \'rank\': \'normal\', \'references\': [{\'hash\': \'a9121a6a56b9afbad0fa50428f6765752d824209\', \'snaks\': {\'P248\': [{\'snaktype\': \'value\', \'property\': \'P248\', \'hash\': \'bc7274793ac69d38434ef3646ef5efe9277f54bb\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 111699881, \'id\': \'Q111699881\'}, \'type\': \'wikibase-entityid\'}}], \'P705\': [{\'snaktype\': \'value\', \'property\': \'P705\', \'hash\': \'6c518f960fb3d0d27baacb7f04a8c3e6e09d911f\', \'datavalue\': {\'value\': \'ENSDARP00000140635\', \'type\': \'string\'}}]}, \'snaks-order\': [\'P248\', \'P705\']}]}]}', '{}', '{}']

INFO:root:Extracted entity IDs: ['P248', 'P705', 'P00000140635', 'Q29836414', 'Q111699881']

P00000140635 was being wrongly extracted from ENSDARP00000140635, leading to invalid entity IDs being sent to the Wikidata API. The function has been updated to extract only valid standalone Q or P IDs, such as P705, Q29836414, and Q111699881, while ignoring invalid substrings like P00000140635.

Now the same request succeeds as shown below:

$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1644343545}' -i -H "Content-type: application/json"

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1644343545,"output":{"prediction":false,"probabilities":{"true":0.10099131460507527,"false":0.8990086853949247}}}

and the model-server logs of extract_entity_ids() show:

INFO:root:Extracting entity IDs from diffs: ['{"root[\'claims\'][\'P705\']": [{\'mainsnak\': {\'snaktype\': \'value\', \'property\': \'P705\', \'hash\': \'6c518f960fb3d0d27baacb7f04a8c3e6e09d911f\', \'datavalue\': {\'value\': \'ENSDARP00000140635\', \'type\': \'string\'}}, \'type\': \'statement\', \'id\': \'Q29836414$359BB86F-C2CC-4E15-AB6E-36CA8DA367C8\', \'rank\': \'normal\', \'references\': [{\'hash\': \'a9121a6a56b9afbad0fa50428f6765752d824209\', \'snaks\': {\'P248\': [{\'snaktype\': \'value\', \'property\': \'P248\', \'hash\': \'bc7274793ac69d38434ef3646ef5efe9277f54bb\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 111699881, \'id\': \'Q111699881\'}, \'type\': \'wikibase-entityid\'}}], \'P705\': [{\'snaktype\': \'value\', \'property\': \'P705\', \'hash\': \'6c518f960fb3d0d27baacb7f04a8c3e6e09d911f\', \'datavalue\': {\'value\': \'ENSDARP00000140635\', \'type\': \'string\'}}]}, \'snaks-order\': [\'P248\', \'P705\']}]}]}', '{}', '{}']

INFO:root:Extracted entity IDs: ['P248', 'P705', 'Q29836414', 'Q111699881']
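One way to express "standalone" IDs is a regular expression with word boundaries on both sides. The sketch below is an assumption about the approach, not the merged code; extract_standalone_ids() and the pattern are hypothetical.

```python
import re
from typing import List

# Assumed pattern: Q or P followed by digits, with a word boundary on each
# side so that IDs embedded in longer tokens are skipped.
STANDALONE_ID_RE = re.compile(r"\b[QP]\d+\b")


def extract_standalone_ids(diff_text: str) -> List[str]:
    """Return the sorted, de-duplicated standalone Q/P IDs in diff_text."""
    return sorted(set(STANDALONE_ID_RE.findall(diff_text)))


if __name__ == "__main__":
    sample = (
        "'property': 'P705', 'value': 'ENSDARP00000140635', "
        "'id': 'Q29836414$359BB86F', 'P248', 'id': 'Q111699881'"
    )
    print(extract_standalone_ids(sample))
```

The "P" inside ENSDARP00000140635 is preceded by a letter, so there is no word boundary before it and it is not matched. Note that a pattern like this still accepts IDs with a leading zero, such as P01, when they appear as separate words.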

Change #1237891 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: fix greedy entity ID extraction

https://gerrit.wikimedia.org/r/1237891

Change #1237891 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: fix greedy entity ID extraction

https://gerrit.wikimedia.org/r/1237891

Change #1238309 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image to fix greedy entity ID extraction

https://gerrit.wikimedia.org/r/1238309

Change #1238309 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image to fix greedy entity ID extraction

https://gerrit.wikimedia.org/r/1238309

The rr-wikidata isvc has been throwing errors similar to what we experienced in T414060#11596249. Below are three examples where it wrongly extracts entity IDs from revision diffs and then fails to find them via the Wikidata API, resulting in errors:

request one errors

# response errors
$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 2035190839}' -i -H "Content-type: application/json"

{"error":"RequestError : no-such-entity: Could not find an entity with the ID \"P01\". -- See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."}

# entity ID extraction logs
INFO:root:Extracting entity IDs from diffs: ['{"root[\'descriptions\'][\'en\']": {\'language\': \'en\', \'value\': \'protein found in Plasmodium vivax P01, encoded by pdhB\'}}', '{}', '{}']

INFO:root:Extracted entity IDs: ['P01']

request two errors

# response errors
$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1264248356}' -i -H "Content-type: application/json"

{"error":"RequestError : no-such-entity: Could not find an entity with the ID \"P01\". -- See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."}

# entity ID extraction logs
INFO:root:Extracting entity IDs from diffs: ['{"root[\'labels\'][\'nl\']": {\'language\': \'nl\', \'value\': \'P01.078 Glioma surgery in the elderly, a retrospective population based registry study\'}, "root[\'descriptions\']": \'wetenschappelijk artikel\'}', '{}', '{}']

INFO:root:Extracted entity IDs: ['P01']

request three errors

# response errors
$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 914071828}' -i -H "Content-type: application/json"

{"error":"RequestError : no-such-entity: Could not find an entity with the ID \"Q08285\". -- See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations an

# entity ID extraction logs
INFO:root:Extracting entity IDs from diffs: ['{}', '{"root[\'claims\'][\'P680\']": [{\'mainsnak\': {\'snaktype\': \'value\', \'property\': \'
P680\', \'hash\': \'64ff0e381c674f4b7d8920c50df452aa6f4db827\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 14330639, \'id\': \'Q14330639\'
}, \'type\': \'wikibase-entityid\'}}, \'type\': \'statement\', \'qualifiers\': {\'P459\': [{\'snaktype\': \'value\', \'property\': \'P459\', \'hash\': \'2d4d282f8f2d1
7ef75f0ec508e0e431e8ebb43a2\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 23175558, \'id\': \'Q23175558\'}, \'type\': \'wikibase-entityid\
'}}], \'P352\': [{\'snaktype\': \'value\', \'property\': \'P352\', \'hash\': \'3792b533001e8e2d11711abd1952d24d4a89391f\', \'datavalue\': {\'value\': \'Q08285\', \'ty
pe\': \'string\'}}]}, \'qualifiers-order\': [\'P459\', \'P352\'], \'id\': \'Q60002571$88738E1A-1D20-4AAF-81BF-A5DA24BFDD37\', \'rank\': \'normal\', \'references\': [{
\'hash\': \'00d7c146b3ea85b54e5338539bb2fd379fe9a064\', \'snaks\': {\'P854\': [{\'snaktype\': \'value\', \'property\': \'P854\', \'hash\': \'109a8289fdd5e3de685b19fce
0f17b00013c1203\', \'datavalue\': {\'value\': \'https://github.com/geneontology/go-site/blob/master/metadata/gorefs/goref-0000024.md\', \'type\': \'string\'}}], \'P16
40\': [{\'snaktype\': \'value\', \'property\': \'P1640\', \'hash\': \'661b7a4fcdc7f9d9af6683962f29affc688055c1\', \'datavalue\': {\'value\': {\'entity-type\': \'item\
', \'numeric-id\': 5531047, \'id\': \'Q5531047\'}, \'type\': \'wikibase-entityid\'}}], \'P813\': [{\'snaktype\': \'value\', \'property\': \'P813\', \'hash\': \'e23712
a6463c76e5c35d5fa730abb894d20f8b40\', \'datavalue\': {\'value\': {\'time\': \'+2019-03-01T00:00:00Z\', \'timezone\': 0, \'before\': 0, \'after\': 0, \'precision\': 11
, \'calendarmodel\': \'http://www.wikidata.org/entity/Q1985727\'}, \'type\': \'time\'}}]}, \'snaks-order\': [\'P854\', \'P1640\', \'P813\']}]}], "root[\'claims\'][\'P
682\']": [{\'mainsnak\': {\'snaktype\': \'value\', \'property\': \'P682\', \'hash\': \'f9cfce23510b4c24641c89ae62b12931ced85fcd\', \'datavalue\': {\'value\': {\'entit
y-type\': \'item\', \'numeric-id\': 14877433, \'id\': \'Q14877433\'}, \'type\': \'wikibase-entityid\'}}, \'type\': \'statement\', \'qualifiers\': {\'P459\': [{\'snakt
ype\': \'value\', \'property\': \'P459\', \'hash\': \'2d4d282f8f2d17ef75f0ec508e0e431e8ebb43a2\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\
': 23175558, \'id\': \'Q23175558\'}, \'type\': \'wikibase-entityid\'}}], \'P352\': [{\'snaktype\': \'value\', \'property\': \'P352\', \'hash\': \'3792b533001e8e2d1171
1abd1952d24d4a89391f\', \'datavalue\': {\'value\': \'Q08285\', \'type\': \'string\'}}]}, \'qualifiers-order\': [\'P459\', \'P352\'], \'id\': \'Q60002571$F68CC353-CFC3
-4E4D-82E8-E148151129C4\', \'rank\': \'normal\', \'references\': [{\'hash\': \'00d7c146b3ea85b54e5338539bb2fd379fe9a064\', \'snaks\': {\'P854\': [{\'snaktype\': \'val
ue\', \'property\': \'P854\', \'hash\': \'109a8289fdd5e3de685b19fce0f17b00013c1203\', \'datavalue\': {\'value\': \'https://github.com/geneontology/go-site/blob/master
/metadata/gorefs/goref-0000024.md\', \'type\': \'string\'}}], \'P1640\': [{\'snaktype\': \'value\', \'property\': \'P1640\', \'hash\': \'661b7a4fcdc7f9d9af6683962f29a
ffc688055c1\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 5531047, \'id\': \'Q5531047\'}, \'type\': \'wikibase-entityid\'}}], \'P813\': [{
\'snaktype\': \'value\', \'property\': \'P813\', \'hash\': \'e23712a6463c76e5c35d5fa730abb894d20f8b40\', \'datavalue\': {\'value\': {\'time\': \'+2019-03-01T00:00:00Z
\', \'timezone\': 0, \'before\': 0, \'after\': 0, \'precision\': 11, \'calendarmodel\': \'http://www.wikidata.org/entity/Q1985727\'}, \'type\': \'time\'}}]}, \'snaks-
order\': [\'P854\', \'P1640\', \'P813\']}]}], "root[\'claims\'][\'P681\']": [{\'mainsnak\': {\'snaktype\': \'value\', \'property\': \'P681\', \'hash\': \'6aa8fca576e2
517624d7a09a1e2123b0e354523f\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 21112783, \'id\': \'Q21112783\'}, \'type\': \'wikibase-entityid
\'}}, \'type\': \'statement\', \'qualifiers\': {\'P459\': [{\'snaktype\': \'value\', \'property\': \'P459\', \'hash\': \'a2f4f86ecbc767171e95e65d4342f4e79833aaa4\', \
'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 23190637, \'id\': \'Q23190637\'}, \'type\': \'wikibase-entityid\'}}], \'P3382\': [{\'snaktype\':
 \'value\', \'property\': \'P3382\', \'hash\': \'121461aa09543dc8e1635d5c84d09c313a385424\', \'datavalue\': {\'value\': \'PF3D7_1307000\', \'type\': \'string\'}}]}, \
'qualifiers-order\': [\'P459\', \'P3382\'], \'id\': \'Q60002571$5B93DB19-C849-4694-9472-D5EB6DDC1A4E\', \'rank\': \'normal\', \'references\': [{\'hash\': \'cdcab5eb65
fde69991f60db3b2fb3cfaaa09fde7\', \'snaks\': {\'P248\': [{\'snaktype\': \'value\', \'property\': \'P248\', \'hash\': \'7ea3ab52ae938fe645e346c2c8ab3bf615c66856\', \'d
atavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 59729704, \'id\': \'Q59729704\'}, \'type\': \'wikibase-entityid\'}}], \'P1640\': [{\'snaktype\': \
'value\', \'property\': \'P1640\', \'hash\': \'661b7a4fcdc7f9d9af6683962f29affc688055c1\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 5531
047, \'id\': \'Q5531047\'}, \'type\': \'wikibase-entityid\'}}], \'P813\': [{\'snaktype\': \'value\', \'property\': \'P813\', \'hash\': \'444d2918a419ea9950bc501f3ffdf
e4908015d0e\', \'datavalue\': {\'value\': {\'time\': \'+2019-03-08T00:00:00Z\', \'timezone\': 0, \'before\': 0, \'after\': 0, \'precision\': 11, \'calendarmodel\': \'
http://www.wikidata.org/entity/Q1985727\'}, \'type\': \'time\'}}]}, \'snaks-order\': [\'P248\', \'P1640\', \'P813\']}]}]}', '{"root[\'labels\'][\'en\'][\'value\']": {
\'new_value\': \'PKNH_1407300.1\', \'old_value\': \'exosome complex component RRP40, putative\'}, "root[\'aliases\'][\'en\'][0][\'value\']": {\'new_value\': \'exosome
 complex component RRP40, putative\', \'old_value\': \'PK13_0620c\'}}']

INFO:root:Extracted entity IDs: ['Q08285', 'Q23175558', 'Q14330639', 'Q60002571', 'P813', 'Q14877433', 'P854', 'P3382', 'P681', 'Q1985727', 'Q59729704', 'P680', 'Q21112783', 'P459', 'P352', 'Q5531047', 'P248', 'Q23190637', 'P682', 'P1640']

After fixing the extract_entity_ids() function to extract Q or P followed by a non-zero digit then any digits, the requests to the model-server return valid predictions instead of errors, and the logs show that no invalid IDs are extracted:
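The rule described above (Q or P, then a non-zero digit, then any digits) can be sketched as a regular expression. This is an illustration under assumptions, not the merged code: the word boundaries and the helper name extract_valid_ids() are hypothetical.

```python
import re
from typing import List

# Assumed pattern: Q or P, one non-zero digit, then any further digits,
# matched only as a whole word.
VALID_ID_RE = re.compile(r"\b[QP][1-9]\d*\b")


def extract_valid_ids(diff_text: str) -> List[str]:
    """Return the sorted, de-duplicated Q/P IDs without leading zeros."""
    return sorted(set(VALID_ID_RE.findall(diff_text)))


if __name__ == "__main__":
    samples = [
        "protein found in Plasmodium vivax P01, encoded by pdhB",
        "P01.078 Glioma surgery in the elderly",
        "'P352': 'Q08285', 'id': 'Q14330639', 'PF3D7_1307000', 'PK13_0620c'",
    ]
    for text in samples:
        print(extract_valid_ids(text))
```

Strings such as P01 and Q08285 are rejected because the first digit is zero. The three previously failing requests, re-run against the fixed model-server, follow.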

request one succeeds

# response succeeds
$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 2035190839}' -i -H "Content-type: application/json"

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":2035190839,"output":{"prediction":false,"probabilities":{"true":0.04359585513078435,"false":0.9564041448692157}}}

# entity ID extraction logs
INFO:root:Extracting entity IDs from diffs: ['{"root[\'descriptions\'][\'en\']": {\'language\': \'en\', \'value\': \'protein found in Plasmodium vivax P01, encoded by pdhB\'}}', '{}', '{}']

INFO:root:Extracted entity IDs: []

request two succeeds

# response succeeds
$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1264248356}' -i -H "Content-type: application/json"

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1264248356,"output":{"prediction":false,"probabilities":{"true":0.08971338632580879,"false":0.9102866136741912}}}

# entity ID extraction logs
INFO:root:Extracting entity IDs from diffs: ['{"root[\'labels\'][\'nl\']": {\'language\': \'nl\', \'value\': \'P01.078 Glioma surgery in the elderly, a retrospective population based registry study\'}, "root[\'descriptions\']": \'wetenschappelijk artikel\'}', '{}', '{}']

INFO:root:Extracted entity IDs: []

request three succeeds

# response succeeds
$ curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 914071828}' -i -H "Content-type: application/json"

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":914071828,"output":{"prediction":true,"probabilities":{"true":0.8709923490648386,"false":0.12900765093516142}}}

# entity ID extraction logs
INFO:root:Extracting entity IDs from diffs: ['{}', '{"root[\'claims\'][\'P680\']": [{\'mainsnak\': {\'snaktype\': \'value\', \'property\': \'P680\', \'hash\': \'64ff0e381c674f4b7d8920c50df452aa6f4db827\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 14330639, \'id\': \'Q14330639\'}, \'type\': \'wikibase-entityid\'}}, \'type\': \'statement\', \'qualifiers\': {\'P459\': [{\'snaktype\': \'value\', \'property\': \'P459\', \'hash\': \'2d4d282f8f2d17ef75f0ec508e0e431e8ebb43a2\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 23175558, \'id\': \'Q23175558\'}, \'type\': \'wikibase-entityid\'}}], \'P352\': [{\'snaktype\': \'value\', \'property\': \'P352\', \'hash\': \'3792b533001e8e2d11711abd1952d24d4a89391f\', \'datavalue\': {\'value\': \'Q08285\', \'type\': \'string\'}}]}, \'qualifiers-order\': [\'P459\', \'P352\'], \'id\': \'Q60002571$88738E1A-1D20-4AAF-81BF-A5DA24BFDD37\', \'rank\': \'normal\', \'references\': [{\'hash\': \'00d7c146b3ea85b54e5338539bb2fd379fe9a064\', \'snaks\': {\'P854\': [{\'snaktype\': \'value\', \'property\': \'P854\', \'hash\': \'109a8289fdd5e3de685b19fce0f17b00013c1203\', \'datavalue\': {\'value\': \'https://github.com/geneontology/go-site/blob/master/metadata/gorefs/goref-0000024.md\', \'type\': \'string\'}}], \'P1640\': [{\'snaktype\': \'value\', \'property\': \'P1640\', \'hash\': \'661b7a4fcdc7f9d9af6683962f29affc688055c1\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 5531047, \'id\': \'Q5531047\'}, \'type\': \'wikibase-entityid\'}}], \'P813\': [{\'snaktype\': \'value\', \'property\': \'P813\', \'hash\': \'e23712a6463c76e5c35d5fa730abb894d20f8b40\', \'datavalue\': {\'value\': {\'time\': \'+2019-03-01T00:00:00Z\', \'timezone\': 0, \'before\': 0, \'after\': 0, \'precision\': 11, \'calendarmodel\': \'http://www.wikidata.org/entity/Q1985727\'}, \'type\': \'time\'}}]}, \'snaks-order\': [\'P854\', \'P1640\', \'P813\']}]}], "root[\'claims\'][\'P682\']": [{\'mainsnak\': 
{\'snaktype\': \'value\', \'property\': \'P682\', \'hash\': \'f9cfce23510b4c24641c89ae62b12931ced85fcd\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 14877433, \'id\': \'Q14877433\'}, \'type\': \'wikibase-entityid\'}}, \'type\': \'statement\', \'qualifiers\': {\'P459\': [{\'snaktype\': \'value\', \'property\': \'P459\', \'hash\': \'2d4d282f8f2d17ef75f0ec508e0e431e8ebb43a2\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 23175558, \'id\': \'Q23175558\'}, \'type\': \'wikibase-entityid\'}}], \'P352\': [{\'snaktype\': \'value\', \'property\': \'P352\', \'hash\': \'3792b533001e8e2d11711abd1952d24d4a89391f\', \'datavalue\': {\'value\': \'Q08285\', \'type\': \'string\'}}]}, \'qualifiers-order\': [\'P459\', \'P352\'], \'id\': \'Q60002571$F68CC353-CFC3-4E4D-82E8-E148151129C4\', \'rank\': \'normal\', \'references\': [{\'hash\': \'00d7c146b3ea85b54e5338539bb2fd379fe9a064\', \'snaks\': {\'P854\': [{\'snaktype\': \'value\', \'property\': \'P854\', \'hash\': \'109a8289fdd5e3de685b19fce0f17b00013c1203\', \'datavalue\': {\'value\': \'https://github.com/geneontology/go-site/blob/master/metadata/gorefs/goref-0000024.md\', \'type\': \'string\'}}], \'P1640\': [{\'snaktype\': \'value\', \'property\': \'P1640\', \'hash\': \'661b7a4fcdc7f9d9af6683962f29affc688055c1\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 5531047, \'id\': \'Q5531047\'}, \'type\': \'wikibase-entityid\'}}], \'P813\': [{\'snaktype\': \'value\', \'property\': \'P813\', \'hash\': \'e23712a6463c76e5c35d5fa730abb894d20f8b40\', \'datavalue\': {\'value\': {\'time\': \'+2019-03-01T00:00:00Z\', \'timezone\': 0, \'before\': 0, \'after\': 0, \'precision\': 11, \'calendarmodel\': \'http://www.wikidata.org/entity/Q1985727\'}, \'type\': \'time\'}}]}, \'snaks-order\': [\'P854\', \'P1640\', \'P813\']}]}], "root[\'claims\'][\'P681\']": [{\'mainsnak\': {\'snaktype\': \'value\', \'property\': \'P681\', \'hash\': \'6aa8fca576e2517624d7a09a1e2123b0e354523f\', 
\'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 21112783, \'id\': \'Q21112783\'}, \'type\': \'wikibase-entityid\'}}, \'type\': \'statement\', \'qualifiers\': {\'P459\': [{\'snaktype\': \'value\', \'property\': \'P459\', \'hash\': \'a2f4f86ecbc767171e95e65d4342f4e79833aaa4\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 23190637, \'id\': \'Q23190637\'}, \'type\': \'wikibase-entityid\'}}], \'P3382\': [{\'snaktype\': \'value\', \'property\': \'P3382\', \'hash\': \'121461aa09543dc8e1635d5c84d09c313a385424\', \'datavalue\': {\'value\': \'PF3D7_1307000\', \'type\': \'string\'}}]}, \'qualifiers-order\': [\'P459\', \'P3382\'], \'id\': \'Q60002571$5B93DB19-C849-4694-9472-D5EB6DDC1A4E\', \'rank\': \'normal\', \'references\': [{\'hash\': \'cdcab5eb65fde69991f60db3b2fb3cfaaa09fde7\', \'snaks\': {\'P248\': [{\'snaktype\': \'value\', \'property\': \'P248\', \'hash\': \'7ea3ab52ae938fe645e346c2c8ab3bf615c66856\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 59729704, \'id\': \'Q59729704\'}, \'type\': \'wikibase-entityid\'}}], \'P1640\': [{\'snaktype\': \'value\', \'property\': \'P1640\', \'hash\': \'661b7a4fcdc7f9d9af6683962f29affc688055c1\', \'datavalue\': {\'value\': {\'entity-type\': \'item\', \'numeric-id\': 5531047, \'id\': \'Q5531047\'}, \'type\': \'wikibase-entityid\'}}], \'P813\': [{\'snaktype\': \'value\', \'property\': \'P813\', \'hash\': \'444d2918a419ea9950bc501f3ffdfe4908015d0e\', \'datavalue\': {\'value\': {\'time\': \'+2019-03-08T00:00:00Z\', \'timezone\': 0, \'before\': 0, \'after\': 0, \'precision\': 11, \'calendarmodel\': \'http://www.wikidata.org/entity/Q1985727\'}, \'type\': \'time\'}}]}, \'snaks-order\': [\'P248\', \'P1640\', \'P813\']}]}]}', '{"root[\'labels\'][\'en\'][\'value\']": {\'new_value\': \'PKNH_1407300.1\', \'old_value\': \'exosome complex component RRP40, putative\'}, "root[\'aliases\'][\'en\'][0][\'value\']": {\'new_value\': \'exosome complex component RRP40, putative\', 
\'old_value\': \'PK13_0620c\'}}']

INFO:root:Extracted entity IDs: ['Q14330639', 'P3382', 'Q59729704', 'P681', 'P459', 'Q21112783', 'Q1985727', 'Q23190637', 'P248', 'Q14877433', 'Q60002571', 'P1640', 'P682', 'P680', 'Q23175558', 'P813', 'Q5531047', 'P854', 'P352']

Change #1238609 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: fix broad entity ID extraction

https://gerrit.wikimedia.org/r/1238609

Change #1238857 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: parallelize async calls that fetch labels

https://gerrit.wikimedia.org/r/1238857

Change #1238609 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: fix broad entity ID extraction

https://gerrit.wikimedia.org/r/1238609

Change #1238857 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: parallelize async calls that fetch labels

https://gerrit.wikimedia.org/r/1238857

Change #1239279 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: enable GPU inference for mBERT model

https://gerrit.wikimedia.org/r/1239279

Change #1239279 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: enable GPU inference for mBERT model

https://gerrit.wikimedia.org/r/1239279

Change #1239323 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image

https://gerrit.wikimedia.org/r/1239323

Change #1239323 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image

https://gerrit.wikimedia.org/r/1239323

Change #1239561 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: remove CPU-only PyTorch dependency in favor of GPU-compatible version

https://gerrit.wikimedia.org/r/1239561

Change #1239561 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-wikidata: remove CPU-only PyTorch dependency in favor of GPU-compatible version

https://gerrit.wikimedia.org/r/1239561

Change #1239630 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image

https://gerrit.wikimedia.org/r/1239630

Change #1239630 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update rr-wikidata prod image

https://gerrit.wikimedia.org/r/1239630

Change #1239685 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: bump up rr-wikidata workers

https://gerrit.wikimedia.org/r/1239685

Change #1239685 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: bump up rr-wikidata workers

https://gerrit.wikimedia.org/r/1239685

Change #1239721 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: scale down rr-wikidata pod memory

https://gerrit.wikimedia.org/r/1239721

Change #1239721 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: scale down rr-wikidata pod memory

https://gerrit.wikimedia.org/r/1239721

Change #1239738 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: reduce rr-wikidata memory to comply with LimitRange

https://gerrit.wikimedia.org/r/1239738

Change #1239738 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: reduce rr-wikidata memory to comply with LimitRange

https://gerrit.wikimedia.org/r/1239738

Change #1239880 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: reduce rr-wikidata workers to fix resource contention

https://gerrit.wikimedia.org/r/1239880

Change #1239880 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: reduce rr-wikidata workers to fix resource contention

https://gerrit.wikimedia.org/r/1239880

In T414060#11584141, the rr-wikidata isvc was deployed with 7 min/max replicas, with each pod having: 8 workers, 8 CPUs, 16Gi memory, and localized cache. WME's load tests (T409388#11586034) reported ~0.5s median latency with ~5.99% failures.

We requested WME's in-house Go load test tool so that we could reproduce the failed requests locally using our 100K requests/hour token. We identified three main issues, detailed below:

Error 1:

{"error":"RequestError : no-such-entity: Could not find an entity with the ID \"Q08285\". -- See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."}

This was thrown because the regex used for extracting entity IDs from revision diffs was too greedy and occasionally captured strings that are not entity IDs (here "Q08285", a string value in the diff). It was fixed by making the regex more specific, as shown in T414060#11596249 and T414060#11605878.
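For illustration, a minimal sketch of the kind of tightening described above. Both patterns are hypothetical and are not taken from the merged patch. It relies on Wikidata item and property IDs being a letter followed by a positive integer without leading zeros, so a string value such as "Q08285" can be rejected.

```python
import re

# Hypothetical patterns for illustration; the pattern in the merged patch may differ.
LOOSE_ID = re.compile(r"[QP]\d+")            # also accepts leading zeros, e.g. "Q08285"
STRICT_ID = re.compile(r"\b[QP][1-9]\d*\b")  # rejects IDs whose number starts with 0


def extract_entity_ids(diff_text: str, pattern: re.Pattern = STRICT_ID) -> set[str]:
    """Return the distinct item/property IDs that `pattern` finds in a diff string."""
    return set(pattern.findall(diff_text))


# Shortened sample modelled on the diff logged above: 'Q08285' is a string
# value of P352, not an entity ID.
sample = (
    "{'property': 'P352', 'datavalue': {'value': 'Q08285', 'type': 'string'}}, "
    "{'property': 'P682', 'datavalue': {'value': {'id': 'Q14877433'}}}"
)

print(sorted(extract_entity_ids(sample, LOOSE_ID)))
print(sorted(extract_entity_ids(sample, STRICT_ID)))
```

With the loose pattern, "Q08285" ends up in the list of IDs sent to the Wikidata API, which is what produces the no-such-entity error. The strict pattern drops it and keeps the real IDs.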

Error 2: EOF
This was thrown when a rev_id had a large diff (e.g. rev_id:1417906975), which made the text classification model's computation take too long on CPU. It was fixed by loading the model on GPU, as shown below:

# original performance using CPU only
$ time curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1417906975}' -i -H "Content-type: application/json"

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1417906975,"output":{"prediction":false,"probabilities":{"true":0.10105650099442724,"false":0.8989434990055728}}}
real	0m6.647s
user	0m0.015s
sys	0m0.010s

# performance after adding asyncio.gather to fetch_labels_from_api in utils.py
$ time curl -s localhost:8383/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1417906975}' -i -H "Content-type: application/json"

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1417906975,"output":{"prediction":false,"probabilities":{"true":0.10105650099442724,"false":0.8989434990055728}}}
real	0m5.758s
user	0m0.023s
sys	0m0.007s

# performance after using GPU since the mBERT score calculation was the bottleneck 
$ time curl -s localhost:8080/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1417906975}' -i -H "Content-type: application/json"

{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1417906975,"output":{"prediction":false,"probabilities":{"true":0.10105650099442724,"false":0.8989434990055728}}}
real	0m1.871s
user	0m0.006s
sys	0m0.007s
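The asyncio.gather step in the second measurement can be sketched as follows. This is a simplified illustration, not the code in utils.py: `fetch_label` is a hypothetical stand-in that sleeps instead of calling the Wikidata API, and both wrapper functions are made-up names.

```python
import asyncio
import time


async def fetch_label(entity_id: str) -> tuple[str, str]:
    """Hypothetical stand-in for one Wikidata API call (sleeps instead of doing I/O)."""
    await asyncio.sleep(0.1)
    return entity_id, f"label-of-{entity_id}"


async def fetch_labels_sequential(entity_ids: list[str]) -> dict[str, str]:
    # Each call finishes before the next one starts: total time ~ n * latency.
    return dict([await fetch_label(e) for e in entity_ids])


async def fetch_labels_concurrent(entity_ids: list[str]) -> dict[str, str]:
    # gather runs all calls concurrently: total time ~ latency of the slowest call.
    return dict(await asyncio.gather(*(fetch_label(e) for e in entity_ids)))


def timed(coro) -> tuple[dict[str, str], float]:
    """Run a coroutine to completion and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


ids = ["Q14877433", "P682", "P459", "Q23175558", "P813"]
seq_labels, seq_s = timed(fetch_labels_sequential(ids))
con_labels, con_s = timed(fetch_labels_concurrent(ids))
print(f"sequential: {seq_s:.2f}s, concurrent: {con_s:.2f}s")
```

Concurrency only helps the I/O-bound part of a request. That matches the measurements above, where gathering the label requests saved under a second (6.647s to 5.758s) and moving the mBERT computation to GPU accounted for most of the improvement (down to 1.871s).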

Error 3: 502, 500, 503:

These errors were thrown when the model-server was under load. They are not bugs and can be handled by adding retries on the client side (i.e. in the WME tool). Re-issuing requests for the previously failed rev_ids now succeeds with good performance, as shown below:

# rev_id:1234035494 previously threw 502 error: `read tcp 127.0.0.1:56430->127.0.0.1:8080: read: connection reset by peer` and now succeeds:
$ time curl "https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-wikidata:predict" -X POST -d '{"rev_id": 1234035494}' -H "Content-Type: application/json" --http1.1
{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1234035494,"output":{"prediction":false,"probabilities":{"true":0.22111562272320381,"false":0.7788843772767962}}}
real	0m0.378s
user	0m0.016s
sys	0m0.005s

# rev_id:1221583645 previously threw 500 error: `Post "https://inference.svc.eqiad.wmnet:30443/v1/models/revertrisk-wikidata:predict": context deadline exceeded` and now succeeds:
$ time curl "https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-wikidata:predict" -X POST -d '{"rev_id": 1221583645}' -H "Content-Type: application/json" --http1.1
{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1221583645,"output":{"prediction":false,"probabilities":{"true":0.4331480822672278,"false":0.5668519177327722}}}
real	0m0.496s
user	0m0.013s
sys	0m0.007s

# rev_id:1377521403 previously threw 503 error: `{"httpCode":503,"httpReason":"upstream connect error or disconnect/reset before headers. reset reason: connection timeout"` and now succeeds:
$ time curl "https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-wikidata:predict" -X POST -d '{"rev_id": 1377521403}' -H "Content-Type: application/json" --http1.1
{"model_name":"revertrisk-wikidata","model_version":"2","revision_id":1377521403,"output":{"prediction":false,"probabilities":{"true":0.3910356577904828,"false":0.6089643422095172}}}
real	0m0.435s
user	0m0.013s
sys	0m0.007s
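A minimal sketch of client-side retries for these transient errors, written in Python for illustration (the WME tool itself is in Go). The function name, parameters and backoff values are assumptions, and `send` is a hypothetical callable that wraps one HTTP request and returns a status code and body. A real client would also retry connection-level errors such as EOF or connection resets.

```python
import random
import time
from typing import Callable

RETRYABLE_STATUSES = {500, 502, 503}  # the transient codes seen in the load tests


def call_with_retries(
    send: Callable[[], tuple[int, str]],
    max_attempts: int = 4,          # assumed value, tune for the real client
    base_delay_s: float = 0.2,      # assumed value, tune for the real client
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break
        # Exponential backoff with jitter so retries do not arrive in lockstep.
        delay = base_delay_s * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body


# Simulated server: two transient failures, then a success.
responses = iter([
    (502, "connection reset by peer"),
    (503, "upstream connect error"),
    (200, "ok"),
])
delays: list[float] = []  # records the backoff delays instead of really sleeping
status, body = call_with_retries(lambda: next(responses), sleep=delays.append)
print(status, body, len(delays))
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule easy to inspect.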

After implementing the above fixes, the rr-wikidata isvc is now deployed with 1 min/max replica (2 workers, 8 CPUs, 16Gi memory, and localized cache in the pod). Since we have limited GPUs, we can only have 1 replica and at most 2 workers (we tried 4 workers, but they ran into resource contention issues).

The next step is to ask WME to run the load tests with their token, since ours allows far fewer requests/hour.

Change #1240234 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: scale rr-wikidata to two replicas

https://gerrit.wikimedia.org/r/1240234

Change #1240234 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: scale rr-wikidata to two replicas

https://gerrit.wikimedia.org/r/1240234

Following the 1 GPU replica deployment in T414060#11622802, WME load tests (T409388#11626947) reported that the failure rate dropped from 5.99% to 0.16%.

We analyzed the results file shared by WME. The remaining errors were connection reset by peer, context deadline exceeded, and upstream connect error. These are not bugs and can be resolved by adding retries to the client-side tool, as detailed in the comment above.

Since the 1 GPU replica saturated at ~32 RPS (~115K requests/hour) with increased latency, we scaled to 2 replicas (i.e. 2 GPUs with 4 workers in total) to double the available capacity. This enabled the service to handle over 150K requests/hour, with a median latency of ~430ms, 66.83% of requests below ~500ms, and 93.65% below ~1s, as reported by WME in T409388#11635483.

For future reference, below is a consolidated report of the optimizations we implemented and their corresponding performance results for the rr-wikidata isvc:

| Deployment Configurations | Requests/s (Throughput) | Median Latency | p90 | Failure Rate | Target Achievement (150K/hour) | Report |
| --- | --- | --- | --- | --- | --- | --- |
| 5min/15max replicas (per pod: 2 CPUs, 4Gi memory, initial deployment) | 21.7 | - | 5.7s | 11.86% | 52.07% | T409388#11483570 |
| 7min/10max replicas (per pod: 8 workers, 8 CPUs, 16Gi memory, optimizations) | 37.68 | 0.5617s | 1.8293s | 6.8% | 90.43% | T409388#11560918 |
| 7min/7max replicas (per pod: 8 workers, 8 CPUs, 16Gi memory, localized cache, optimizations) | 30.70 | 0.50s | 2.58s | 5.99% | 73.67% | T409388#11586034 |
| 1min/1max replicas (per pod: 1 GPU, 2 workers, 8 CPUs, 16Gi memory, localized cache, optimizations) | 31.99 | 1.41s | 2.67s | 0.16% | 76.77% | T409388#11626947 |
| 2min/2max replicas (per pod: 1 GPU, 2 workers, 8 CPUs, 16Gi memory, localized cache, optimizations) | 41.94 | 0.43s | 0.81s | 0.2% | 100.66% | T409388#11635483 |
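The Target Achievement column follows directly from throughput, which the short sketch below reproduces. The function name is made up for illustration. Results can differ from the reported figures in the last decimal because the RPS values in the report are rounded.

```python
TARGET_PER_HOUR = 150_000  # the 150K requests/hour target used in the report


def target_achievement(rps: float, target_per_hour: int = TARGET_PER_HOUR) -> float:
    """Percentage of the hourly target reached at a sustained request rate."""
    return rps * 3600 / target_per_hour * 100


for label, rps in [
    ("initial CPU deployment", 21.7),
    ("1 GPU replica", 31.99),
    ("2 GPU replicas", 41.94),
]:
    per_hour = rps * 3600
    print(f"{label}: {per_hour:,.0f} requests/hour, {target_achievement(rps):.2f}% of target")

# Sustained rate needed to reach the target exactly.
print(f"required: {TARGET_PER_HOUR / 3600:.2f} requests/s")
```

The target corresponds to roughly 41.67 requests/s sustained, which is why the single GPU replica at ~32 RPS fell short and the 2-replica deployment at 41.94 RPS just clears it.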

Closing this task as WME decided to proceed with the current deployment and optimizations (T409388#11664793)