Description

Similar to T309623, but this task is for the Outlinks topic model (a non-revscoring-based model). We should test whether we can reduce latency by using non-blocking HTTP calls in KServe to fetch the outlinks of an article, and their associated Wikidata IDs, for predicting article topics. Currently the outlink_transformer calls the MW API with blocking code.

Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | None | | T272917 Lift Wing proof of concept
Resolved | | achou | T287056 Deploy Outlinks topic model to production
Resolved | | achou | T311043 Use non-blocking HTTP calls to get outlinks for Outlinks topic model
Resolved | | achou | T313493 Add support for async session to python-mwapi
Event Timeline
Change 807135 had a related patch set uploaded (by AikoChou; author: AikoChou):
[machinelearning/liftwing/inference-services@main] outlink: use tornado async http client to fetch outlinks
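For reference, the core of the change is to await the MW API call instead of blocking on it. A minimal sketch of the idea with tornado's AsyncHTTPClient, assuming a hypothetical get_outlinks() helper and illustrative API parameters (the actual transformer code is in change 807135):

```python
import json
from urllib.parse import urlencode

from tornado.httpclient import AsyncHTTPClient

MW_API = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia

async def get_outlinks(title: str) -> list:
    """Fetch one batch of outlinks and their Wikidata IDs without blocking."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "generator": "links",       # the article's outlinks
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # their associated Wikidata IDs
        "redirects": "",
    }
    client = AsyncHTTPClient()
    # awaiting yields control to the event loop while the request is in
    # flight, so other predict requests can be served concurrently
    response = await client.fetch(f"{MW_API}?{urlencode(params)}")
    pages = json.loads(response.body).get("query", {}).get("pages", {})
    return [p["pageprops"]["wikibase_item"]
            for p in pages.values() if "pageprops" in p]
```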
Some test results for the model using async HTTP calls:
aikochou@ml-sandbox:~/isvcs/outlink$ wrk -c 1 -t 1 --timeout 10s -s inference.lua http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict --latency
Running 10s test @ http://192.168.49.2:30066/v1/models/outlink-topic-model:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.55s   335.17ms   2.75s    66.67%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    2.73s
     75%    2.75s
     90%    2.75s
     99%    2.75s
  3 requests in 10.02s, 1.83KB read
Requests/sec:      0.30
Transfer/sec:    187.17B

aikochou@ml-sandbox:~/isvcs/outlink$ wrk -c 4 -t 2 --timeout 10s -s inference.lua http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict --latency
Running 10s test @ http://192.168.49.2:30066/v1/models/outlink-topic-model:predict
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.82s   641.08ms   4.17s    58.33%
    Req/Sec     0.75      1.14     3.00     83.33%
  Latency Distribution
     50%    2.85s
     75%    3.15s
     90%    3.54s
     99%    4.17s
  12 requests in 10.02s, 7.32KB read
Requests/sec:      1.20
Transfer/sec:    748.68B

aikochou@ml-sandbox:~/isvcs/outlink$ wrk --timeout 10s -s inference.lua http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict --latency
Running 10s test @ http://192.168.49.2:30066/v1/models/outlink-topic-model:predict
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.34s   222.14ms   2.98s    63.16%
    Req/Sec     5.23      4.73    20.00     48.39%
  Latency Distribution
     50%    2.30s
     75%    2.54s
     90%    2.67s
     99%    2.98s
  38 requests in 10.02s, 23.19KB read
Requests/sec:      3.79
Transfer/sec:      2.32KB
For the model using the blocking mwapi:
aikochou@ml-sandbox:~/isvcs/outlink$ wrk -c 1 -t 1 --timeout 10s -s inference.lua http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict --latency
Running 10s test @ http://192.168.49.2:30066/v1/models/outlink-topic-model:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.09s   101.48ms   2.24s    75.00%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    2.07s
     75%    2.24s
     90%    2.24s
     99%    2.24s
  4 requests in 10.02s, 2.44KB read
Requests/sec:      0.40
Transfer/sec:    249.62B

aikochou@ml-sandbox:~/isvcs/outlink$ wrk -c 4 -t 2 --timeout 10s -s inference.lua http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict --latency
Running 10s test @ http://192.168.49.2:30066/v1/models/outlink-topic-model:predict
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.50s     1.00s   10.00s    75.00%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%   10.00s
     75%   10.00s
     90%   10.00s
     99%   10.00s
  4 requests in 10.02s, 2.44KB read
Requests/sec:      0.40
Transfer/sec:    249.58B

aikochou@ml-sandbox:~/isvcs/outlink$ wrk --timeout 10s -s inference.lua http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict --latency
Running 10s test @ http://192.168.49.2:30066/v1/models/outlink-topic-model:predict
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00     -nan%
  Latency Distribution
     50%    0.00us
     75%    0.00us
     90%    0.00us
     99%    0.00us
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B
Change 807135 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] outlink: use async HTTP calls to fetch data
Change 818052 had a related patch set uploaded (by AikoChou; author: AikoChou):
[machinelearning/liftwing/inference-services@main] outlink: allow accessing MediaWiki API through internal endpoint
Change 818052 merged by Elukey:
[machinelearning/liftwing/inference-services@main] outlink: allow accessing MediaWiki API through internal endpoint
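A rough sketch of what accessing the MW API through an internal endpoint can look like; the endpoint URL and header handling here are my assumptions, not necessarily what the patch does. The idea is that requests go to the internal service, with a Host header selecting the target wiki:

```python
import aiohttp

INTERNAL_ENDPOINT = "http://api-ro.discovery.wmnet"  # assumed internal endpoint
WIKI_HOST = "en.wikipedia.org"                       # which wiki to route to

async def query_internal(params: dict) -> dict:
    # the Host header tells the internal gateway which wiki the request is for
    async with aiohttp.ClientSession(headers={"Host": WIKI_HOST}) as session:
        async with session.get(f"{INTERNAL_ENDPOINT}/w/api.php",
                               params=params) as resp:
            return await resp.json()
```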
I found out that the response time for the outlink model depends heavily on the input article.
When the queried article is long and has many wikilinks (the feature the outlink model uses to infer article topics), for instance the article Toni Morrison, it can take multiple continuing queries to fetch all the wikilinks (see API:Query#Example_4:_Continuing_queries). Currently the MW API returns 50 links per call, determined by the gpllimit parameter; I'm not sure what the maximum value we can set is. The response time for this query in prod is 2.939s.
But when the queried article is shorter or has fewer wikilinks, for instance the article Wings of Fire (novel series), the response time for the query in prod is only 0.330s.
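To make the continuation behaviour concrete, here is a hedged sketch of the batching loop, reusing the assumed query_internal() helper from the sketch above: a long article keeps returning a continue token, and each batch costs a full sequential round trip.

```python
async def get_all_outlinks(title: str) -> list:
    """Follow MW API 'continue' tokens until all outlinks are fetched."""
    links, cont = [], {}
    while True:
        result = await query_internal({
            "action": "query",
            "format": "json",
            "titles": title,
            "generator": "links",
            "gpllimit": 50,           # 50 links per call (the current setting)
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            **cont,                   # continuation tokens from the last batch
        })
        pages = result.get("query", {}).get("pages", {})
        links += [p["pageprops"]["wikibase_item"]
                  for p in pages.values() if "pageprops" in p]
        if "continue" not in result:  # no more batches left
            return links
        cont = result["continue"]     # e.g. {'gplcontinue': ..., 'continue': ...}
```

An article like Toni Morrison with hundreds of outlinks therefore needs many sequential round trips, while a short article finishes in one or two.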
If we look at the logs for the predictor pod, we see that the model takes only a few milliseconds to run inference in both cases:
[I 220801 10:32:29 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 4.51ms
[I 220801 11:15:21 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 2.85ms
If we look at the logs for the transformer pod, there is a big difference:
[I 220801 10:32:29 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 2702.58ms
[I 220801 11:15:21 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 156.97ms
Performance test results
Test article: Toni Morrison
aikochou@deploy1002:~$ wrk -c 1 -t 1 --timeout 10s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.79s    81.66ms   1.88s    60.00%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    1.77s
     75%    1.86s
     90%    1.88s
     99%    1.88s
  5 requests in 10.02s, 3.05KB read
Requests/sec:      0.50
Transfer/sec:    311.94B

aikochou@deploy1002:~$ wrk -c 3 -t 3 --timeout 10s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  3 threads and 3 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.83s   123.87ms   2.08s    66.67%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    1.82s
     75%    1.91s
     90%    1.99s
     99%    2.08s
  15 requests in 10.02s, 9.16KB read
Requests/sec:      1.50
Transfer/sec:      0.91KB

aikochou@deploy1002:~$ wrk -c 5 -t 5 --timeout 10s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  5 threads and 5 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.77s   113.23ms   1.95s    64.00%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    1.79s
     75%    1.83s
     90%    1.92s
     99%    1.95s
  25 requests in 10.02s, 15.26KB read
Requests/sec:      2.50
Transfer/sec:      1.52KB

aikochou@deploy1002:~$ wrk -c 10 -t 10 --timeout 10s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.78s   200.52ms   2.53s    89.74%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    1.75s
     75%    1.81s
     90%    1.97s
     99%    2.53s
  39 requests in 10.02s, 23.80KB read
  Socket errors: connect 2, read 0, write 0, timeout 0
Requests/sec:      3.89
Transfer/sec:      2.38KB
Test article: Wings of Fire (novel series)
aikochou@deploy1002:~$ wrk -c 1 -t 1 --timeout 2s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   112.51ms   35.21ms 244.94ms   85.71%
    Req/Sec     9.18      3.80    20.00     70.59%
  Latency Distribution
     50%   99.00ms
     75%  114.81ms
     90%  161.80ms
     99%  244.94ms
  36 requests in 10.02s, 18.89KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:      3.59
Transfer/sec:      1.89KB

aikochou@deploy1002:~$ wrk -c 3 -t 3 --timeout 2s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  3 threads and 3 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   148.69ms  143.21ms   1.05s    89.95%
    Req/Sec     9.71      3.19    20.00     84.04%
  Latency Distribution
     50%   96.93ms
     75%  114.16ms
     90%  299.94ms
     99%  755.43ms
  198 requests in 10.02s, 103.90KB read
Requests/sec:     19.76
Transfer/sec:     10.37KB

aikochou@deploy1002:~$ wrk -c 5 -t 5 --timeout 2s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  5 threads and 5 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   165.28ms  178.87ms   1.21s    90.00%
    Req/Sec     9.59      3.14    20.00     84.06%
  Latency Distribution
     50%   97.19ms
     75%  121.95ms
     90%  358.32ms
     99%  956.80ms
  336 requests in 10.02s, 176.31KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:     33.53
Transfer/sec:     17.59KB

aikochou@deploy1002:~$ wrk -c 10 -t 10 --timeout 2s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict --latency
Running 10s test @ https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   168.71ms  192.80ms   1.50s    89.47%
    Req/Sec     9.48      3.08    20.00     84.41%
  Latency Distribution
     50%   99.48ms
     75%  121.08ms
     90%  380.11ms
     99%    1.04s
  554 requests in 10.02s, 290.74KB read
  Socket errors: connect 2, read 0, write 0, timeout 1
Requests/sec:     55.29
Transfer/sec:     29.02KB
Overall, we see a clear performance improvement from using the async preprocess(). :)
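For context, a hedged sketch of how an async preprocess() fits into a KServe transformer; the class, field, and payload names here are illustrative, not the service's actual code:

```python
import kserve

class OutlinkTransformer(kserve.Model):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host

    async def preprocess(self, inputs: dict) -> dict:
        # awaiting the MW API fetch keeps the event loop free to accept
        # other requests instead of stalling on network I/O
        outlinks = await get_all_outlinks(inputs["article"])
        return {"features": outlinks}
```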
> Currently the MW API returns 50 links per call, determined by the gpllimit parameter; I'm not sure what the maximum value we can set is.
The "gpllimit" has a max value of 500, so I changed it to 500 to improve the MW API call performance.
https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/837642
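The change itself is essentially one line, sketched here against the params dict from the sketches above:

```python
# raise the batch size from 50 to the API maximum for regular clients
# (clients with the apihighlimits right can typically go up to 5000)
params["gpllimit"] = 500
```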
Tested in staging. For the article Toni Morrison, the response time for the MW API call was reduced from 2702.58ms to 490.53ms.
[I 221010 09:25:29 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 490.53ms
That's super nice.
But I also observed some warnings in the logs, presumably because the MW API is now reached over plain HTTP via the internal endpoint:
[W 221010 09:25:29 async_session:98] - main -- {'warnings': 'HTTP used when HTTPS was expected.\nSubscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application.'}