Page MenuHomePhabricator
Paste P57453

article-descriptions: evaluate `preprocess()` and `predict()` runtime on LiftWing
ActivePublic

Authored by kevinbazira on Feb 21 2024, 8:15 AM.
isvc request in debug mode to return latency field
```
$ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2, "debug": 1}' -H "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1
{"lang":"en","title":"Clandonald","blp":false,"num_beams":2,"groundtruth":"Hamlet in Alberta, Canada","latency":{"wikidata-info (s)":0.04478883743286133,"total network (s)":0.4124155044555664,"model (s)":2.252171277999878,"total (s)":2.664602756500244},"features":{"descriptions":{"fr":"hameau d'Alberta","en":"hamlet in central Alberta, Canada"},"first-paragraphs":{"en":"Clandonald is a hamlet in central Alberta, Canada within the County of Vermilion River. It is located approximately 28 kilometres (17 mi) north of Highway 16 and 58 kilometres (36 mi) northwest of Lloydminster.","fr":"Clandonald est un hameau (hamlet) du Comté de Vermilion River, situé dans la province canadienne d'Alberta."}},"prediction":["Hamlet in Alberta, Canada","human settlement in Alberta, Canada"]}
real 0m2.699s
user 0m0.005s
sys 0m0.008s
```
formatted JSON response for easier reading
```
{
"lang": "en",
"title": "Clandonald",
"blp": false,
"num_beams": 2,
"groundtruth": "Hamlet in Alberta, Canada",
"latency": {
"wikidata-info (s)": 0.04478883743286133,
"total network (s)": 0.4124155044555664,
"model (s)": 2.252171277999878,
"total (s)": 2.664602756500244
},
"features": {
"descriptions": {
"fr": "hameau d'Alberta",
"en": "hamlet in central Alberta, Canada"
},
"first-paragraphs": {
"en": "Clandonald is a hamlet in central Alberta, Canada within the County of Vermilion River. It is located approximately 28 kilometres (17 mi) north of Highway 16 and 58 kilometres (36 mi) northwest of Lloydminster.",
"fr": "Clandonald est un hameau (hamlet) du Comté de Vermilion River, situé dans la province canadienne d'Alberta."
}
},
"prediction": [
"Hamlet in Alberta, Canada",
"human settlement in Alberta, Canada"
]
}
```

Event Timeline

The preprocess() method calculates its runtime using execution_times["total network (s)"] as shown here. Based on the formatted JSON response shown above, the preprocess runtime is about 0.4s. These results match @Isaac's comment in T343123#9527462.

According to T343123#9520331, the bottleneck is the preprocess step. However, the formatted JSON response shown above and the code profiling we did in T353127#9399942 indicate that the bottleneck is the predict step.

After having a chat with @isarantopoulos on IRC, I ran a comparison between the preprocess() and predict() methods. I used the sample inputs in P54507 to determine if there was a discrepancy between LiftWing and a LocalServer running on the ml-sandbox.

Based on the results shown in the table below, the preprocess step on LiftWing takes significantly longer compared to the preprocess step on LocalServer, even when LocalServer is running on fewer compute resources. Although preprocess on LiftWing is slower than preprocess on LocalServer, predict takes much longer time than preprocess on both LiftWing and LocalServer. This makes predict the main bottleneck.

Request PayloadLiftWingLocalServer
preprocess()predict()preprocess()predict()
{"lang": "en", "title": "Clandonald", "num_beams": 3, "debug": 1}0.3936965473.0898108480.1163804534.053587675
{"lang": "fr", "title": "Tour_Eiffel", "num_beams": 3, "debug": 1}5.1649596694.9414339070.3589725496.980854988
{"lang": "it", "title": "Ravenna", "num_beams": 3, "debug": 1}3.5882630353.4942364690.3003511434.699475765
{"lang": "es", "title": "Madrid", "num_beams": 3, "debug": 1}5.5143513685.4628019330.4220983987.945872545
{"lang": "de", "title": "Kölner_Dom", "num_beams": 3, "debug": 1}3.923954013.9725530150.3154754645.495382547
{"lang": "nl", "title": "Delfts_blauw", "num_beams": 3, "debug": 1}1.8955898283.9034583570.189494615.860885143
{"lang": "ja", "title": "伏見稲荷大社", "num_beams": 3, "debug": 1}2.3730537895.1527650360.2676248556.812711
{"lang": "zh", "title": "大熊猫", "num_beams": 3, "debug": 1}4.9397377976.266813040.3436181558.47557354
{"lang": "ko", "title": "K-pop", "num_beams": 3, "debug": 1}4.3054671295.3839468960.2717428217.498938084
{"lang": "vi", "title": "Áo_dài", "num_beams": 3, "debug": 1}2.4022872452.4146857260.2610232833.141326427
{"lang": "ru", "title": "Московский_Кремль", "num_beams": 3, "debug": 1}3.1807923324.2391452790.3166809085.497658491
{"lang": "cs", "title": "Martina_Navrátilová", "num_beams": 3, "debug": 1}4.5238447195.6207675930.34972627.412033796
{"lang": "fi", "title": "Muumilaakson_tarinoita", "num_beams": 3, "debug": 1}1.8902409087.1866555210.5013048659.384730816
{"lang": "lt", "title": "Vilnius", "num_beams": 3, "debug": 1}4.7739903933.5939981940.3524594314.712741375
{"lang": "lv", "title": "Kristaps_Porziņģis", "num_beams": 3, "debug": 1}2.9024071693.3475329880.2820465564.313842058
{"lang": "et", "title": "Naisekandmine", "num_beams": 3, "debug": 1}2.090822224.6512622830.2379198076.104430437
{"lang": "tr", "title": "Lokum", "num_beams": 3, "debug": 1}2.8313338762.3767921920.2551121712.86655283
{"lang": "ro", "title": "Țara_Românească", "num_beams": 3, "debug": 1}2.0772013662.8403482440.2491736414.163508892
{"lang": "kk", "title": "Қазақ_тілі", "num_beams": 3, "debug": 1}3.7416296012.8084831240.3704879283.881841183
{"lang": "gu", "title": "પટોળા", "num_beams": 3, "debug": 1}0.5885002614.917824030.1350481516.921203136
{"lang": "hi", "title": "बुद्ध_(उपाधि)", "num_beams": 3, "debug": 1}3.8189287194.6055824760.3145885475.683538198
{"lang": "si", "title": "නල්ලූර්_කන්දස්වාමි_කෝවල", "num_beams": 3, "debug": 1}0.6498417856.3216617110.2327439797.659542799
{"lang": "my", "title": "သနပ်ခါး", "num_beams": 3, "debug": 1}1.5512280465.6419496540.1891548636.954239845
{"lang": "ne", "title": "दसैँ", "num_beams": 3, "debug": 1}1.861238482.8448948860.178865914.162917376
Average2.9576400124.3783084750.2838372795.861807873

To further understand the latency on LiftWing, I looked at backends and p0.99 in this grafana dashboard: https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?from=now-3h&orgId=1&to=now&var-backend=rest-gateway.discovery.wmnet&var-cluster=codfw%20prometheus%2Fk8s-mlstaging&var-namespace=experimental&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99&var-response_code=All

1.article-descriptions-predictor-default-00025-private has the highest latency spike at 13.7s

LiftWing article-descriptions article-descriptions-predictor-default-00025-private latency (Screenshot from 2024-02-21 17-29-13).png (741×1 px, 155 KB)

2.InboundPassthroughClusterIpv4 has the second highest latency spike at 4.8s

LiftWing article-descriptions InboundPassthroughClusterIpv4 latency (Screenshot from 2024-02-21 17-26-00).png (741×1 px, 124 KB)

3.following closely behind is rest-gateway.discovery.wmnet with a spike of 4.65s

LiftWing article-descriptions rest-gateway.discovery.wmnet latency (Screenshot from 2024-02-21 17-25-12).png (741×1 px, 131 KB)

4.lastly, api-ro.discovery.wmnet has the lowest spike at 246ms

LiftWing article-descriptions api-ro.discovery.wmnet latency (Screenshot from 2024-02-21 17-25-36).png (741×1 px, 120 KB)

If we skip article-descriptions-predictor-default-00025-private because when utilizing the GPU it stops being the bottleneck as shown here, then we'll have to look at optimizing InboundPassthroughClusterIpv4 and rest-gateway.discovery.wmnet.

Oh wow good sleuthing. To help me understand, does this sound right: the challenge with preprocess is that it for every language (up to 25) that the article exists in, an API call has to be made to that language edition's page summary REST endpoint to get the first paragraph (enwiki endpoint). And as we can see, this normally is still under half a second because it's a pretty quick API and we do the calls async. But presumably something on LiftWing is preventing the up-to-25 API calls from being made async/simultaneously?

@Isaac, you're spot on! The main difference to note is that when we're running the model server on LiftWing, it accesses the REST endpoint via the Rest Gateway using http://rest-gateway.discovery.wmnet:4111/{lang}.wikipedia.org/v1/page/summary/{title}. However, when we're running the model server locally, it accesses the REST endpoint via the Wikimedia REST API using https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}.

@kevinbazira thanks for explaining -- I was unaware of the Rest Gateway etc. stuff so assumed LiftWing was using the same entrypoints to the APIs. I saw the other ticket where you're working through possibilities. I'll monitor but sounds like you all have it and thanks for digging into this!