
Test ML model-servers with Benthos
Closed, Resolved · Public

Description

In T319214 we are evaluating whether Benthos could be a good candidate as a stream processor for Lift Wing. As a side effect, we are finding code bugs related to corner cases that we didn't test or take into consideration before now. In this task we are going to collect the bugs found and the tests done, together with their fixes.

Details

Related Changes in Gerrit:
Repo                                         Branch  Lines +/-
operations/deployment-charts                 master  +7 -7
machinelearning/liftwing/inference-services  main    +290 -622
operations/deployment-charts                 master  +3 -3
machinelearning/liftwing/inference-services  main    +21 -18
operations/deployment-charts                 master  +3 -3
machinelearning/liftwing/inference-services  main    +29 -13
operations/deployment-charts                 master  +7 -7
machinelearning/liftwing/inference-services  main    +111 -8
machinelearning/liftwing/inference-services  main    +107 -102
operations/deployment-charts                 master  +7 -7
machinelearning/liftwing/inference-services  main    +32 -5
machinelearning/liftwing/inference-services  main    +28 -5
machinelearning/liftwing/inference-services  main    +31 -5
operations/deployment-charts                 master  +30 -3
machinelearning/liftwing/inference-services  main    +4 -5
operations/deployment-charts                 master  +18 -8
machinelearning/liftwing/inference-services  main    +29 -6
machinelearning/liftwing/inference-services  main    +40 -26
machinelearning/liftwing/inference-services  main    +41 -26
machinelearning/liftwing/inference-services  main    +40 -31
operations/deployment-charts                 master  +1 -1
machinelearning/liftwing/inference-services  main    +45 -28
operations/deployment-charts                 master  +1 -1
machinelearning/liftwing/inference-services  main    +5 -30
operations/deployment-charts                 master  +7 -7
operations/deployment-charts                 master  +12 -0
machinelearning/liftwing/inference-services  main    +1 -1
machinelearning/liftwing/inference-services  main    +11 -0
operations/deployment-charts                 master  +6 -6
machinelearning/liftwing/inference-services  main    +5 -16
machinelearning/liftwing/inference-services  main    +5 -15
machinelearning/liftwing/inference-services  main    +31 -17
machinelearning/liftwing/inference-services  main    +20 -1
operations/deployment-charts                 master  +9 -0
machinelearning/liftwing/inference-services  main    +3 -0


Event Timeline


Change 842494 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: set higher circuit breaking limits for EventGate on ml-serve

https://gerrit.wikimedia.org/r/842494

Change 842695 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] articlequality: use the fetch_features shared function

https://gerrit.wikimedia.org/r/842695

Change 842695 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] articlequality: use the fetch_features shared function

https://gerrit.wikimedia.org/r/842695

Change 842697 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update revscoring Docker images

https://gerrit.wikimedia.org/r/842697

Change 842697 merged by Elukey:

[operations/deployment-charts@master] ml-services: update revscoring Docker images

https://gerrit.wikimedia.org/r/842697

Change 842494 merged by Elukey:

[operations/deployment-charts@master] admin_ng: set higher circuit breaking limits for EventGate on ml-serve

https://gerrit.wikimedia.org/r/842494

Thanks to several fixes, the overall codebase now seems to work better while serving Benthos requests. Before the patches I observed a lot of HTTP 500 responses due to bugs and corner cases that we didn't handle in the code.

Now the focus should shift to performance testing, to see how many requests the production clusters can sustain while sending events to EventGate.

Change 843869 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] editquality: improve aiohttp and asyncio performances

https://gerrit.wikimedia.org/r/843869

Change 843869 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] editquality: improve aiohttp and asyncio performances

https://gerrit.wikimedia.org/r/843869

Change 843951 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update revscoring-editquality-goodfaith's docker image

https://gerrit.wikimedia.org/r/843951

Change 843951 merged by Elukey:

[operations/deployment-charts@master] ml-services: update revscoring-editquality-goodfaith's docker image

https://gerrit.wikimedia.org/r/843951

Change 844916 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] articlequality: improve aiohttp client session handling

https://gerrit.wikimedia.org/r/844916

Change 844917 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] draftquality: improve aiohttp client session handling

https://gerrit.wikimedia.org/r/844917

Change 844919 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] topic: improve aiohttp client session handling

https://gerrit.wikimedia.org/r/844919

A new series of patches avoids creating an aiohttp HTTP session for every request landing on the model server. The refactor allows us to use aiohttp's native client connection pool (creating the session when the model server is created).
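The pattern is roughly the following: a minimal sketch with hypothetical class and attribute names (the actual refactor lives in the patches above), where one shared session is kept for the lifetime of the model server instead of being created per request.

```python
import asyncio
import aiohttp


class ModelServer:
    """Sketch: one shared aiohttp session per server, not per request."""

    def __init__(self):
        # Created lazily on the running event loop, since aiohttp
        # sessions must be created from inside a coroutine.
        self._session = None

    async def get_session(self) -> aiohttp.ClientSession:
        if self._session is None:
            # TCPConnector keeps a pool of TCP connections that are
            # reused across requests (the limit is illustrative).
            self._session = aiohttp.ClientSession(
                connector=aiohttp.TCPConnector(limit=100)
            )
        return self._session

    async def close(self):
        if self._session is not None:
            await self._session.close()
```

Every call to `get_session()` returns the same pooled session, so TCP connections to the MW API get reused instead of being torn down after each score request.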

Change 844923 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] editquality: run the score method in a separate thread

https://gerrit.wikimedia.org/r/844923

Change 844916 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] articlequality: improve aiohttp client session handling

https://gerrit.wikimedia.org/r/844916

Change 844917 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] draftquality: improve aiohttp client session handling

https://gerrit.wikimedia.org/r/844917

Change 844919 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] topic: improve aiohttp client session handling

https://gerrit.wikimedia.org/r/844919

I am observing a weird behavior of Lift Wing when using Benthos (listening to revision-create events and then hitting Lift Wing for a score). Sometimes I see a burst of HTTP 500s returned to Benthos (for various, unpredictable combinations of wiki/rev-id), and if I inspect the logs I see the following when calling the MW API:

  File "/opt/lib/python/site-packages/mwapi/async_session.py", line 107, in _request
    .format(prefix))
ValueError: Could not decode as JSON:
upstream connect error or disconnect/reset before headers. reset reason: connection termination

Alongside these 500s I also see latency peaks, up to 30s+ at p99. The curious thing is that if I hit Lift Wing with Benthos or wrk using the same wiki/rev-id combination (essentially asking for the same score over and over), I don't see any issue at all.

My current line of investigation is on two fronts:

  1. Try to figure out whether there is a specific issue or setting between Envoy and the MW API that triggers this behavior. A similar issue in a somewhat similar environment is described in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/627322, so it could be a good starting point.
  2. As https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools describes, we should really use a process pool to run our model.score functions in predict (that is our blocking code). I am still wondering if certain rev-id/wiki combinations may lead to stalls due to model.score taking longer to finish (blocking the asyncio loop as a consequence).
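Point 2 can be sketched with asyncio's run_in_executor. This is a minimal example: score() is a stand-in for the real model.score, and the pool size and fork context are illustrative choices to keep the sketch self-contained.

```python
import asyncio
import concurrent.futures
import multiprocessing


def score(rev_id: int) -> float:
    # Stand-in for the CPU-bound model.score() call.
    return (sum(i * i for i in range(rev_id)) % 97) / 97


async def predict(pool, rev_id: int) -> float:
    loop = asyncio.get_running_loop()
    # Offload the blocking work to the process pool so the event loop
    # keeps serving other requests in the meantime.
    return await loop.run_in_executor(pool, score, rev_id)


async def main() -> list:
    # The "fork" start method keeps this runnable as a plain script on Linux.
    ctx = multiprocessing.get_context("fork")
    with concurrent.futures.ProcessPoolExecutor(
        max_workers=2, mp_context=ctx
    ) as pool:
        return await asyncio.gather(predict(pool, 1000), predict(pool, 2000))
```

A process pool (rather than a thread pool) matters here because the scoring code is CPU-bound, so threads would still contend on the GIL.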

After reading T247484 and T262490, together with https://github.com/envoyproxy/envoy/issues/14981, it seems to me that something happens at the connection level on the MW API Envoy proxy side. More specifically:

  • The HTTP response header Connection: close may not be sent before stopping a TCP connection.
  • Envoy may cleanup/manage incoming TCP connections in an aggressive way, causing RST or similar to downstream clients.

The serviceops Envoy proxies have the following settings (translated into Istio's HTTP settings for the DestinationRule, for clarity):

idleTimeout: 5s
maxRequestsPerConnection: 1000
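For reference, those two settings would sit in the connection pool section of an Istio DestinationRule, roughly as below (resource name and host are illustrative, not the actual ml-serve config):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: mwapi-destination   # illustrative name
spec:
  host: api-ro.discovery.wmnet   # illustrative host
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 5s
        maxRequestsPerConnection: 1000
```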

After applying them I see improvements, but the connection breakage reports still pop up in Benthos. Adding the following (Virtual Service HTTP config) makes them go away completely:

retries:
  attempts: 2
  retryOn: connect-failure,refused-stream,reset

I suspect that the issue may be on the Istio proxy's Envoy side, or maybe on both fronts (MW API and Istio sidecar). The above config seems good enough for the moment, so I'll apply it. The idea is to revisit the config once we have a new version of Istio.
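For reference, the retry stanza sits under an HTTP route of an Istio VirtualService, roughly as below (resource name and host are illustrative, not the actual ml-serve config):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mwapi-vs   # illustrative name
spec:
  hosts:
  - api-ro.discovery.wmnet   # illustrative host
  http:
  - route:
    - destination:
        host: api-ro.discovery.wmnet
    retries:
      attempts: 2
      retryOn: connect-failure,refused-stream,reset
```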

Change 848344 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: update Istio settings for ml-serve clusters

https://gerrit.wikimedia.org/r/848344

Change 844923 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] editquality: run the score method in a separate process

https://gerrit.wikimedia.org/r/844923

Change 848344 merged by Elukey:

[operations/deployment-charts@master] admin_ng: update Istio settings for ml-serve clusters

https://gerrit.wikimedia.org/r/848344

Change 849016 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] editquality: move aiohttp session creation out of __init__

https://gerrit.wikimedia.org/r/849016

Change 849016 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] editquality: move aiohttp session creation out of __init__

https://gerrit.wikimedia.org/r/849016

Change 849040 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: add a Istio vs and retry settings on ml-serve for eventgate

https://gerrit.wikimedia.org/r/849040

Change 849040 merged by Elukey:

[operations/deployment-charts@master] admin_ng: add a Istio vs and retry settings on ml-serve for eventgate

https://gerrit.wikimedia.org/r/849040

Change 849062 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] articlequality: add process pool and improve aiohttp's client session

https://gerrit.wikimedia.org/r/849062

Change 849064 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] draftquality: add process pool and improve aiohttp's client session

https://gerrit.wikimedia.org/r/849064

Change 849086 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] topic: add process pool and improve aiohttp's client session

https://gerrit.wikimedia.org/r/849086

Change 849062 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] articlequality: add process pool and improve aiohttp's client session

https://gerrit.wikimedia.org/r/849062

Change 849064 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] draftquality: add process pool and improve aiohttp's client session

https://gerrit.wikimedia.org/r/849064

Change 849086 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] topic: add process pool and improve aiohttp's client session

https://gerrit.wikimedia.org/r/849086

I had to apply a similar fix to eventgate-main.discovery.wmnet's Istio config, since the connection errors popped up for it as well. After a chat with Aiko we also decided to go ahead with a separate process pool to offload the model.score functions (CPU-bound, blocking the asyncio main thread). Preliminary tests for enwiki goodfaith show a latency reduction up to p75.

I am still testing with Benthos; the connection errors are not completely gone, so more investigation is needed.

Change 849485 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update revscoring's Docker images

https://gerrit.wikimedia.org/r/849485

Change 849485 merged by Elukey:

[operations/deployment-charts@master] ml-services: update revscoring's Docker images

https://gerrit.wikimedia.org/r/849485

Change 850986 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] Refactor multi-processing code in revscoring model servers

https://gerrit.wikimedia.org/r/850986

Something interesting I found when I ran Benthos for the outlink isvc.

In the KServe logs I found sudden spikes in the outlink transformer, similar to what Luca observed in the revscoring models, but the difference was that their status code was 400. HTTP 400 occurred 56 times out of a total of 651 requests in 3 minutes. Among those 56 requests, most took less than 200ms, and only 2 took more than 1s (shown below):

[I 221029 21:54:45 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 90.23ms
[I 221029 21:54:45 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 92.26ms
[I 221029 21:54:45 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 87.34ms
[I 221029 21:54:45 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 117.58ms
[W 221029 21:54:45 web:2243] 400 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 2296.59ms
...
[I 221029 21:55:15 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 75.70ms
[I 221029 21:55:15 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 77.60ms
[I 221029 21:55:15 web:2243] 200 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 104.58ms
[W 221029 21:55:15 web:2243] 400 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 165.42ms
[W 221029 21:55:16 web:2243] 400 POST /v1/models/outlink-topic-model:predict (127.0.0.1) 1036.04ms

I'm not sure what's causing the 400 for these requests, as there aren't any other logs or error messages. I'm wondering if I could set Tornado to debug mode to see more info.

In the predictor's logs I don't see any deviation. There were only 595 requests (meaning that requests which got a 400 in the transformer didn't proceed to predict), and most of them took less than 40ms.

You can also check the latency metrics in Grafana.

I'm not sure what's causing the 400 for these requests, as there aren't any other logs or error messages. I'm wondering if I could set Tornado to debug mode to see more info.

Really nice finding! One thing that I noticed is that in the code, IIUC, we just return a meaningful error in the HTTP response, but we never logging.debug anything when we raise BAD_REQUEST responses. Maybe we could start doing that, and find a way to switch between info and debug logging when needed?
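Something along these lines, as a hypothetical sketch: validate_payload and the error text are made up, and a plain ValueError stands in for the HTTP 400 error the server actually raises.

```python
import logging

logger = logging.getLogger("transformer")


def validate_payload(payload: dict) -> dict:
    """Sketch: record the BAD_REQUEST reason before returning a 400."""
    if "rev_id" not in payload:
        reason = "missing rev_id in request payload"
        # Debug-level, so the reason is recoverable when debug logging
        # is switched on, without spamming the default info level.
        logger.debug("returning BAD_REQUEST: %s", reason)
        raise ValueError(reason)  # stand-in for an HTTP 400 response
    return payload
```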

In the predictor's logs I don't see any deviation. There were only 595 requests (meaning that requests which got a 400 in the transformer didn't proceed to predict), and most of them took less than 40ms.

This is great; the separate process pool is probably not needed here, as it would add too much overhead for this case.

Change 850986 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Refactor multi-processing code in revscoring model servers

https://gerrit.wikimedia.org/r/850986

Change 851093 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update docker images

https://gerrit.wikimedia.org/r/851093

Very interesting - I added more decorators to the code and after testing locally this is the result for enwiki-goodfaith:

[I 221031 16:26:34 decorators:13] Function get_revscoring_extractor_cache took 0.8577 seconds to execute.
[I 221031 16:26:36 decorators:25] Function fetch_features took 1.9728 seconds to execute.
[I 221031 16:26:36 decorators:13] Function run_in_process_pool took 0.2101 seconds to execute.
[I 221031 16:26:36 web:2243] 200 POST /v1/models/enwiki-goodfaith:predict (172.17.0.1) 3044.41ms

I checked with DEBUG logging and fetch_features doesn't call any MW API endpoint (as expected, due to the extractor cache created in get_revscoring_extractor_cache), so all of that must be CPU-bound time. Probably a good candidate for the multiprocessing pool, I'll investigate more!
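A decorator producing log lines like the ones above can be sketched as follows (a hypothetical implementation; the real one lives in the decorators module touched by the patches):

```python
import functools
import logging
import time

logger = logging.getLogger("decorators")


def elapsed_time(func):
    """Log how long a synchronous function takes to execute."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        logger.info(
            "Function %s took %.4f seconds to execute.",
            func.__name__,
            time.perf_counter() - start,
        )
        return result
    return wrapper


@elapsed_time
def fetch_features():
    time.sleep(0.01)  # stand-in for the real feature extraction work
    return "features"
```

functools.wraps keeps the wrapped function's name intact, so the log lines report `fetch_features` rather than `wrapper`.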

Change 852129 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] revscoring: Improve MP code and logging

https://gerrit.wikimedia.org/r/852129

Change 852129 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revscoring: Improve MP code and logging

https://gerrit.wikimedia.org/r/852129

Change 851093 merged by Elukey:

[operations/deployment-charts@master] ml-services: update docker images

https://gerrit.wikimedia.org/r/851093

I'll keep reporting improvements to the Istio connection termination issues in https://phabricator.wikimedia.org/T322196, which seems a more specific task.

I have been testing the ML code overnight for enwiki-goodfaith, and the results were really good: latency stayed consistent across all percentiles (except for a little blip of several seconds of latency across the board). I've set the pod to 5 processes in the pool, with the following requests/limits:

resources:
  limits:
    cpu: "5"
    memory: 2Gi
  requests:
    cpu: "2"
    memory: 2Gi

It may be a lot for a single pod, and I am not sure how it plays out when multiple pods use the same resources (do we have enough capacity?).

Change 853258 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] Add logging for BAD_REQUEST responses

https://gerrit.wikimedia.org/r/853258

Change 853258 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Add logging for BAD_REQUEST responses

https://gerrit.wikimedia.org/r/853258

Articlequality testing went really well: latencies are stable and acceptable without multiprocessing.

Change 853298 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update docker images

https://gerrit.wikimedia.org/r/853298

Change 853298 merged by Elukey:

[operations/deployment-charts@master] ml-services: update docker images

https://gerrit.wikimedia.org/r/853298

articletopic seems to work well without multiprocessing (I still need to test drafttopic, but I think it will follow the same trend).

The model servers (revscoring-related) that make 3 API calls during preprocess (edit and draft quality) have most of the latency issues, while the rest work really nicely (even without MP). I am wondering if asyncio.gather plays a role (since it is used only in the former use case), or if it is due to CPU-bound code with more chances to block I/O calls.
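The asyncio.gather usage in question looks roughly like this (a minimal sketch; fetch() and the call names stand in for the real MW API calls made during preprocess):

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # Stand-in for one MW API call made during preprocess.
    await asyncio.sleep(delay)
    return name


async def preprocess() -> list:
    # Edit/draft quality fire three API calls concurrently; gather
    # completes when the slowest one does, so anything blocking the
    # event loop delays all three at once.
    return await asyncio.gather(
        fetch("revision", 0.01),
        fetch("parent", 0.01),
        fetch("user", 0.01),
    )
```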

I tested outlink with Benthos for around 9 hours the other day (here are the Grafana metrics), and I observed that it returned ~1800 Bad Request errors with "No matching article or the page has no outlinks".

I looked at these pages: some of them don't exist (deleted), but some of them actually do exist, and when I tried to query the MW API locally I was able to fetch data for them.

I noticed that contacting the MW API sometimes fails the first few times (returning 400), but after 1-3 retries it would finally get the data. In the Benthos config we set "retry_period: 30s", so it retries every 30s, but I observed that after 4 failed retries it stops retrying.

I'm wondering if there is some mechanism in the MW API that causes data fetching to fail when there is traffic. In the revert-risk batch inference test I've seen similar cases: for some rev_ids, the isvc failed to fetch data from the MW API, but if I just ran a simple script locally that created an mwapi session to fetch the data, it worked. That's very weird.

Change 855963 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] outlink: log response returned from MW API

https://gerrit.wikimedia.org/r/855963

Change 855963 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] outlink: log response returned from MW API

https://gerrit.wikimedia.org/r/855963

Change 855995 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update outlink's model binary and docker image

https://gerrit.wikimedia.org/r/855995

Change 855995 merged by Elukey:

[operations/deployment-charts@master] ml-services: update outlink's model binary and docker image

https://gerrit.wikimedia.org/r/855995

Change 856520 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] Improve fetch_features handling in revscoring model servers

https://gerrit.wikimedia.org/r/856520

I tested outlink with Benthos for around 9 hours the other day (here are the Grafana metrics), and I observed that it returned ~1800 Bad Request errors with "No matching article or the page has no outlinks".

My guess is that it's related to the MediaWiki API or server load, like you suggested, but just in case it's article-specific, here are a few other hypotheses to maybe test. If you have some example page IDs, I'm also happy to take a look and see if I notice any patterns in the articles:

  • Is it possible that these are new articles for which the corresponding pagelinks table / MediaWiki API hasn't been updated yet when the initial edit goes through?
  • Edits to redirects, where maybe we're handling redirects via the API inconsistently? Or articles that had some other log action, like a page move, so that what the page ID points to is being updated?

Hi @Isaac, the issue has been solved, but the task hadn't been updated. I'm doing it now. Thanks for the heads up.

After adding more logging, we found that the MW API sometimes returns {'batchcomplete': True} at the end of the async_generator object, which caused a KeyError even though the model server had actually fetched the data, for example:

[E 221111 17:08:31 outlink_transformer:91] KeyError occurs for Battle_of_Pisky_(2022) (en). Reason: 'query'. MW API returns: {'batchcomplete': True}
[D 221111 17:08:31 outlink_transformer:94] en Battle_of_Pisky_(2022) fetched 463 outlinks

We have changed the code so it doesn't raise a 400 HTTPError exception in this case!
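The fix boils down to skipping responses that lack a 'query' key instead of indexing into them. Below is a simplified sketch (collect_outlinks is a hypothetical helper, and the response shapes are condensed from the MW API continuation format):

```python
def collect_outlinks(responses: list) -> list:
    """Gather outlinks across continuation responses, tolerating the
    trailing {'batchcomplete': True} marker instead of raising KeyError."""
    outlinks = []
    for resp in responses:
        if "query" not in resp:
            # e.g. {'batchcomplete': True} at the end of the generator:
            # not an error, just the end-of-batch marker.
            continue
        for page in resp["query"]["pages"]:
            outlinks.extend(page.get("links", []))
    return outlinks
```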

We have changed the code so it doesn't raise a 400 HTTPError exception in this case!

Great to hear! And I'd always wondered about batchcomplete and now I understand (T84977#936343).

Very nice tests on draftquality with the process pool:

elukey@wintermute:~/Wikimedia/scratch-dir/model_servers_tests$ wrk -c 5 -t 5 -s inference-draftquality.lua -d 300 --timeout 10s http://localhost:8080/v1/models/enwiki-draftquality:predict --latency
Running 5m test @ http://localhost:8080/v1/models/enwiki-draftquality:predict
  5 threads and 5 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.35s     1.21s    9.98s    56.67%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    8.06s 
     75%    9.68s 
     90%    9.90s 
     99%    9.98s 
  41 requests in 1.38m, 17.14KB read
  Socket errors: connect 0, read 0, write 0, timeout 11
Requests/sec:      0.50
Transfer/sec:     211.95B

elukey@wintermute:~/Wikimedia/scratch-dir/model_servers_tests$ wrk -c 5 -t 5 -s inference-draftquality.lua -d 300 --timeout 10s http://localhost:8080/v1/models/enwiki-draftquality:predict --latency
Running 5m test @ http://localhost:8080/v1/models/enwiki-draftquality:predict
  5 threads and 5 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.79s     1.39s    9.97s    89.36%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    5.57s 
     75%    5.83s 
     90%    8.53s 
     99%    9.97s 
  53 requests in 1.16m, 22.15KB read
  Socket errors: connect 0, read 0, write 0, timeout 6
Requests/sec:      0.76
Transfer/sec:     325.54B

elukey@wintermute:~/Wikimedia/scratch-dir/model_servers_tests$ wrk -c 5 -t 5 -s inference-draftquality.lua -d 300 --timeout 10s http://localhost:8080/v1/models/enwiki-draftquality:predict --latency
Running 5m test @ http://localhost:8080/v1/models/enwiki-draftquality:predict
  5 threads and 5 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.31s   881.05ms   9.93s    84.85%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    4.24s 
     75%    4.66s 
     90%    4.96s 
     99%    9.57s 
  141 requests in 2.29m, 58.93KB read
  Socket errors: connect 0, read 0, write 0, timeout 10
Requests/sec:      1.03
Transfer/sec:     439.19B

The first run is without a process pool, then with 2 processes, and then with 12 (auto-set for my laptop, a Core i7). There is a nice progression, and this seems to prove my point that the revscoring model servers run a lot of CPU-bound/blocking code.

Change 856520 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Refactor revscoring model servers

https://gerrit.wikimedia.org/r/856520

Change 859461 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: Update docker images to use single model server

https://gerrit.wikimedia.org/r/859461

In this task we worked on several things, that now have separate subtasks:

  • Refactored all the revscoring model servers to reuse code as much as possible, also adding multiprocessing support for the CPU-bound revscoring bits. We are going to work on T323624 to deploy and test the new Docker images (and possibly improve them further if needed!).
  • Found a weird Istio/Envoy corner case: sometimes Envoy seems to write to broken sockets, causing immediate 503s. This is tracked in T322196.
  • Tested Benthos extensively! It works nicely, and it is a good choice for our stream processor use cases. We'll decide with Data Engineering what's best to use when the time comes (most likely during the ORES deprecation).
  • Tested the new model servers (non-revscoring ones) and found corner cases, all with separate subtasks (credits to Aiko for the great work!).

It makes sense to keep working on separate tasks for clarity, closing this one!

Change 859461 merged by Elukey:

[operations/deployment-charts@master] ml-services: Update docker images to use single model server

https://gerrit.wikimedia.org/r/859461