User Details
- User Since: Aug 3 2019, 6:58 AM (241 w, 2 d)
- Availability: Available
- IRC Nick: kevinbazira
- LDAP User: Kevin Bazira
- MediaWiki User: KBazira (WMF)
Yesterday
Running into the error below, which is caused by a missing events module. This module is used to generate and send a topic prediction event to EventGate. It turns out the module lives in python/events.py and the model-server can't locate it because it is not being run as a Python module.
Traceback (most recent call last):
  File "/home/inference-services/outlink-topic-model/model-server/model.py", line 6, in <module>
    import events
ModuleNotFoundError: No module named 'events'
make[1]: *** [Makefile:97: run-server] Error 1
make[1]: Leaving directory '/home/inference-services'
make: *** [Makefile:76: articletopic-outlink] Error 2
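For context, here is a minimal sketch (paths assumed from the traceback above) of why the import fails when model.py runs as a plain script, plus one possible workaround; the actual fix here is to run the model-server as a Python module so the package layout is respected.

import os
import sys

# When model.py is executed as a plain script, only its own directory is on
# sys.path, so the sibling python/events.py module is invisible. One
# workaround (directory layout assumed) is to add that directory explicitly;
# running the server as a Python module avoids the need for this.
EVENTS_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "python")
sys.path.insert(0, EVENTS_DIR)

import events  # now resolves to python/events.py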
Fri, Mar 15
Currently, the method that loads the model has a hardcoded model path, so even when we set the path through an environment variable, the error below is thrown. To resolve this, we need to refactor the model server so that it accepts the model path through an environment variable, similar to how other model servers operate.
Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar.
Traceback (most recent call last):
  File "/home/inference-services/outlink-topic-model/model-server/model.py", line 104, in <module>
    model = OutlinksTopicModel("outlink-topic-model")
  File "/home/inference-services/outlink-topic-model/model-server/model.py", line 27, in __init__
    self.load()
  File "/home/inference-services/outlink-topic-model/model-server/model.py", line 45, in load
    self.model = fasttext.load_model("/mnt/models/model.bin")
  File "/home/inference-services/my_venv/lib/python3.9/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/home/inference-services/my_venv/lib/python3.9/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: /mnt/models/model.bin cannot be opened for loading!
make[1]: *** [Makefile:97: run-server] Error 1
make[1]: Leaving directory '/home/inference-services'
make: *** [Makefile:76: articletopic-outlink] Error 2
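A minimal sketch of the refactor, assuming an environment variable named MODEL_PATH (the variable name used by the other model servers may differ):

import os

import fasttext

class OutlinksTopicModel:
    def load(self):
        # Read the model path from the environment, falling back to the
        # currently hardcoded LiftWing location.
        model_path = os.environ.get("MODEL_PATH", "/mnt/models/model.bin")
        self.model = fasttext.load_model(model_path)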
While building the articletopic-outlink model-server locally, the error below is thrown. We encountered a similar error in T357382#9536821 and resolved it by adding the wheel package to the requirements.txt before installing fasttext.
Collecting fasttext==0.9.2 (from -r outlink-topic-model/model-server/requirements.txt (line 30))
  Using cached fasttext-0.9.2.tar.gz (68 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
Thu, Mar 14
Thank you for providing details about the logo detection project, @mfossati! The ML team is excited to explore hosting it on LiftWing.
Wed, Mar 13
Mon, Mar 11
@MunizaA, we're happy to hear that the information provided was helpful. For more context, the preprocessing time for each payload was recorded after every request made in both KI v5 and v6 environments. This is why the average value for the Preprocess Runtime (s) column was calculated in the last row of the table.
Fri, Mar 8
I noticed that in KI v6, pydantic data models were added to the BaseRevision class in the knowledge integrity schema. The get_revision method used in RRLA relies on the Revision class, which inherits from BaseRevision. Since RRLA uses this method in the preprocess step, I have compared the runtime of the preprocess step for two model servers: one running KI v5 (commit: 026c11a7b3bdb6bd16ef8826bc23b782e8c4e8c8) and another running KI v6 (commit: c8de64b8766e10223eabed73dad1bb2ac68c6b03). Below are the results that show the runtimes based on sample inputs that we use in RRLA load tests and test envs in P58692:
Request Payload | KI v5 Preprocess Runtime (s) | KI v6 Preprocess Runtime (s) |
---|---|---|
{"lang": "es", "rev_id": 144593484} | 0.1010565758 | 0.1377308369 |
{"lang": "de", "rev_id": 224199451} | 0.09622120857 | 0.1309299469 |
{"lang": "ru", "rev_id": 123744978} | 0.1045227051 | 0.1144728661 |
{"lang": "de", "rev_id": 224285471} | 0.1131651402 | 0.1167194843 |
{"lang": "en", "rev_id": 1096349097} | 0.1421649456 | 0.1646904945 |
{"lang": "pl", "rev_id": 67533865} | 0.1243362427 | 0.1153821945 |
{"lang": "en", "rev_id": 1096728668} | 0.1668889523 | 0.169686079 |
{"lang": "en", "rev_id": 1096851393} | 0.122885704 | 0.1490731239 |
{"lang": "pl", "rev_id": 67538140} | 0.106388092 | 0.1116890907 |
{"lang": "en", "rev_id": 1096609909} | 0.1272881031 | 0.1196594238 |
{"lang": "es", "rev_id": 144616722} | 0.1168558598 | 0.1163015366 |
{"lang": "uk", "rev_id": 36418681} | 0.1185092926 | 0.1388361454 |
{"lang": "ru", "rev_id": 123727072} | 0.1143059731 | 0.141433239 |
{"lang": "en", "rev_id": 1096855066} | 0.1432712078 | 0.1390919685 |
{"lang": "ru", "rev_id": 123758382} | 0.1263678074 | 0.1209347248 |
Average | 0.1216151873 | 0.132442077 |
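For reference, a rough sketch of how each per-payload runtime above was captured: wrap the preprocess call in a monotonic timer and average the elapsed times. run_preprocess below is only a stand-in for the RRLA model server's preprocess() method so the sketch runs on its own.

import time
from statistics import mean

def run_preprocess(payload):
    # placeholder; in the real comparison this is the model server's preprocess() call
    return payload

payloads = [
    {"lang": "es", "rev_id": 144593484},
    {"lang": "de", "rev_id": 224199451},
]

runtimes = []
for payload in payloads:
    start = time.perf_counter()
    run_preprocess(payload)
    runtimes.append(time.perf_counter() - start)

print(f"Average Preprocess Runtime (s): {mean(runtimes)}")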
Thu, Mar 7
Thanks @isarantopoulos, earlier on I was missing the python/kserve subdirectory. After changing:
kserve @ git+https://github.com/kserve/kserve.git@426fe21da0612ea6ef4a116b5114270313e02bbb
to
kserve @ git+https://github.com/kserve/kserve.git@426fe21da0612ea6ef4a116b5114270313e02bbb#egg=kserve&subdirectory=python/kserve
I was able to install this pre-release commit. I also checked to confirm the recently added fastapi and pydantic versions:
pip list | grep -E '(fastapi|pydantic)'
fastapi        0.108.0
pydantic       2.6.3
pydantic_core  2.16.3
Following the workflow we use to build LiftWing model-servers, which involves installing the pip dependencies listed in the requirements.txt file, I added the above pre-release commit to the RRLA requirements.txt file. When I ran pip install -r requirements.txt, the error below was thrown:
Collecting kserve@ git+https://github.com/kserve/kserve.git@426fe21da0612ea6ef4a116b5114270313e02bbb Cloning https://github.com/kserve/kserve.git (to revision 426fe21da0612ea6ef4a116b5114270313e02bbb) to /tmp/pip-install-uwt31r3m/kserve_7e7029202b4b49449c96d8b0f6a3185d Running command git clone -q https://github.com/kserve/kserve.git /tmp/pip-install-uwt31r3m/kserve_7e7029202b4b49449c96d8b0f6a3185d Running command git rev-parse -q --verify 'sha^426fe21da0612ea6ef4a116b5114270313e02bbb' Running command git fetch -q https://github.com/kserve/kserve.git 426fe21da0612ea6ef4a116b5114270313e02bbb Running command git checkout -q 426fe21da0612ea6ef4a116b5114270313e02bbb ERROR: Command errored out with exit status 1: command: /home/thevenv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-uwt31r3m/kserve_7e7029202b4b49449c96d8b0f6a3185d/setup.py'"'"'; __file__='"'"'/tmp/pip-install-uwt31r3m/kserve_7e7029202b4b49449c96d8b0f6a3185d/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-nd559lke cwd: /tmp/pip-install-uwt31r3m/kserve_7e7029202b4b49449c96d8b0f6a3185d/ Complete output (5 lines): Traceback (most recent call last): File "<string>", line 1, in <module> File "/usr/lib/python3.9/tokenize.py", line 392, in open buffer = _builtin_open(filename, 'rb') FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-uwt31r3m/kserve_7e7029202b4b49449c96d8b0f6a3185d/setup.py' ---------------------------------------- WARNING: Discarding git+https://github.com/kserve/kserve.git@426fe21da0612ea6ef4a116b5114270313e02bbb. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output. ERROR: Could not find a version that satisfies the requirement kserve (unavailable) ERROR: No matching distribution found for kserve (unavailable)
Wed, Mar 6
@Seddon and @Isaac, the article-descriptions inference service is now live in LiftWing production. It can be accessed through:
1. External endpoint:
curl "https://api.wikimedia.org/service/lw/inference/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}'
2. Internal endpoint:
curl "https://inference.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H "Host: article-descriptions.article-descriptions.wikimedia.org"
Fri, Mar 1
It looks like neither Chris's nor WMF's API key currently has access to the latest OpenAI models. I ended up using an older model (gpt-3.5-turbo), and the search query now returns results as shown below:
Using the WMF OpenAI account, I created a wikigpt API key. When I used it in the application, the error below was thrown:
ERROR:root:The model `gpt-4` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4. ERROR:app:Exception on /search [POST] Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2525, in wsgi_app response = self.full_dispatch_request() File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1822, in full_dispatch_request rv = self.handle_user_exception(e) File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1820, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1796, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) File "app.py", line 119, in search search_query File "app.py", line 84, in queryWikiGPT search_results = response["choices"][0]["message"]["content"] KeyError: 'message' 172.17.0.1 - - [01/Mar/2024 10:01:00] "POST /search HTTP/1.1" 500 - INFO:werkzeug:172.17.0.1 - - [01/Mar/2024 10:01:00] "POST /search HTTP/1.1" 500 -
I dug into the server logs and found that we are receiving a rate limit error from the OpenAI API:
ERROR:root:You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors. ERROR:app:Exception on /search [POST] Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2525, in wsgi_app response = self.full_dispatch_request() File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1822, in full_dispatch_request rv = self.handle_user_exception(e) File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1820, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1796, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) File "app.py", line 119, in search search_query File "app.py", line 84, in queryWikiGPT search_results = response["choices"][0]["message"]["content"] KeyError: 'message' 172.17.0.1 - - [01/Mar/2024 09:27:54] "POST /search HTTP/1.1" 500 - INFO:werkzeug:172.17.0.1 - - [01/Mar/2024 09:27:54] "POST /search HTTP/1.1" 500 -
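Both tracebacks end in KeyError: 'message' because app.py indexes straight into the API response. Below is a hedged sketch of how the response handling in queryWikiGPT could surface these API errors instead; the function name and field checks are hypothetical, not the current app.py code.

def extract_search_results(response: dict) -> str:
    # Quota and model-access failures come back without a usable "choices"
    # payload, so fail fast with a readable message instead of a KeyError.
    if "error" in response:
        raise RuntimeError(f"OpenAI API error: {response['error']}")
    choices = response.get("choices") or []
    if not choices or "message" not in choices[0]:
        raise RuntimeError(f"Unexpected OpenAI response shape: {response}")
    return choices[0]["message"]["content"]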
Thu, Feb 29
Based on this InfServiceHighMemoryUsage filter, the alerts are triggered by both the article-descriptions-predictor-default-00006-deployment-79ffz6hsite pod in codfw and the article-descriptions-predictor-default-00005-deployment-64gglffsite pod in eqiad.
Wed, Feb 28
The article-descriptions model server was firing InfServiceHighMemoryUsage alerts. This usually happens when an isvc uses >90% of its memory limit for 5 minutes. I have increased the memory limit for this model server from 4Gi to 5Gi so that prod can handle more isvc requests without running out of memory.
Tue, Feb 27
@klausman helped increase the caps on this model server's resource constraints. I pushed a patch that increased the number of CPUs used by the article-descriptions model server from 6 to 16 so that prod can match staging performance. The request we previously tested in T358467#9579190 has dropped from >8s to <3s:
$ time curl "https://inference.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2, "debug": 1}' -H "Host: article-descriptions.article-descriptions.wikimedia.org" {"lang":"en","title":"Clandonald","blp":false,"num_beams":2,"groundtruth":"Hamlet in Alberta, Canada","latency":{"wikidata-info (s)":0.043555498123168945,"mwapi - first paragraphs (s)":0.22956347465515137,"total network (s)":0.2606837749481201,"model (s)":2.4616105556488037,"total (s)":2.7223129272460938},"features":{"descriptions":{"fr":"hameau d'Alberta","en":"hamlet in central Alberta, Canada"},"first-paragraphs":{"en":"Clandonald is a hamlet in central Alberta, Canada within the County of Vermilion River. It is located approximately 28 kilometres (17 mi) north of Highway 16 and 58 kilometres (36 mi) northwest of Lloydminster.","fr":"Clandonald est un hameau (hamlet) du Comté de Vermilion River, situé dans la province canadienne d'Alberta."}},"prediction":["Hamlet in Alberta, Canada","human settlement in Alberta, Canada"]} real 0m2.744s user 0m0.000s sys 0m0.013s
Thanks @klausman. As discussed yesterday, with the current configuration, a request that was taking <3s on staging is now >8s in prod as shown below:
$ time curl "https://inference.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2, "debug": 1}' -H "Host: article-descriptions.article-descriptions.wikimedia.org" -H "Content-Type: application/json" --http1.1 {"lang":"en","title":"Clandonald","blp":false,"num_beams":2,"groundtruth":"Hamlet in Alberta, Canada","latency":{"wikidata-info (s)":0.07287430763244629,"mwapi - first paragraphs (s)":0.27934741973876953,"total network (s)":0.3130209445953369,"model (s)":7.659532070159912,"total (s)":7.9725823402404785},"features":{"descriptions":{"fr":"hameau d'Alberta","en":"hamlet in central Alberta, Canada"},"first-paragraphs":{"en":"Clandonald is a hamlet in central Alberta, Canada within the County of Vermilion River. It is located approximately 28 kilometres (17 mi) north of Highway 16 and 58 kilometres (36 mi) northwest of Lloydminster.","fr":"Clandonald est un hameau (hamlet) du Comté de Vermilion River, situé dans la province canadienne d'Alberta."}},"prediction":["Hamlet in Alberta, Canada","human settlement in Alberta, Canada"]} real 0m8.049s user 0m0.014s sys 0m0.001s
Mon, Feb 26
After @klausman helped add secrets, deploy configs, and certs, we are now getting this error:
$ helmfile -e ml-serve-eqiad diff
skipping missing values file matching "values-ml-serve-eqiad.yaml"
skipping missing values file matching "values-main.yaml"
Comparing release=service-secrets, chart=wmf-stable/secrets
Comparing release=main, chart=wmf-stable/kserve-inference
in ./helmfile.yaml: 2 errors:
err 0: command "/usr/bin/helm3" exited with non-zero status:
Before deploying the article-descriptions model server in prod, I tried running helmfile -e ml-serve-* diff for both *eqiad and *codfw and got the error below:
$ helmfile -e ml-serve-eqiad diff
skipping missing values file matching "/etc/helmfile-defaults/private/ml-serve_services/article-descriptions/ml-serve-eqiad.yaml"
skipping missing values file matching "/etc/helmfile-defaults/private/ml-serve_services/article-descriptions/ml-serve-eqiad.yaml"
skipping missing values file matching "values-ml-serve-eqiad.yaml"
skipping missing values file matching "values-main.yaml"
Comparing release=service-secrets, chart=wmf-stable/secrets
Comparing release=main, chart=wmf-stable/kserve-inference
in ./helmfile.yaml: 2 errors:
err 0: command "/usr/bin/helm3" exited with non-zero status:
Fri, Feb 23
Support for building the readability model-server using the Makefile was added and it can be tested using:
# first terminal
$ make readability

# second terminal
$ curl localhost:8080/v1/models/readability:predict -X POST -d '{"rev_id": 123456, "lang": "en"}' -H "Content-type: application/json"
$ MODEL_TYPE=readability make clean
Thu, Feb 22
@Isaac, you're spot on! The main difference to note is that when we're running the model server on LiftWing, it accesses the REST endpoint via the Rest Gateway using http://rest-gateway.discovery.wmnet:4111/{lang}.wikipedia.org/v1/page/summary/{title}. However, when we're running the model server locally, it accesses the REST endpoint via the Wikimedia REST API using https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}.
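A small sketch of how the model server could pick between the two endpoints; the environment variable name REST_GATEWAY_ENDPOINT is an assumption for illustration, not necessarily what the code uses.

import os

def summary_url(lang: str, title: str) -> str:
    gateway = os.environ.get("REST_GATEWAY_ENDPOINT")
    if gateway:
        # on LiftWing: go through the internal Rest Gateway
        return f"{gateway}/{lang}.wikipedia.org/v1/page/summary/{title}"
    # locally: use the public Wikimedia REST API
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"

print(summary_url("es", "Madrid"))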
I have run the second API call using the same LocalServer on the ml-sandbox (as used in P57453#232415). Below are the results:
$ time curl https://es.wikipedia.org/api/rest_v1/page/summary/Madrid {"type":"standard","title":"Madrid","displaytitle":"<span class=\"mw-page-title-main\">Madrid</span>","namespace":{"id":0,"text":""},"wikibase_item":"Q2807","titles":{"canonical":"Madrid","normalized":"Madrid","display":"<span class=\"mw-page-title-main\">Madrid</span>"},"pageid":1791,"thumbnail":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/d/d7/Bandera_de_la_ciudad_de_Madrid.svg/langes-320px-Bandera_de_la_ciudad_de_Madrid.svg.png","width":320,"height":213},"originalimage":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/d/d7/Bandera_de_la_ciudad_de_Madrid.svg/langes-1500px-Bandera_de_la_ciudad_de_Madrid.svg.png","width":1500,"height":1000},"lang":"es","dir":"ltr","revision":"158343028","tid":"8ade1a10-d0ba-11ee-b43f-d31ec09a3bed","timestamp":"2024-02-21T13:09:02Z","description":"capital y municipio más poblado de España","description_source":"central","coordinates":{"lat":40.41694444,"lon":-3.70333333},"content_urls":{"desktop":{"page":"https://es.wikipedia.org/wiki/Madrid","revisions":"https://es.wikipedia.org/wiki/Madrid?action=history","edit":"https://es.wikipedia.org/wiki/Madrid?action=edit","talk":"https://es.wikipedia.org/wiki/Discusi%C3%B3n:Madrid"},"mobile":{"page":"https://es.m.wikipedia.org/wiki/Madrid","revisions":"https://es.m.wikipedia.org/wiki/Special:History/Madrid","edit":"https://es.m.wikipedia.org/wiki/Madrid?action=edit","talk":"https://es.m.wikipedia.org/wiki/Discusi%C3%B3n:Madrid"}},"extract":"Madrid es un municipio y una ciudad de España. La localidad, con categoría histórica de villa, es la capital del Estado y de la Comunidad de Madrid. En su término municipal, el más poblado de España, están empadronadas 3 280 782 personas, constituyéndose como la segunda ciudad más poblada de la Unión Europea, así como su área metropolitana, con 6 779 888 habitantes empadronados.","extract_html":"<p><b>Madrid</b> es un municipio y una ciudad de España. La localidad, con categoría histórica de villa, es la capital del Estado y de la Comunidad de Madrid. En su término municipal, el más poblado de España, están empadronadas <span>3 280 782 personas</span>, constituyéndose como la segunda ciudad más poblada de la Unión Europea, así como su área metropolitana, con <span>6 779 888 habitantes</span> empadronados.</p>"} real 0m0.047s user 0m0.025s sys 0m0.017s
In yesterday's meeting IIRC @klausman mentioned testing direct API calls to confirm whether the Rest Gateway endpoint used by LiftWing is slower. Here are some API calls we could use to test this:
1. Run from within the article-descriptions model server hosted on LiftWing in the experimental namespace:
time curl http://rest-gateway.discovery.wmnet:4111/es.wikipedia.org/v1/page/summary/Madrid
2. Run from within the article-descriptions model server hosted outside LiftWing:
time curl https://es.wikipedia.org/api/rest_v1/page/summary/Madrid
Wed, Feb 21
To further understand the latency on LiftWing, I looked at backends and p0.99 in this grafana dashboard: https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?from=now-3h&orgId=1&to=now&var-backend=rest-gateway.discovery.wmnet&var-cluster=codfw%20prometheus%2Fk8s-mlstaging&var-namespace=experimental&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99&var-response_code=All
After having a chat with @isarantopoulos on IRC, I ran a comparison between the preprocess() and predict() methods. I used the sample inputs in P54507 to determine if there was a discrepancy between LiftWing and a LocalServer running on the ml-sandbox.
According to T343123#9520331, the bottleneck is the preprocess step. However, the formatted JSON response shown above and the code profiling we did in T353127#9399942 indicate that the bottleneck is the predict step.
The preprocess() method calculates its runtime using execution_times["total network (s)"] as shown here. Based on the formatted JSON response shown above, the preprocess runtime is about 0.4s. These results match @Isaac's comment in T343123#9527462.
Tue, Feb 20
In today's meeting, the team discussed T357913#9558911 and suggested renaming the readability model server parent directory to readability_model as this is the same pattern used for revscoring_model.
@isarantopoulos and I had a chat on IRC and agreed that refactoring the readability model server to run as a Python module would be beneficial. This would help standardize integration with other tools (such as the Makefile) and improve maintainability. As I began looking into the refactoring process, I noticed that the readability model server already uses a readability module (here and here). This presents a conflict: once we refactor the readability model server to work as a Python module, Python can no longer find the classify() and load_model() methods, because they live in the installed readability module rather than in the new local readability module. To resolve this conflict, we may need to rename either the readability model server or the readability module that we import into the server.
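To illustrate the clash (module and call names taken from the description above, exact signatures assumed): if the model-server package is itself importable as readability, it shadows the installed library and the calls below break.

import readability

# These attributes live in the installed readability library; when the local
# model-server package shadows it, they are missing and the calls fail.
model = readability.load_model("/mnt/models/model.bin")        # illustrative path
prediction = readability.classify(model, "Some article text")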
Mon, Feb 19
Feb 16 2024
+1 on adding a note to the model card. Support for building the langid model-server using the Makefile was added and it can be tested using:
# first terminal
$ make language-identification

# second terminal
$ curl localhost:8080/v1/models/langid:predict -i -X POST -d '{"text": "Some random text in any language"}'
$ MODEL_TYPE=langid make clean
While testing the locally-built langid model-server, I queried the inference service and received some interesting results. I tested three languages (English, French, and Swahili) and found that the isvc struggled to predict English accurately when the input was a short sentence with only about four words. Here are the results of my tests:
Hi @hashar, in this blubber file, we have been trying to copy files from the test to the codehealth variant but keep running into the error below:
#6 local://context
#6 sha256:177a22c08429ba9a74acccc3451f709da925e7f932e391246aa7c960ee0b7b7d
#6 DONE 0.0s
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to compile to LLB state: preparation: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Here is the full build log: https://integration.wikimedia.org/ci/job/research-mwaddlink-pipeline-test/1023/execution/node/86/log/
Feb 15 2024
Hi @Urbanecm_WMF, you and I share the same concerns about model training matching the model serving requirements, as I wrote in T357217#9534633.
Feb 14 2024
Finally got the test build to succeed. @kostajh and @Urbanecm_WMF please review whenever you get a minute: https://gerrit.wikimedia.org/r/1001958. Thanks!
Following T357217#9534633, I had a chat with @MGerlach and option 1 is the most feasible at the moment.
Feb 13 2024
The error above has been fixed by installing the wheel package before installing fasttext. The langid requirements.txt that I used has:
kserve==0.11.2
wheel==0.42.0
fasttext==0.9.2
Trying to build the langid model-server locally throws the error below. This seems to happen when pip installs fasttext==0.9.2 and can't find the pybind11 package.
Collecting fasttext==0.9.2 (from -r langid/././requirements.txt (line 2))
  Downloading fasttext-0.9.2.tar.gz (68 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 3.3 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
Feb 12 2024
I checked stat1008 and python3.9 is not available as shown below:
kevinbazira@stat1008:~/fix-add-a-link-CI-deps/mwaddlink$ virtualenv -p python3.9 venv
The path python3.9 (from --python=python3.9) does not exist
The model training pipeline runs on stat1008; without python3.9, this process will be blocked.
Feb 9 2024
Configurations were added to the Makefile so that model-server make builds now maintain a models directory structure consistent with the analytics repo. Below are examples of how we tested this:
1. RRLA
# first terminal
$ make revertrisk-language-agnostic

# second terminal
$ tree /models/
/models/
└── revertrisk
    └── language-agnostic
        └── 20221026144108
            └── model.pkl
$ MODEL_TYPE=revertrisk make clean
Feb 8 2024
Support for building the article-descriptions model-server using the Makefile was added and it can be tested using:
# first terminal
$ make article-descriptions

# second terminal
$ curl localhost:8080/v1/models/article-descriptions:predict -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 3}' -H "Content-Type: application/json" --http1.1
$ MODEL_SERVER_PARENT_DIR=article_descriptions make clean
Running the build commands today succeeded for 1/2 as shown below. The previous model download issue is likely to have been caused by a cap from the analytics public repository website as discussed in the meeting.
Feb 6 2024
We did not face the issue of sentencepiece failing to install on Linux. This is probably because the cmake and pkg-config packages are typically available on most Linux systems. The link below shows a macOS-specific solution that resolves this issue:
https://github.com/google/sentencepiece/issues/378#issuecomment-969896519
Feb 5 2024
I have not encountered this on Linux with Python 3.9.2. Are you using Python 3.11? KServe requires the ray package >=2.4.0,<2.5.0. Based on https://pypi.org/project/ray/2.4.0/, this version of the ray package supports Python 3.6 to 3.10.
Jan 30 2024
Jan 29 2024
@Seddon, in T353127 we were able to make significant improvements in response latency. For example, in T353127#9398823, there was a request that initially had a 14s response time. With subsequent optimization efforts, we managed to reduce this to 4s as seen in T353127#9421055. This reduction was achieved by exceeding the CloudVPS instance's CPU and memory resources and by using CPU core pinning. Neither of these methods affected the prediction quality.
Jan 25 2024
I pinged Muniza about the possibility of loosening the knowledge-integrity constraint to allow for pydantic < 2.0.0 and here is her response:
Thank you for the suggestion @isarantopoulos. I tried fastapi==0.109.0 and ran into the error below. It looks like kserve 0.11.2 doesn't support it.
ERROR: Cannot install -r revert_risk_model/model_server/revertrisk/requirements.txt (line 3) and fastapi==0.109.0 because these package versions have conflicting dependencies.
Jan 24 2024
I have been working on updating knowledge-integrity in the RRLA model-server. I tried running it locally and am currently getting dependency conflicts between the pydantic required by kserve's fastapi and the pydantic required by knowledge-integrity, as shown below:
ERROR: Cannot install -r revert_risk_model/model_server/revertrisk/requirements.txt (line 1), knowledge-integrity[revertrisk]==0.6.0 and kserve because these package versions have conflicting dependencies.
Jan 22 2024
Until now, this locust prototype has been using the same payload to run a load test on the article-descriptions model-server. Since with wrk we used multiple payloads read from an input file, I have updated article_descriptions.py to replicate this functionality using process_payload().
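A hedged sketch of what that looks like in locust; the payload file name and the exact process_payload() implementation in article_descriptions.py are assumptions for illustration.

import json
import random

from locust import HttpUser, task

def process_payload(path="article_descriptions_payloads.jsonl"):
    # read one JSON payload per line, mirroring the input file used with wrk
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

class ArticleDescriptionsUser(HttpUser):
    # host is supplied on the locust command line via --host

    def on_start(self):
        self.payloads = process_payload()

    @task
    def predict(self):
        # pick a different payload per request instead of reusing a single one
        self.client.post(
            "/v1/models/article-descriptions:predict",
            json=random.choice(self.payloads),
        )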
Jan 19 2024
In order to compare historical data from T351939#9469592, I updated article_descriptions.py with lw_stats_analysis() and changed lw_stats_history() to use pandas instead of the csv module. Below is what the comparison report looks like for a given lw_stats_history.csv.
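A minimal sketch of the kind of comparison lw_stats_analysis() performs, using the column names from lw_stats_history.csv; the actual implementation in article_descriptions.py may differ.

import pandas as pd

history = pd.read_csv("lw_stats_history.csv")
latest, previous = history.iloc[-1], history.iloc[:-1]
for column in ["Median Response Time", "Average Response Time", "Max Response Time"]:
    delta = latest[column] - previous[column].mean()
    print(f"{column}: {latest[column]} ({delta:+.2f} vs. mean of earlier runs)")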
Jan 18 2024
locust has a test_stop event listener that can be used to read article_descriptions_stats.csv after it has been generated at the end of a load test run. I have updated the article_descriptions.py file to utilize this event hook to extract the "Aggregated" data from article_descriptions_stats.csv (as shown in T351939#9468383) and save it to lw_stats_history.csv. Here is what the contents of lw_stats_history.csv look like after running 3 load tests:
Timestamp | Request Count | Failure Count | Median Response Time | Average Response Time | Min Response Time | Max Response Time | Average Content Size | Requests/s | Failures/s | 50% | 66% | 75% | 80% | 90% | 95% | 98% | 99% | 99.9% | 99.99% | 100% |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20240118151948 | 2 | 0 | 3528 | 3618.5 | 3528 | 3709 | 167.0 | 0.22888733537397923 | 0.0 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 |
20240118152039 | 2 | 0 | 3500 | 3567.0 | 3490 | 3644 | 167.0 | 0.2291656581522369 | 0.0 | 3600 | 3600 | 3600 | 3600 | 3600 | 3600 | 3600 | 3600 | 3600 | 3600 | 3600 |
20240118152356 | 2 | 0 | 3439 | 3772.0 | 3439 | 4105 | 167.0 | 0.2307884342326772 | 0.0 | 4100 | 4100 | 4100 | 4100 | 4100 | 4100 | 4100 | 4100 | 4100 | 4100 | 4100 |
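For reference, a minimal sketch of such a test_stop listener; the file names come from the description above, and the "Aggregated" row lookup assumes locust's standard stats CSV columns.

import csv
import os
from datetime import datetime

from locust import events

@events.test_stop.add_listener
def save_aggregated_stats(environment, **kwargs):
    # pull the "Aggregated" row out of the stats CSV that locust writes
    with open("article_descriptions_stats.csv") as f:
        aggregated = next(
            row for row in csv.DictReader(f) if row.get("Name") == "Aggregated"
        )
    # append it, with a timestamp, to the history file
    aggregated["Timestamp"] = datetime.now().strftime("%Y%m%d%H%M%S")
    history_path = "lw_stats_history.csv"
    write_header = not os.path.exists(history_path) or os.path.getsize(history_path) == 0
    with open(history_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(aggregated.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(aggregated)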
The next step will be working on running a comparative analysis on this data.