I want to refactor the article-descriptions model server so that I can run the server locally without using docker containers.
Description
Details
Related Objects
Mentioned In:
- rMLIS4ed330dea651: article-descriptions: add helper function for rest gateway url
- rMLIS65538428100b: article-descriptions: fix boolean parsing of env var
- rMLIS88d0063c6528: article-descriptions: enable local run
- T348156: Goal: Increase the number of models hosted on Lift Wing
- rMLISc46ed2a8e026: article-descriptions: fix AsyncSession host header
Event Timeline
Change 976670 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[machinelearning/liftwing/inference-services@main] article-descriptions: enable local run
Current status: I'm able to start the model server, but I'm getting an error when making a request.
2023-11-24 21:04:56.674 uvicorn.error ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/timing_asgi/middleware.py", line 68, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/kserve/protocol/rest/v1_endpoints.py", line 76, in predict
    response, response_headers = await self.dataplane.infer(model_name=model_name,
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/kserve/protocol/dataplane.py", line 311, in infer
    response = await model(request, headers=headers)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/kserve/model.py", line 108, in __call__
    payload = await self.preprocess(body, headers) if inspect.iscoroutinefunction(self.preprocess) \
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 47, in preprocess
    descriptions, sitelinks, blp = await self.get_wikidata_info(lang, title)
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 193, in get_wikidata_info
    session = mwapi.AsyncSession(
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/mwapi/async_session.py", line 53, in __init__
    setattr(self.session, key, value)
AttributeError: can't set attribute
Change 977241 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):
[machinelearning/liftwing/inference-services@main] article-descriptions: fix AsyncSession host header
I discovered what was causing this issue and pushed a patch for it here.
Essentially, the first code snippet below throws AttributeError: can't set attribute, while the second one works fine.
session = mwapi.AsyncSession(
    host=self.wiki_url,
    user_agent=self.user_agent,
    session=self.get_http_client_session("mwapi"),
    headers={"Host": "www.wikidata.org"},
)
session = mwapi.AsyncSession(
    host=self.wiki_url,
    user_agent=self.user_agent,
    session=self.get_http_client_session("mwapi"),
)
session.headers["Host"] = "www.wikidata.org"
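The difference between the two snippets comes down to a standard Python pattern: mwapi's AsyncSession forwards extra keyword arguments via setattr() onto the underlying HTTP session, and the session's headers attribute is a read-only property, so rebinding the attribute raises AttributeError while mutating the mapping it returns works. A minimal sketch with stand-in classes (these are illustrative, not the real mwapi/aiohttp code):

```python
# Stand-in for the HTTP client session: `headers` is a read-only
# property (no setter) backed by a mutable dict, like aiohttp's.
class HttpSessionLike:
    def __init__(self):
        self._headers = {}

    @property
    def headers(self):
        return self._headers


# Stand-in for mwapi.AsyncSession: forwards unknown kwargs with setattr().
class AsyncSessionLike:
    def __init__(self, session, **kwargs):
        self.session = session
        for key, value in kwargs.items():
            # Fails for properties that define no setter.
            setattr(self.session, key, value)


s = HttpSessionLike()
try:
    # First pattern: pass headers as a kwarg -> setattr on a read-only
    # property -> AttributeError ("can't set attribute" on Python 3.9).
    AsyncSessionLike(s, headers={"Host": "www.wikidata.org"})
except AttributeError as e:
    print("kwarg pattern fails:", e)

# Second pattern: mutate the dict behind the property instead of
# rebinding the attribute itself.
s.headers["Host"] = "www.wikidata.org"
print("mutation pattern works:", s.headers)
```

This is why moving the Host header assignment after construction fixes the crash without changing any behavior of the session itself.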
Change 977241 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] article-descriptions: fix AsyncSession host header
Currently investigating the following error I get on model load when setting low_cpu_mem_usage=True:
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 301, in <module>
    model = ArticleDescriptionsModel(model_name)
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 30, in __init__
    self.load()
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 33, in load
    self.model.load_model(self.model_path)
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/utils.py", line 101, in load_model
    model = model.to(device)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2271, in to
    return super().to(*args, **kwargs)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
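This NotImplementedError is characteristic of meta tensors: with low_cpu_mem_usage, transformers builds the model skeleton on the meta device (shapes and dtypes only, no storage), so a plain .to(device) has nothing to copy. A minimal reproduction, assuming only a recent PyTorch and nothing from the actual model code:

```python
import torch

# A meta tensor carries shape/dtype metadata but allocates no storage.
t = torch.empty(2, 2, device="meta")
print(t.shape, t.is_meta)

# Moving it to a real device fails: there is no data to copy out.
try:
    t.to("cpu")
except NotImplementedError as e:
    print("to() failed:", e)
```

Weights on the meta device must be materialized (e.g. by loading the state dict into real storage) rather than moved with .to().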
Moreover, the model transfer to the device (GPU) should take place after the web server's initialization, as described and done in the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/llm/model-server/model.py#56 | check_gpu function in the llm models ]].
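A lazy-transfer pattern along those lines could be sketched as below (all names here are illustrative; the real check_gpu lives in the llm model server linked above):

```python
import torch


class ModelWrapperSketch:
    """Illustrative sketch: defer the GPU transfer until first use,
    so it happens after the web server has finished initializing."""

    def __init__(self):
        self.device = None
        self.model = torch.nn.Linear(4, 2)  # stand-in for the real model

    def check_gpu(self):
        # Pick the device and move the model exactly once, on first call,
        # instead of blocking __init__/server startup with the transfer.
        if self.device is None:
            self.device = "cuda" if torch.cuda.is_available() else "cpu"
            self.model = self.model.to(self.device)

    def predict(self, x):
        self.check_gpu()
        return self.model(x.to(self.device))
```

With this shape, the server can start serving health checks immediately, and the (potentially slow) device transfer cost is paid on the first prediction request.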
Change 978535 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[integration/config@master] ml-services: renamed directory for article_descriptions
Change 978535 merged by jenkins-bot:
[integration/config@master] ml-services: renamed directory for article_descriptions
Mentioned in SAL (#wikimedia-releng) [2023-11-29T17:43:05Z] <hashar> Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/978535 # T351940
Change 976670 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] article-descriptions: enable local run
Change 979111 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[machinelearning/liftwing/inference-services@main] article-descriptions: fix boolean parsing of env var
Change 979114 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[operations/deployment-charts@master] ml-services: update article-desc image
Change 979111 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] article-descriptions: fix boolean parsing of env var
Change 979114 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: update article-desc image
After deploying the changes in this task, I'm getting a 500 with the following error logs:
Traceback (most recent call last):
  File "/opt/lib/python/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/lib/python/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/opt/lib/python/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/lib/python/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/lib/python/site-packages/timing_asgi/middleware.py", line 70, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/opt/lib/python/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/lib/python/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/opt/lib/python/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/lib/python/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/opt/lib/python/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/opt/lib/python/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/lib/python/site-packages/kserve/protocol/rest/v1_endpoints.py", line 76, in predict
    response, response_headers = await self.dataplane.infer(model_name=model_name,
  File "/opt/lib/python/site-packages/kserve/protocol/dataplane.py", line 311, in infer
    response = await model(request, headers=headers)
  File "/opt/lib/python/site-packages/kserve/model.py", line 122, in __call__
    else self.predict(payload, headers)
  File "/srv/article_descriptions/model_server/model.py", line 92, in predict
    prediction = self.model.predict(
  File "/srv/article_descriptions/model_server/utils.py", line 142, in predict
    tokens = self.model.generate(
  File "/opt/lib/python/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/srv/article_descriptions/model_server/descartes/src/models/descartes_mbart.py", line 570, in generate
    batch_size = inputs_tensor[list(inputs_tensor.keys())[0]].shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'
INFO:root:Opening a new Asyncio session for restgateway.
ERROR:root:Failed to retrieve first paragraph: 404, message='Not Found', url=URL('http://api-ro.discovery.wmnet/v1/page/summary/Clandonald')
INFO:root:Opening a new Asyncio session for restgateway.
ERROR:root:Failed to retrieve first paragraph: 404, message='Not Found', url=URL('http://api-ro.discovery.wmnet/v1/page/summary/Clandonald')
Reverted the deployment in order to investigate and fix.
The above issue was caused by the following line of code:
base_url = urljoin(self.rest_gateway_endpoint, self.wiki_url or mw_host)
It tries to create a URL from rest_gateway_endpoint and wiki_url, i.e. http://rest-gateway.discovery.wmnet:411 and http://api-ro.discovery.wmnet respectively. The result of urljoin ends up being just the latter, because urljoin expects a base URL plus a relative URL to perform a join. More info in the docs.
However, in this line of code we should be merging with mw_host instead, and in that case we should prepend the scheme (https://) when we are not using the rest_gateway_endpoint.
I'll be adding a function and a couple of unit tests to the code to make sure it behaves properly for all scenarios.
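The urljoin behavior is easy to confirm, and a helper of the kind described could look like the sketch below (get_rest_endpoint is a hypothetical name, and the real helper added in change 979369 may differ in details):

```python
from urllib.parse import urljoin

# When the second argument is itself an absolute URL, urljoin returns it
# unchanged and discards the base entirely -- which is exactly how the
# rest gateway endpoint was being lost.
print(urljoin("http://rest-gateway.discovery.wmnet:411",
              "http://api-ro.discovery.wmnet"))
# -> http://api-ro.discovery.wmnet


def get_rest_endpoint(rest_gateway_endpoint, mw_host):
    """Hypothetical helper: merge the gateway endpoint with the bare
    mw_host, or fall back to https:// + mw_host when no gateway is set."""
    if rest_gateway_endpoint:
        return f"{rest_gateway_endpoint.rstrip('/')}/{mw_host}"
    return f"https://{mw_host}"
```

Unit tests for such a helper would then assert both branches: with a gateway configured the host is appended to the gateway base, and without one the scheme is prepended to mw_host.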
Change 979369 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[machinelearning/liftwing/inference-services@main] article-descriptions: add helper function for rest gateway url
Change 979369 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] article-descriptions: add helper function for rest gateway url
Change 980002 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[operations/deployment-charts@master] ml-services: fix rest gateway endpoint creation in article descriptions
Change 980002 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: fix rest gateway endpoint creation in article descriptions
The model can now be run locally by following the instructions in the README.md file.
I added a couple of unit tests that assert the correct URL is created for the REST API requests.
I ran into some issues with virtualenv and tox while trying to get the unit tests running in the test image, so I left that for now.
I suggest we revisit how all of our test images run when we move the repo to GitLab.