
Enable local runs for article-descriptions model
Closed, Resolved · Public · 2 Estimated Story Points

Description

I want to refactor the article-descriptions model server so that I can run the server locally without using docker containers.

Event Timeline

Change 976670 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] article-descriptions: enable local run

https://gerrit.wikimedia.org/r/976670

Current status is that I'm able to start the model server but I'm getting an error when making a request.

2023-11-24 21:04:56.674 uvicorn.error ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/timing_asgi/middleware.py", line 68, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/kserve/protocol/rest/v1_endpoints.py", line 76, in predict
    response, response_headers = await self.dataplane.infer(model_name=model_name,
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/kserve/protocol/dataplane.py", line 311, in infer
    response = await model(request, headers=headers)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/kserve/model.py", line 108, in __call__
    payload = await self.preprocess(body, headers) if inspect.iscoroutinefunction(self.preprocess) \
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 47, in preprocess
    descriptions, sitelinks, blp = await self.get_wikidata_info(lang, title)
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 193, in get_wikidata_info
    session = mwapi.AsyncSession(
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/mwapi/async_session.py", line 53, in __init__
    setattr(self.session, key, value)
AttributeError: can't set attribute

Change 977241 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-descriptions: fix AsyncSession host header

https://gerrit.wikimedia.org/r/977241

I discovered what was causing this issue and pushed a patch for it here.

Essentially, the first code snippet below throws AttributeError: can't set attribute, while the second one works.

# First snippet: raises AttributeError: can't set attribute
session = mwapi.AsyncSession(
    host=self.wiki_url,
    user_agent=self.user_agent,
    session=self.get_http_client_session("mwapi"),
    headers={"Host": "www.wikidata.org"},
)

# Second snippet: works
session = mwapi.AsyncSession(
    host=self.wiki_url,
    user_agent=self.user_agent,
    session=self.get_http_client_session("mwapi"),
)
session.headers["Host"] = "www.wikidata.org"
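The failure mode can be sketched without aiohttp at all. The class below is a stand-in of my own (not the real aiohttp code): mwapi.AsyncSession applies extra keyword arguments to the underlying session via setattr(), and aiohttp's ClientSession exposes headers through a read-only property, so assigning to it raises AttributeError, while mutating the mapping in place does not.

```python
# Minimal reproduction of the failure mode above, using a stand-in class
# rather than a real aiohttp.ClientSession.
class FakeClientSession:
    """Stand-in for aiohttp.ClientSession with a read-only headers property."""

    def __init__(self):
        self._headers = {}

    @property
    def headers(self):  # no setter defined, so assignment raises AttributeError
        return self._headers


session = FakeClientSession()

# First pattern: passing headers= makes mwapi call setattr(session, "headers", ...),
# which fails exactly like the traceback above.
try:
    setattr(session, "headers", {"Host": "www.wikidata.org"})
except AttributeError as exc:
    print(exc)

# Second pattern: in-place mutation of the existing mapping works,
# which is what the fix does.
session.headers["Host"] = "www.wikidata.org"
print(session.headers)
```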

Change 977241 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-descriptions: fix AsyncSession host header

https://gerrit.wikimedia.org/r/977241

Currently investigating the following error I get on model load when setting low_cpu_mem_usage=True:

 File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 301, in <module>
    model = ArticleDescriptionsModel(model_name)
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 30, in __init__
    self.load()
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/model.py", line 33, in load
    self.model.load_model(self.model_path)
  File "/Users/isaranto/repoz/inference-services/article_descriptions/model_server/utils.py", line 101, in load_model
    model = model.to(device)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2271, in to
    return super().to(*args, **kwargs)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/Users/isaranto/repoz/inference-services/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Moreover, the model transfer to the device (GPU) should take place after the web server's initialization, as described and done in the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/llm/model-server/model.py#56 | check_gpu function in llm models ]].
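The intended ordering can be sketched as follows; these are stand-in functions of my own (no torch or kserve involved), only illustrating the sequencing that the linked check_gpu approach implements:

```python
# Sketch of the deferred device transfer described above: load the model
# on CPU before the server starts, and move it to the GPU only once the
# server is up, so a slow weight copy never blocks startup.
events = []

def load_model():
    # stand-in for loading the pretrained weights on CPU
    events.append("load model on cpu")

def start_server():
    # stand-in for starting the kserve model server
    events.append("server started")

def check_gpu():
    # stand-in for the check_gpu function linked above: detect a GPU
    # and transfer the model to it after the server is serving
    events.append("move model to gpu")

load_model()
start_server()
check_gpu()
print(events)
```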

calbon triaged this task as Medium priority.Nov 28 2023, 3:51 PM
calbon moved this task from Unsorted to In Progress on the Machine-Learning-Team board.

Change 978535 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[integration/config@master] ml-services: renamed directory for article_descriptions

https://gerrit.wikimedia.org/r/978535

Change 978535 merged by jenkins-bot:

[integration/config@master] ml-services: renamed directory for article_descriptions

https://gerrit.wikimedia.org/r/978535

Change 976670 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-descriptions: enable local run

https://gerrit.wikimedia.org/r/976670

Change 979111 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] article-descriptions: fix boolean parsing of env var

https://gerrit.wikimedia.org/r/979111

Change 979114 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: update article-desc image

https://gerrit.wikimedia.org/r/979114

Change 979111 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-descriptions: fix boolean parsing of env var

https://gerrit.wikimedia.org/r/979111

Change 979114 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update article-desc image

https://gerrit.wikimedia.org/r/979114

After deploying the changes in this task I'm getting a 500 with the following error logs:

Traceback (most recent call last):
  File "/opt/lib/python/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/lib/python/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/opt/lib/python/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/lib/python/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/lib/python/site-packages/timing_asgi/middleware.py", line 70, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/opt/lib/python/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/lib/python/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/opt/lib/python/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/lib/python/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/opt/lib/python/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/opt/lib/python/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/opt/lib/python/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/lib/python/site-packages/kserve/protocol/rest/v1_endpoints.py", line 76, in predict
    response, response_headers = await self.dataplane.infer(model_name=model_name,
  File "/opt/lib/python/site-packages/kserve/protocol/dataplane.py", line 311, in infer
    response = await model(request, headers=headers)
  File "/opt/lib/python/site-packages/kserve/model.py", line 122, in __call__
    else self.predict(payload, headers)
  File "/srv/article_descriptions/model_server/model.py", line 92, in predict
    prediction = self.model.predict(
  File "/srv/article_descriptions/model_server/utils.py", line 142, in predict
    tokens = self.model.generate(
  File "/opt/lib/python/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/srv/article_descriptions/model_server/descartes/src/models/descartes_mbart.py", line 570, in generate
    batch_size = inputs_tensor[list(inputs_tensor.keys())[0]].shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'
INFO:root:Opening a new Asyncio session for restgateway.
ERROR:root:Failed to retrieve first paragraph: 404, message='Not Found', url=URL('http://api-ro.discovery.wmnet/v1/page/summary/Clandonald')
INFO:root:Opening a new Asyncio session for restgateway.
ERROR:root:Failed to retrieve first paragraph: 404, message='Not Found', url=URL('http://api-ro.discovery.wmnet/v1/page/summary/Clandonald')

Reverted the deployment in order to investigate and fix.

The above issue was caused by the following line of code:

base_url = urljoin(self.rest_gateway_endpoint, self.wiki_url or mw_host)

It tries to create a URL from rest_gateway_endpoint and wiki_url, i.e. http://rest-gateway.discovery.wmnet:411 and http://api-ro.discovery.wmnet respectively. The result of urljoin ends up being just the latter, because urljoin expects a base URL plus a relative URL to perform a join; when the second argument is itself an absolute URL, it replaces the base entirely. More info in the docs

However, in this line of code we should be joining with mw_host instead, and we should prepend the scheme (https://) when we are not using the rest_gateway_endpoint.
I'll add a helper function and a couple of unit tests to make sure it behaves properly in all scenarios.
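A sketch of such a helper, based on the behavior described above; the function name and exact signature are my assumptions, not the merged patch:

```python
from urllib.parse import urljoin, urlparse

# The pitfall: when the second argument is itself an absolute URL,
# urljoin discards the base entirely, which is what produced the
# bad api-ro.discovery.wmnet URLs above.
print(urljoin("http://rest-gateway.discovery.wmnet:411",
              "http://api-ro.discovery.wmnet"))


def build_rest_url(rest_gateway_endpoint, mw_host):
    """Hypothetical helper mirroring the fix described above.

    Joins the REST gateway endpoint with the bare MediaWiki host, and
    falls back to calling the host directly over https when no gateway
    endpoint is configured.
    """
    # keep only the host part in case a full URL was passed in
    host = urlparse(mw_host).netloc or mw_host
    if rest_gateway_endpoint:
        # ensure a trailing slash so urljoin appends instead of replacing
        return urljoin(rest_gateway_endpoint.rstrip("/") + "/", host)
    return f"https://{host}"


print(build_rest_url("http://rest-gateway.discovery.wmnet:411", "en.wikipedia.org"))
print(build_rest_url(None, "en.wikipedia.org"))
```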

Change 979369 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] article-descriptions: add helper function for rest gateway url

https://gerrit.wikimedia.org/r/979369

Change 979369 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-descriptions: add helper function for rest gateway url

https://gerrit.wikimedia.org/r/979369

Change 980002 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: fix rest gateway endpoint creation in article descriptions

https://gerrit.wikimedia.org/r/980002

Change 980002 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: fix rest gateway endpoint creation in article descriptions

https://gerrit.wikimedia.org/r/980002

The model can now be run locally following the instructions in the README.md file.
I added a couple of unit tests that verify the correct URL is created for the REST API requests.
I ran into some issues with virtualenv and tox while trying to get the unit tests running in the test image, so I left that for now.
I suggest we revisit how our test images run when we move the repo to GitLab.

isarantopoulos set Final Story Points to 5.