
Host the recommendation-api container on LiftWing
Closed, ResolvedPublic

Description

In T338805 we containerized the content translation recommendation-api. In this task, we shall host that container as the second step in the process of migrating it to Lift Wing.

Related Objects

Event Timeline

Change 932810 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[research/recommendation-api@master] Set up production and test images for the recommendation-api migration

https://gerrit.wikimedia.org/r/932810

Change 935880 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[integration/config@master] recommendation-api: add pipelines for recommendation-api-ng

https://gerrit.wikimedia.org/r/935880

Change 935881 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[integration/config@master] recommendation-api: add pipelines for recommendation-api-ng

https://gerrit.wikimedia.org/r/935881

Change 935881 abandoned by Kevin Bazira:

[integration/config@master] recommendation-api: add pipelines for recommendation-api-ng

Reason:

Fixed this in: https://gerrit.wikimedia.org/r/c/integration/config/+/935880/

https://gerrit.wikimedia.org/r/935881

Change 935880 merged by jenkins-bot:

[integration/config@master] recommendation-api: add pipelines for recommendation-api-ng

https://gerrit.wikimedia.org/r/935880

Mentioned in SAL (#wikimedia-releng) [2023-07-10T08:02:26Z] <hashar> Reloaded Zuul for https://gerrit.wikimedia.org/r/935880 "recommendation-api: add pipelines for recommendation-api-ng" T339890

The NodeJS recommendation-api already has entries to the CI pipeline (in jjb/project-pipelines.yaml and zuul/layout.yaml) that publish an image to the Wikimedia docker registry. To avoid naming conflicts, the CI pipelines for the Python recommendation-api that we are migrating have been suffixed with "-ng" as suggested in T293648#8981227.

Change 932810 merged by jenkins-bot:

[research/recommendation-api@master] Set up production and test images for the recommendation-api migration

https://gerrit.wikimedia.org/r/932810

After encountering issues with CI fetching files from the Wikimedia public datasets archive (T341582) and a CI post-merge build failure caused by an internal server error (T342084), the recommendation-api image has finally been published to the Docker registry: https://docker-registry.wikimedia.org/wikimedia/research-recommendation-api/tags/. The next step is to deploy it onto LiftWing/k8s.

Change 948689 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] [WIP] Add Helm chart for the recommendation-api-ng

https://gerrit.wikimedia.org/r/948689

Change 948689 abandoned by Kevin Bazira:

[operations/deployment-charts@master] [WIP] Add Helm chart for the recommendation-api-ng

Reason:

The team decided to move on with a general chart for python web apps in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/955584

https://gerrit.wikimedia.org/r/948689

Change 955018 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: deployment settings for the recommendation-api-ng

https://gerrit.wikimedia.org/r/955018

Change 955018 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deployment settings for the recommendation-api-ng

https://gerrit.wikimedia.org/r/955018

Deployment settings for the recommendation-api-ng have been merged but when we try to deploy on staging we get:

kevinbazira@deploy1002:/srv/deployment-charts/helmfile.d/ml-services/recommendation-api-ng$ helmfile -e ml-staging-codfw sync
Affected releases are:
  main (wmf-stable/python-webapp) UPDATED

skipping missing values file matching "values-main.yaml"
helmfile.yaml: basePath=.
Upgrading release=main, chart=wmf-stable/python-webapp
Release "main" does not exist. Installing it now.


FAILED RELEASES:
NAME
main
in ./helmfile.yaml: failed processing release main: command "/usr/bin/helm3" exited with non-zero status:

PATH:
  /usr/bin/helm3

ARGS:
  0: helm3 (5 bytes)
  1: upgrade (7 bytes)
  2: --install (9 bytes)
  3: --reset-values (14 bytes)
  4: main (4 bytes)
  5: wmf-stable/python-webapp (24 bytes)
  6: --timeout (9 bytes)
  7: 600s (4 bytes)
  8: --atomic (8 bytes)
  9: --namespace (11 bytes)
  10: recommendation-api-ng (21 bytes)
  11: --values (8 bytes)
  12: /tmp/values022274232 (20 bytes)
  13: --values (8 bytes)
  14: /tmp/values620292791 (20 bytes)
  15: --values (8 bytes)
  16: /tmp/values748254890 (20 bytes)
  17: --values (8 bytes)
  18: /tmp/values382482945 (20 bytes)
  19: --history-max (13 bytes)
  20: 10 (2 bytes)
  21: --kubeconfig=/etc/kubernetes/recommendation-api-ng-deploy-ml-staging-codfw.config (81 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/kubernetes/recommendation-api-ng-deploy-ml-staging-codfw.config
  Error: release main failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

COMBINED OUTPUT:
  WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/kubernetes/recommendation-api-ng-deploy-ml-staging-codfw.config
  Release "main" does not exist. Installing it now.
  Error: release main failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
kevinbazira@deploy1002:/srv/deployment-charts/helmfile.d/ml-services/recommendation-api-ng$
2023-09-08 13:37:33,415 recommendation.api.types.related_articles.candidate_finder fetch_embedding():157 ERROR -- Failed to get object from Swift
Traceback (most recent call last):
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 155, in fetch_embedding
    swift_object = conn.get_object(swift_container, swift_object_path)
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 1831, in get_object
    rheaders, body = self._retry(None, get_object, container, obj,
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 1710, in _retry
    self.url, self.token = self.get_auth()
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 1654, in get_auth
    self.url, self.token = get_auth(self.authurl, self.user, self.key,
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 674, in get_auth
    storage_url, token = get_auth_1_0(auth_url,
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 524, in get_auth_1_0
    raise ClientException.from_response(resp, 'Auth GET failed', body)
swiftclient.exceptions.ClientException: Auth GET failed: http://localhost:6022/auth/v1.0 404 Not Found
Traceback (most recent call last):
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 155, in fetch_embedding
    swift_object = conn.get_object(swift_container, swift_object_path)
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 1831, in get_object
    rheaders, body = self._retry(None, get_object, container, obj,
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 1710, in _retry
    self.url, self.token = self.get_auth()
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 1654, in get_auth
    self.url, self.token = get_auth(self.authurl, self.user, self.key,
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 674, in get_auth
    storage_url, token = get_auth_1_0(auth_url,
  File "/opt/lib/python/site-packages/swiftclient/client.py", line 524, in get_auth_1_0
    raise ClientException.from_response(resp, 'Auth GET failed', body)
swiftclient.exceptions.ClientException: Auth GET failed: http://localhost:6022/auth/v1.0 404 Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "recommendation/data/recommendation.wsgi", line 26, in <module>
    candidate_finder.initialize_embedding()
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 83, in initialize_embedding
    _embedding.initialize(embedding_client, embedding_path, embedding_package, embedding_name, optimize, optimized_embedding_path)
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 111, in initialize
    self.load_raw_embedding(client, path, package, name)
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 168, in load_raw_embedding
    f = self.fetch_embedding(client, path, package, name)
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 158, in fetch_embedding
    raise RuntimeError(f'Failed to get object from Swift: {e}')
RuntimeError: Failed to get object from Swift: Auth GET failed: http://localhost:6022/auth/v1.0 404 Not Found

This definitely seems to be an issue with the config of the envoy proxy:

elukey@ml-staging2002:~$ curl https://thanos-swift.discovery.wmnet:443/info
{"swift": {"version": "2.26.0", "strict_cors_mode": true, "policies":[...CUT...]


elukey@ml-staging2002:~$ sudo nsenter -t 1682700 -n curl localhost:6022/info -H "thanos-swift.discovery.wmnet" -i
HTTP/1.1 404 Not Found
date: Mon, 11 Sep 2023 07:41:44 GMT
server: envoy
content-length: 0
x-envoy-upstream-service-time: 2

On the Thanos FE nodes:

elukey@thanos-fe1001:~$ curl https://localhost:443/info -k -H "Host: thanos-swift.discovery.wmnet"
{"swift": {"version": "2.26.0", "strict_cors_mode": true, ....

elukey@thanos-fe1001:~$ curl https://localhost:443/info -k -i
HTTP/1.1 404 Not Found
date: Mon, 11 Sep 2023 08:48:41 GMT
server: envoy
content-length: 0

The current flow of data is:

swift-client --> local-envoy-proxy (lift wing) -------TLS-------> envoy (thanos) ---> etc..

My theory is that the local envoy proxy should set the Host header; since it doesn't, it gets a 404 from the envoy running on Thanos' FE nodes.
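The routing behavior above can be reproduced in miniature with a toy HTTP server that, like the envoy on the Thanos frontends, returns 404 unless the Host header matches its configured virtual host (hostnames and the routing logic here are illustrative assumptions, not the actual envoy config):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPECTED_HOST = "thanos-swift.discovery.wmnet"  # assumed virtual-host name

class HostRoutingHandler(BaseHTTPRequestHandler):
    """Mimics a Host-routing proxy: 404 unless the Host header matches."""
    def do_GET(self):
        if self.headers.get("Host", "").split(":")[0] == EXPECTED_HOST:
            body = b'{"swift": {"version": "2.26.0"}}'
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), HostRoutingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# No explicit Host header: like the in-pod `curl localhost:6022/info`, we get a 404.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/info")
status_without = conn.getresponse().status

# Host header set: like `curl -H "Host: thanos-swift.discovery.wmnet"` on thanos-fe, we get a 200.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/info", headers={"Host": EXPECTED_HOST})
status_with = conn.getresponse().status

server.shutdown()
server.server_close()
```

This matches the curl evidence above: the same backend answers 200 or 404 purely based on whether the expected Host header arrives.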

Change 956373 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::service_proxy::envoy: set use_ingress for thanos-swift

https://gerrit.wikimedia.org/r/956373

Change 956373 merged by Elukey:

[operations/puppet@production] profile::service_proxy::envoy: set use_ingress for thanos-swift

https://gerrit.wikimedia.org/r/956373

Change 956377 had a related patch set uploaded (by Elukey; author: Elukey):

[research/recommendation-api@master] blubber: add wmf-certificates to the production image

https://gerrit.wikimedia.org/r/956377

Change 956377 merged by jenkins-bot:

[research/recommendation-api@master] blubber: add wmf-certificates to the production image

https://gerrit.wikimedia.org/r/956377

Change 956379 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::service_proxy::envoy: rename uses_ingress to sets_sni

https://gerrit.wikimedia.org/r/956379

Change 956380 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update Docker image for recommendation-api-ng

https://gerrit.wikimedia.org/r/956380

Change 956380 merged by Elukey:

[operations/deployment-charts@master] ml-services: update Docker image for recommendation-api-ng

https://gerrit.wikimedia.org/r/956380

Change 956385 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: add REQUESTS_CA_BUNDLE env var to rec-api-ng's config

https://gerrit.wikimedia.org/r/956385

Change 956385 merged by Elukey:

[operations/deployment-charts@master] ml-services: add REQUESTS_CA_BUNDLE env var to rec-api-ng's config

https://gerrit.wikimedia.org/r/956385

@elukey, regarding the rec-api memory usage, please see the findings below, obtained from docker stats while monitoring the rec-api container running locally:

  1. When processing the embedding the memory usage spiked upwards of ~7GB.
  2. After the embedding had been loaded, the memory dropped and stayed within ~5GB even when I queried the api using: http://127.0.0.1/api/?s=en&t=fr&n=3&article=Apple

NB: I ran the rec-api container with a 10GB memory limit using:

$ docker run -it --memory=10g --entrypoint=/bin/bash recommendation-api

Here is a screenshot, captured from the steady-state:

[Screenshot: rec-api steady-state docker stats (Screenshot from 2023-09-11 15-02-08.png, 26 KB)]

@elukey, on IRC you mentioned:

> one quick thing - I am reading https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/api/types/related_articles/candidate_finder.py#L167
> and it is probably something that we can improve
> for example, we could do the prep work offline and force np to load from file
> I never done it but I am pretty sure it should be doable
> could you please check if this is doable? And also update the task with all the findings etc.

I am not sure what kind of "prep work" you are referring to. Could you please clarify? Thanks!

Change 956017 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: increase the recommendation-api-ng memory usage

https://gerrit.wikimedia.org/r/956017

> @elukey, on IRC you mentioned:
>
> one quick thing - I am reading https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/api/types/related_articles/candidate_finder.py#L167
> and it is probably something that we can improve
> for example, we could do the prep work offline and force np to load from file
> I never done it but I am pretty sure it should be doable
> could you please check if this is doable? And also update the task with all the findings etc.
>
> I am not sure what kind of "prep work" you are referring to. Could you please clarify? Thanks!

My point was that it seems, from a quick glance at the code, that we do a lot of data processing when bootstrapping the python app (via uwsgi) that maybe could be done offline. For example, what if we could:

  1. Take the embeddings binary (currently stored in Swift) and do some one-time processing of the data offline.
  2. Re-upload the binary to Swift.
  3. Change the Python code to just load data from the new file (without loading data into memory to preprocess, etc.).

I don't think we should deploy an app requiring 10GB of memory per pod. That is a 10x increase compared to most of our deployments, and it could exhaust our k8s capacity pretty quickly if more than one pod is needed in production.

Change 956441 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] modules: add configuration 1.5.0 to mesh

https://gerrit.wikimedia.org/r/956441

The main issue I see (and the one elukey highlights) is that, as a service, it is not scalable. I suggest that we try any memory optimizations we can while maintaining the current architecture of the app (load embeddings into memory from file, etc.).

If we wanted to optimize further and use a database to store the embeddings, the bottleneck would be the function def most_similar(self, word), as it requires an array in memory to perform a dot product.
So I'd try the following, in order:

  • Reduce the memory footprint by downcasting to 32-bit or 16-bit floats.
  • Load a memmap'ed version of the array and see how long a dot product takes given our array.
  • The last option would require too much work: use a vector DB with built-in similarity search (e.g. Elasticsearch). I'm not really suggesting this option at the moment, just jotting it down.
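The first two options above can be sketched together in numpy (file names, the array shape, and the ranking function are hypothetical stand-ins, not the real embedding or the app's actual most_similar implementation):

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
embedding = rng.standard_normal((1000, 100))   # stand-in for the real embedding matrix
embedding32 = embedding.astype(np.float32)     # downcast: halves the in-memory footprint

# One-time, offline: persist the downcast matrix.
path = os.path.join(tempfile.mkdtemp(), "embedding_float32.npy")
np.save(path, embedding32)

# Serving time: memory-map the file so rows are paged in on demand
# instead of holding the whole matrix resident in RAM.
mm = np.load(path, mmap_mode="r")

def most_similar(query_vector, top_n=3):
    # Rank rows by dot product against the memmap'ed matrix.
    scores = mm @ query_vector.astype(np.float32)
    return np.argsort(scores)[::-1][:top_n]

top = most_similar(embedding32[42])  # row 42 should rank itself first
```

With mmap_mode="r" the dot product still works transparently; the trade-off to measure is how much slower it is when pages have to be faulted in from disk.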

@calbon suggested that we use dtype=np.float32 to reduce memory usage. I have tested it and below are the results:

| dtype      | on-load | steady-state |
|------------|---------|--------------|
| float      | ~7GB    | ~5GB         |
| np.float32 | ~3.8GB  | ~2.8GB       |

As suggested in T339890#9156420, I ran the load_raw_embedding method and saved both the wikidata_ids and decoded_lines numpy arrays:

...
np.save('wikidata_ids.npy', self.wikidata_ids)
np.save('decoded_lines.npy', self.embedding)
...

root@a0efd763f796:/home/recommendation-api# ls -lh
-rw-r--r-- 1 root root 2.4G Sep 12 06:53 decoded_lines.npy
-rw-r--r-- 1 root root 107M Sep 12 06:53 wikidata_ids.npy

Then I loaded the two preprocessed files in a different container and got the following memory usage results:

| preprocessing | on-load | steady-state |
|---------------|---------|--------------|
| before        | ~7GB    | ~5GB         |
| after         | ~5.1GB  | ~2.8GB       |
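The save/load round trip behind those before/after numbers can be sketched as follows (the data and shapes are tiny stand-ins; the .npy file names mirror the ones listed in the container above):

```python
import os
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()

# Stand-ins for the arrays built by load_raw_embedding from the raw text file.
wikidata_ids = np.array(["Q42", "Q64", "Q2"])
embedding = np.random.default_rng(0).standard_normal((3, 4))

# One-time preprocessing: persist the parsed arrays as binary .npy files.
np.save(os.path.join(workdir, "wikidata_ids.npy"), wikidata_ids)
np.save(os.path.join(workdir, "decoded_lines.npy"), embedding)

# App startup: a single binary load replaces line-by-line text parsing,
# avoiding the temporary buffers that drive up on-load memory.
ids = np.load(os.path.join(workdir, "wikidata_ids.npy"))
emb = np.load(os.path.join(workdir, "decoded_lines.npy"))
```

The on-load savings come from skipping the intermediate Python lists and string buffers; the steady-state array contents are identical either way.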

Change 956846 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[research/recommendation-api@master] load_raw_embedding: Downcast float to np.float32

https://gerrit.wikimedia.org/r/956846

A 4th memory usage test was run combining the 2nd approach (float downcast to np.float32) with the 3rd (preprocessed numpy arrays). Below are the steps taken and the results:

1. Downcast the decoded lines to np.float32, then saved the numpy arrays:

...
decoded_lines.append(np.fromstring(values, dtype=np.float32, sep=' ', count=columns))
...
np.save('wikidata_ids.npy', self.wikidata_ids)
np.save('decoded_lines_float32.npy', self.embedding)
...

root@b9b4eff100a5:/home/recommendation-api# ls -lh
-rw-r--r-- 1 root root 1.2G Sep 13 05:04 decoded_lines_float32.npy
-rw-r--r-- 1 root root 107M Sep 13 05:04 wikidata_ids.npy

2. Loaded the preprocessed files in a different container and got the following memory usage results:

| preprocessing | dtype      | on-load | steady-state |
|---------------|------------|---------|--------------|
| before        | float      | ~7GB    | ~5GB         |
| after         | np.float32 | ~1.5GB  | ~1.36GB      |

Nice work @kevinbazira! This is much better and we should proceed with this (I mean the combination of 32-bit floats and preprocessed numpy arrays). I think we should leave further downcasting to 16-bit floats for the moment, as that would require further testing of the quality of the embeddings.

Hey folks -- not to take away from the good work by @kevinbazira, but I wanted to flag that I don't think it makes sense to port the embeddings component over to LiftWing as part of this work. The existing Content Translation API doesn't use the embeddings (nor does any other service), and they're quite old, with no plan, as far as I know, to update or maintain them. If they are incorporated into the endpoint, I presume the quality of predictions will decrease dramatically, as they'll be based on Wikidata items from 5+ years ago. The important logic is the code that hits the Search APIs and filters appropriately based on which articles exist in the target language (see T308165#7983559).

Some further context:

  • My digging into which services were being used by Content Translation which also notes the embeddings aren't on the recommendation API instance being used by Content Translation: T308165#7983559
  • Further stats showing the embedding-backed endpoint isn't being used by any services: T340854#8998466

CCing @santhosh as well for visibility / input as needed.

Thank you for sharing this information, @Isaac.

When we were containerizing the Flask app that runs this recommendation-api, we ran into errors where the application expected to load embeddings as shown in T338805#8926847 and T338805#8931200.

Is there a way to run this Flask app without throwing those errors (without rewriting the whole app:)?

If so, that would be great, as it would enable us to migrate the recommendation-api from wmflabs to LiftWing without the embeddings.

> When we were containerizing the Flask app that runs this recommendation-api, we ran into errors where the application expected to load embeddings as shown in T338805#8926847 and T338805#8931200.

@kevinbazira thanks for this additional context -- I wasn't aware of the errors / discussion around getting it running before attempting to make changes and that helps with understanding what's useful.

> Is there a way to run this Flask app without throwing those errors (without rewriting the whole app:)?

Yep! In the recommendation.ini file there's an enabled-services section, and you can switch related_articles to False (and gapfinder too, if you'd like). Doing that locally got it running for me (though I was on a Python 3.10 environment, so I also had to replace a bunch of flask_restplus imports with flask_restx, FYI). And to my point about the embeddings not being used: turning off that endpoint won't cause any issues for end users, and in fact I verified that it's set to False on the Cloud VPS instance running it (tool.recommendation-api.eqiad1.wikimedia.cloud:/etc/recommendation/recommendation.ini).
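The flag flip can be sketched with configparser (the section and key names below are assumptions based on the description above; the real recommendation.ini layout may differ):

```python
import configparser

# Hypothetical excerpt of recommendation.ini with per-service boolean flags.
ini_text = """
[endpoints]
related_articles = True
translation = True
gapfinder = True
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# Disable the embeddings-backed endpoint, as described above.
config.set("endpoints", "related_articles", "False")

related_enabled = config.getboolean("endpoints", "related_articles")
```

configparser.getboolean accepts True/False, yes/no, on/off, and 1/0, so the app's existing config parsing should treat the edited value the same way regardless of which spelling is used.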

@Isaac, thank you for the pointer. I have tested the recommendation-api with related_articles switched off and it ran without the errors.

Below are the steps I took and the results achieved:

# run container
$ docker run --memory=10g -p 80:5000 -it --entrypoint=/bin/bash recommendation-api
...
# switch related_articles to False
$ vi recommendation/data/recommendation.ini
...
# start application
$ uwsgi --http :5000 --wsgi-file recommendation/data/recommendation.wsgi
...
# query api in new terminal
$ curl 'http://127.0.0.1/api/?s=en&t=fr&n=3&article=Apple'
[{"title": "Ceratotheca_sesamoides", "wikidata_id": "Q16751030", "rank": 499.0, "pageviews": 133}, {"title": "Annona_montana", "wikidata_id": "Q311448", "rank": 498.0, "pageviews": 601}, {"title": "Treculia_africana", "wikidata_id": "Q2017779", "rank": 496.0, "pageviews": 308}]

Change 958971 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[research/recommendation-api@master] switch off related_articles endpoint for the LiftWing instance

https://gerrit.wikimedia.org/r/958971

Change 958971 merged by jenkins-bot:

[research/recommendation-api@master] switch off related_articles endpoint for the LiftWing instance

https://gerrit.wikimedia.org/r/958971

> I have tested the recommendation-api with related_articles switched off and it ran without the errors.

Great to hear! Thanks!

Change 956017 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update recommendation-api-ng image

https://gerrit.wikimedia.org/r/956017

calbon triaged this task as Medium priority. Nov 2 2023, 7:25 PM