Containerize Content Translation Recommendation API
Open, Medium, Public

Description

The goal of this task is to containerize the Flask web application that runs the Content Translation Recommendation API as a first step in the process of migrating it to Lift Wing.

Event Timeline

We've been able to wrangle the dependencies (1, 2) and configurations (uWSGI, nginx, systemd, etc.) that were set up in 2016 and got the Recommendation API into a container, but when the application is started it returns the errors below:

*** Operational MODE: single process ***
unable to load configuration from from multiprocessing.semaphore_tracker import main;main(8)
2023-06-12 14:40:39,005 recommendation.api.types.related_articles.candidate_finder initialize():97 INFO -- starting to load embedding
Traceback (most recent call last):
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 112, in load_raw_embedding
    f = open(path, 'r', encoding='utf-8')
FileNotFoundError: [Errno 2] No such file or directory: '/etc/recommendation/mini_embedding'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/pkg_resources/__init__.py", line 359, in get_provider
    module = sys.modules[moduleOrReq]
KeyError: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "recommendation/data/recommendation.wsgi", line 26, in <module>
    candidate_finder.initialize_embedding()
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 79, in initialize_embedding
    _embedding.initialize(embedding_path, embedding_package, embedding_name, optimize, optimized_embedding_path)
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 102, in initialize
    self.load_raw_embedding(path, package, name)
  File "./recommendation/api/types/related_articles/candidate_finder.py", line 114, in load_raw_embedding
    f = open(resource_filename(package, name), 'r', encoding='utf-8')
  File "/usr/local/lib/python3.4/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
    return get_provider(package_or_requirement).get_resource_filename(
  File "/usr/local/lib/python3.4/site-packages/pkg_resources/__init__.py", line 361, in get_provider
    __import__(moduleOrReq)
ValueError: Empty module name
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***

The application expects to load embeddings, but we have looked through the repo and there are no embeddings stored in it.

Going to investigate further to find where the embeddings are located.

The documentation (1, 2) doesn't specify where the embeddings are stored. We read the source code (1, 2), downloaded the embedding, optimized it, and saved it to the location the application expects to load it from.
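The traceback shows the lookup order: `load_raw_embedding` first tries the configured path (`/etc/recommendation/mini_embedding`) and, when that file is missing, falls back to `pkg_resources.resource_filename(package, name)`, which fails with `Empty module name` because no package is configured. A minimal sketch of the same fallback with an explicit error message might look like the following. It assumes Python 3.9+ (`importlib.resources.files`) rather than the Python 3.4 / `pkg_resources` runtime in the container, and `resolve_embedding_path` is a hypothetical helper, not a function in the recommendation-api codebase.

```python
import os
from importlib import resources
from typing import Optional


def resolve_embedding_path(path: str, package: Optional[str], name: Optional[str]) -> str:
    """Hypothetical helper: return a readable embedding file path or fail clearly.

    Follows the fallback order visible in the traceback: first the configured
    filesystem path, then a resource bundled inside a Python package.
    """
    # 1. Prefer the configured filesystem path (e.g. /etc/recommendation/mini_embedding).
    if path and os.path.isfile(path):
        return path

    # 2. Fall back to a packaged resource, but only if a package is configured.
    #    An empty package name is what produced "ValueError: Empty module name".
    #    Assumes the package is installed as regular files (not zipped).
    if package and name:
        candidate = resources.files(package) / name
        if candidate.is_file():
            return str(candidate)

    # 3. Nothing found: say what was tried instead of a confusing import error.
    raise FileNotFoundError(
        f"embedding not found at {path!r} and no packaged resource "
        f"(package={package!r}, name={name!r}); download and place it first"
    )
```

Failing with a `FileNotFoundError` that names both locations makes a missing embedding obvious at container start-up, instead of surfacing as an unrelated import error.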

We are now able to hit the recommendation-api endpoint hosted locally in a container and get results, as shown below:

$ curl 'http://127.0.0.1/api/?s=en&t=fr&n=3&article=Apple'

[{"pageviews": 71, "title": "York_Imperial", "wikidata_id": "Q8055480", "rank": 497.0}, {"pageviews": 71, "title": "Lawrence_Ogilvie", "wikidata_id": "Q6504438", "rank": 495.0}, {"pageviews": 604, "title": "African_nightshade", "wikidata_id": "Q16860982", "rank": 493.0}]
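For scripted checks against the local container, a small stdlib-only client is enough. In this sketch the meaning of the query parameters (`s` = source language, `t` = target language, `n` = number of results, `article` = seed article) is inferred from the example request rather than taken from the API documentation, and `build_query_url`, `parse_recommendations` and `fetch_recommendations` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1/api/"  # the locally hosted container used in the curl example


def build_query_url(source: str, target: str, count: int, article: str,
                    base_url: str = BASE_URL) -> str:
    # Assumed meaning: s = source language, t = target language,
    # n = number of results, article = seed article title.
    params = {"s": source, "t": target, "n": count, "article": article}
    return base_url + "?" + urlencode(params)


def parse_recommendations(body: str) -> list:
    # Each item carries pageviews, title, wikidata_id and rank;
    # sort highest rank first, matching the order of the sample response.
    items = json.loads(body)
    return sorted(items, key=lambda item: item["rank"], reverse=True)


def fetch_recommendations(source: str, target: str, count: int, article: str,
                          timeout: float = 10.0) -> list:
    # Performs a real HTTP GET, so the container must be running.
    with urlopen(build_query_url(source, target, count, article), timeout=timeout) as resp:
        return parse_recommendations(resp.read().decode("utf-8"))
```

For example, `fetch_recommendations("en", "fr", 3, "Apple")` issues the same request as the curl command above against a running container.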

Even though we were able to get the API to work, the front-end files return 404 (NOT FOUND) errors in the browser. The next step is to figure out why the front-end is not loading as expected.

@kevinbazira I have a generic question about the python repo, nothing urgent but I'd like to know your thoughts. We are working on FastAPI for ores-legacy, and most of the team is getting used to it. It also seems to work nicely performance-wise, and we have written a deployment-charts chart that can (in theory) run any docker image with a FastAPI app. Flask is nice but very slow, and it doesn't allow async calls afaics. I am wondering if we could also think about moving the codebase to FastAPI, or if it is too big a change for the moment. Of course only after the Flask app runs fine locally etc. :)

@elukey migrating the recommendation-api codebase from Flask to FastAPI is a good idea. However, this would amount to rebuilding the entire project, which I don't think is the priority at the moment.

Based on @calbon's suggestion, we will first migrate the recommendation-api to LiftWing as is, then make improvements afterwards.

@kevinbazira yep I agree, but we'd need to create a lot of scaffolding in deployment-charts to run Flask and then migrate to FastAPI, so extra work will be needed anyway. What I wondered was whether we could scope the migration and see how big it is, because if it's feasible we could do it now and reuse the existing scaffolding for fastapi-apps in production (plus we wouldn't run a Flask app that we know doesn't perform well).

It is good that we have existing scaffolding for FastAPI apps. The recommendation-api project is a good example of the projects we are likely to be handed to host on LiftWing in the future, and many of them will run on technologies other than FastAPI (e.g. Flask, Django, Laravel, Node, etc.). It will not be feasible to rebuild every app hosted on LiftWing to use FastAPI. We might want to prepare to be flexible enough to host various technologies and aim to optimize each of them accordingly.

For this project, @calbon and I had a discussion and agreed that we should first migrate the recommendation-api to LiftWing as is, then make improvements afterwards. This is the workflow we are following.

Sure, I am fine with the approach; the only thing I asked earlier was whether you had thoughts/time to figure out how long it would take to migrate to FastAPI (if it's even possible), to establish priorities. The idea is to find out whether the work is a matter of days or more, so that we can schedule it now or in the future.

We can definitely create new scaffolding for Flask in deployment-charts, that should be fine, but I am 100% convinced that we should standardize our services as much as possible :)

Rebuilding a project of this scale using a different framework requires careful planning, as we would have to rethink the implementation architecture to keep the current app functionality while catering to framework-independent features. This would not take a few days. Depending on the resources we have on the team, it would take upwards of months to prepare design docs, plan the implementation roadmap, rebuild, test and deploy the recommendation-api using FastAPI.

Totally get your point, but I don't agree 100%: in this case we don't really need a complete design doc or roadmap, it would just be moving the API from Flask to FastAPI and uvicorn. I am not forcing the team to do it now, it is just an exercise to figure out how long these tasks would take. Sounds like my idea is not good :)

Anyway, let's proceed with Flask, even if I already anticipate that we'll have performance problems when more clients hit the API (since Flask + Python + blocking calls is not a good recipe for a modern service, in my opinion). We'll need to warn whoever wants to use it that we (as the ML team) will not be responsible if the service can't scale up as people want :)
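The concern about blocking calls can be illustrated with a generic asyncio sketch: inside a single worker, blocking waits add up one after another, while awaited I/O can overlap. The sleeps are stand-ins for network or database calls, and the function names are hypothetical. This does not model uWSGI, which can run several worker processes or threads to serve requests in parallel, and it is not a benchmark of Flask or FastAPI.

```python
import asyncio
import time


def blocking_lookup(delay: float) -> str:
    time.sleep(delay)  # stand-in for a blocking network/DB call: the worker just waits
    return "done"


async def async_lookup(delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for non-blocking I/O: the event loop can run other tasks
    return "done"


def run_blocking(n: int, delay: float) -> float:
    # n sequential blocking calls: total time is roughly n * delay.
    start = time.perf_counter()
    for _ in range(n):
        blocking_lookup(delay)
    return time.perf_counter() - start


async def run_async(n: int, delay: float) -> float:
    # n concurrent awaited calls: total time is roughly one delay.
    start = time.perf_counter()
    await asyncio.gather(*(async_lookup(delay) for _ in range(n)))
    return time.perf_counter() - start
```

With `n=5` and `delay=0.1`, `run_blocking` takes roughly 0.5 s, while `asyncio.run(run_async(5, 0.1))` takes roughly 0.1 s.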

We managed to install the front-end dependencies, which rely on Bower (a legacy package manager), and then pointed the front-end resource paths to the Bower components. The 404 (NOT FOUND) errors are now gone. Both the recommendation-api endpoint and the front-end are running successfully in a locally hosted container, as shown below:

1. Recommendation-api endpoint

Screenshot from 2023-06-15 18-40-27.png (Recommendation-api endpoint)

2. Recommendation-api front-end (GapFinder)

Screenshot from 2023-06-15 18-40-40.png (Recommendation-api front-end, GapFinder)

The next step is to figure out how to host this container on LiftWing.

@kevinbazira some info in https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial#Migrating_a_service_to_Kubernetes

The first step is probably to create a Blubber file that can build the Docker image, and then add the CI config to publish it to our Docker registry. Then we can work on a Helm chart for the recommendation-api service :)

@kevinbazira do we also need to host the GapFinder frontend? I would personally keep only the API and let others maintain/build UIs (so we don't need Bower etc. in our code). It was good that you were able to make it work, so we know that the whole thing runs fine, but I'd personally add only the API-related bits to the Docker image. Lemme know your thoughts :)

The ML team agreed to migrate the recommendation-api to LiftWing, and this application has both a backend (API) and a frontend (GapFinder). It makes sense for us to host both of them: when we are done with the migration and point users to the new URL, they will expect feature parity.

I am strongly against migrating the front-end; in my opinion we should not do it. It goes against the idea of a service, since the UI can easily be built elsewhere (for example, by reusing the wmflabs one). Maintaining UIs should not be what we do; let's discuss it during the team meeting (others, please comment if you have ideas and opinions).

This is also in line with our policy for exposing APIs: we use the API Gateway and we don't expose anything directly to the outside Internet (as we would have to do for GapFinder, for example). The goal should be to support Content Translation (so they can use the production API directly instead of the wmflabs one); the GapFinder use case is probably secondary (but we should try to expose an API for it in the future).

To me it would seem better to host the UI somewhere else (e.g. wmcloud), since Lift Wing is a platform for backend services. That way we could have rate limiting through the API gateway etc. without worrying about what frontend traffic does to our cluster.
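As a generic illustration of gateway-level rate limiting, a token bucket admits short bursts up to a fixed capacity and then throttles each client to a steady refill rate. This is a textbook sketch, not a description of how the Wikimedia API Gateway actually implements or configures its limits, and `TokenBucket` is a hypothetical class.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` requests per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1        # spend one token for this request
            return True
        return False                # a gateway would typically answer HTTP 429 here
```

The `clock` parameter is injectable so the refill behaviour can be tested without real waiting.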

calbon triaged this task as Medium priority. Nov 2 2023, 7:25 PM