
Run NLLB-200 model in a new instance
Closed, Resolved · Public

Description

As part of the project No Language Left Behind from Meta, the NLLB-200 neural machine translation models (previously named Flores) have been released with an open source license. The model is currently available in Content Translation for a set of 23 languages (T307970) including several historically underserved languages that are not supported by other translation services such as Swati and Tswana. Based on a recent report that analyzes the different translation services available, data suggests the translation quality provided by the NLLB-200 model is good:

Overall, across all languages, NLLB-200 currently has the lowest percentage of articles created with Content Translation that are deleted (0.13%) compared to all other MT services available, while it has the highest percentage of translations modified under 10%, indicating that the modification rates for this machine translation service are a signal of good machine translation quality.

This is consistent with requests from communities such as Igbo and Icelandic to use the service as the default over the current alternatives. The unique position of NLLB-200 as open source, good quality, and supporting a high number of languages at the same time makes it a key resource to better support access to knowledge for anyone, regardless of the language they speak. This ticket proposes creating an instance to run this model for the currently supported languages and more.

Current status, opportunities and challenges

Currently, the model is accessed as an external service using an API that the research team at Meta provided to test the models (more details). Creating our own instance to run the model will allow us to:

  • Reduce dependencies on the current service provided by Meta, making sure it is available for the long term.
  • Reduce dependencies on external services such as Google and Yandex, which cover hundreds of languages not supported by existing open-source systems such as Apertium.
  • Support more languages. The model provides support for 200 languages, but only 23 are exposed through the current API. Supporting more languages with machine translation is often requested by the Wikimedia communities (T86700), and languages that are supported by the NLLB-200 model but not available in the current API have already been requested (e.g., Santali).
  • Expand the use of machine translation to other Wikimedia products (e.g., multilingual talk pages).

Hosting and running the model may present some challenges. Based on previous analysis, GPU acceleration (Nvidia-based in particular) is needed to obtain the required level of performance when running the models. Given that the drivers for those GPUs are not yet open source, we may need to explore ways to mitigate the technical risks (e.g., potential vulnerabilities) that come with hardware that is not fully open, so that they do not prevent us from supporting anyone's access to knowledge in any language.

As with any other machine translation service, the current NLLB-200 API is integrated in a way that only publicly available Wikipedia content is exchanged, without any user personal information. Only the returned translated content (which is sanitized) is consumed back. The future system can be kept as isolated as needed from the rest of the infrastructure.
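The exchange described above can be sketched as follows. This is an illustrative sketch, not actual cxserver code: the function names and the regex-based sanitizer are assumptions made for the example (a real service would use a proper HTML sanitizer).

```javascript
// Hypothetical sketch of the integration pattern: only public article
// content and the language pair are sent to the MT service, and the
// returned translation is sanitized before being consumed back.

function buildMtRequest(sourceLang, targetLang, publicHtml) {
  // No user identifiers, cookies, or IPs: the payload carries only
  // publicly available article content and the language pair.
  return {
    from: sourceLang,
    to: targetLang,
    html: publicHtml,
  };
}

function sanitizeTranslation(html) {
  // Minimal sanitization for illustration: drop script/style elements
  // and inline event handlers from the returned markup.
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, '')
    .replace(/\son\w+="[^"]*"/gi, '');
}

const req = buildMtRequest('en', 'ss', '<p>Public article text</p>');
const safe = sanitizeTranslation(
  '<p onclick="x()">Umbhalo</p><script>steal()</script>'
);
```

The key property is that the request object contains nothing beyond the language pair and public content, so the external service never sees user data.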

Event Timeline

LSobanski subscribed.

I don't see a specific ask for SRE so removing the tag. Please add it back when needed.

@LSobanski this is the first example of an AWS microservice built outside our production realm; I asked to open a task with SRE to discuss how it is best to proceed and what standards we should aim for. The idea is to use this project as a pilot, but with some guardrails. Lemme know your thoughts :)

The microservice will also use Nvidia GPUs on AWS, which we currently don't allow in production due to their non-open-source drivers, etc. This is another thing to discuss, since we will not be able to move the microservice to prod if Nvidia GPUs remain a hard requirement.

Thanks for the clarification. Let's start with serviceops then and see who else we need afterwards.

I am moving this ticket to ML's in-progress column. @klausman I spoke to Deb. It sounds like the current plan is for you to do the actual migration of NLLB from Meta's AWS to our AWS. Based on Deb's comments, it seems like everything is set for you to move forward. If you have questions, Prabhat Tiwary on the Enterprise team is available for consulting.

> @LSobanski this is the first example of an AWS microservice built outside our production realm; I asked to open a task with SRE to discuss how it is best to proceed and what standards we should aim for. The idea is to use this project as a pilot, but with some guardrails. Lemme know your thoughts :)

From the production side, I think we have quite a bit of precedent on this one. cxserver is already using multiple backends (e.g., Yandex, Google Translate, etc.) for fetching translations (requests go via the url-downloader proxy servers). If my understanding is correct, this is the pattern we are already using for the Meta-hosted microservice, and more or less the same model will be used for the microservice that will be hosted in AWS in our own account. So, as far as production infra goes, we are already covered.
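The backend pattern described here can be sketched roughly as follows. This is a hypothetical illustration, not actual cxserver code: the provider registry, function names, and proxy URL are made up for the example (cxserver's real provider mapping lives in its configuration).

```javascript
// Hypothetical sketch: an MT provider is chosen per language pair,
// and all outbound requests are routed through a fixed egress proxy
// (the url-downloader pattern mentioned above). The proxy URL below
// is a placeholder, not the real host.

const PROXY_URL = 'http://url-downloader.example:8080';

// Illustrative provider registry: which MT backend serves a pair.
const providers = {
  'en>ig': 'NLLB-200',
  'en>is': 'NLLB-200',
  'en>fr': 'Google',
};

function pickProvider(from, to) {
  return providers[`${from}>${to}`] || null;
}

function requestOptions(from, to) {
  const provider = pickProvider(from, to);
  if (!provider) throw new Error(`no MT provider for ${from}>${to}`);
  // All egress to external MT backends goes via the proxy.
  return { provider, proxy: PROXY_URL };
}
```

Under this pattern, swapping Meta's hosted endpoint for a WMF-managed AWS endpoint is just a change of backend target behind the same proxy, which is why the production side is already covered.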

The big question (and a field I am not sure I want to open, but here goes anyway) is how that AWS infrastructure that has the microservice in there is going to be managed, how the service will be deployed etc. I see 0 mention or links to any of that in this task.

So, org-wise, is this a greenfield for us? (No, I don't mean individuals' knowledge and experience.) There is some experience and knowledge in Enterprise, but I don't think it has permeated the rest of the org (though, per Chris's comment, knowledge sharing can now start).

Also, last I checked there aren't any org wide established and accepted best practices for how to manage infrastructure in AWS. It is a considerable investment on our side to get some, even if we adopt in bulk established ones from the rest of the industry (e.g. use Lambda this way, use terraform for VMs or more generically use X for Y the Z way).

I think the pilot should be used to adopt/build the above, but I don't have any concrete proposals, aside from reaching out to Enterprise and evaluating already established best practices.

> The microservice will also use Nvidia GPUs on AWS, which we currently don't allow in production due to their non-open-source drivers, etc. This is another thing to discuss, since we will not be able to move the microservice to prod if Nvidia GPUs remain a hard requirement.

There have been some developments on NVIDIA's front, but it will be a long time before all the necessary components are open source for this, if they ever are. Unless either:

  • AMD GPUs become an option for the software
  • We accept awful CPU performance and implement some architecture to hide it (not sure this is even possible, my gut says no)
  • We become more relaxed on the open source commitment

it will be a long while (if ever) before we are able to move the microservice to our infrastructure.

@LSobanski, @elukey, I am gonna remove serviceops; aside from some best-practices review, I don't see what more we can do about this.

Change 864892 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Flores/NLLB-200: Switch to WMF managed MT service

https://gerrit.wikimedia.org/r/864892

Change 864892 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Flores/NLLB-200: Switch to WMF managed MT service

https://gerrit.wikimedia.org/r/864892

Change 865063 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2022-12-06-121330-production

https://gerrit.wikimedia.org/r/865063

Change 865063 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2022-12-06-121330-production

https://gerrit.wikimedia.org/r/865063

Mentioned in SAL (#wikimedia-operations) [2022-12-13T06:59:12Z] <kart_> Updated cxserver to 2022-12-06-121330-production (T321781, T324534)