Hypothesis 5.1.8 (FY24-25 Q2): Enable internationalisation on generated OpenAPI specifications for MediaWiki REST API
Open, In Progress, High, Public

Description

We have started generating OpenAPI specifications for the MediaWiki REST API. These specs will streamline documentation management by reducing the need to update documentation manually whenever the code changes. Using industry-standard OpenAPI definitions also unlocks new developer experiences (e.g., the Swagger UI interactive documentation portal).

In addition to spec generation making it easier for API publishers to manage their APIs, we would also like to make the APIs themselves more accessible to wider international audiences. Wikimedia has robust translation tooling, powered by the TranslateWiki community. Leveraging these workflows and community support will allow us to automate our translation processes and allow developers to read the documentation in the default language of their project.

Problem Statement

Language accessibility is a key pillar of the Wikimedia mission, yet Swagger UI does not natively support client-side language translation. We need a custom solution that performs automated content translation as part of OpenAPI spec generation. This solution must also be scalable and approachable for API publishers both inside and outside the Foundation, including those who extend MediaWiki beyond its core capabilities.

Conditions of acceptance

  • Define a hydration mechanism and definition convention for representing translation keys in the OpenAPI specs and JSON schemas.
  • Experiment with creating a utility that will generate and inject translation keys automatically as part of the backend spec generation.
  • Insert translation keys into and for all relevant specs.
  • Launch Special:RESTSandbox onto at least 5 project wikis.
  • Socialize our work to broader Wikimedia teams, so they can follow a similar pattern for translation if desired.
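To make the first condition more concrete, here is a minimal sketch of what a hydration step could look like. It is not the mechanism used in MediaWiki core. The `x-i18n-key` marker, the `hydrate` and `make_translator` helpers, and the in-memory `MESSAGES` table are all hypothetical names chosen for illustration; only the message key and its English string come from the discussion on this task.

```python
from typing import Any, Callable

# Hypothetical convention: a translatable string in the spec is written as
# {"x-i18n-key": "<message key>"} instead of literal text.
I18N_MARKER = "x-i18n-key"

# Stand-in for the real message store (assumption: keyed by language code).
MESSAGES = {"en": {"rest-param-desc-revision-id": "Revision ID"}}


def make_translator(lang: str, fallback: str = "en") -> Callable[[str], str]:
    """Return a lookup function that falls back to the fallback language."""
    def translate(key: str) -> str:
        text = MESSAGES.get(lang, {}).get(key)
        if text is None:
            # Unknown keys are rendered visibly rather than silently dropped.
            text = MESSAGES[fallback].get(key, f"<{key}>")
        return text
    return translate


def hydrate(node: Any, translate: Callable[[str], str]) -> Any:
    """Recursively replace marker objects with translated strings."""
    if isinstance(node, dict):
        if set(node) == {I18N_MARKER}:
            return translate(node[I18N_MARKER])
        return {k: hydrate(v, translate) for k, v in node.items()}
    if isinstance(node, list):
        return [hydrate(item, translate) for item in node]
    return node


spec = {
    "parameters": [
        {
            "name": "id",
            "in": "path",
            "description": {I18N_MARKER: "rest-param-desc-revision-id"},
        }
    ]
}
hydrated = hydrate(spec, make_translator("de"))
```

Because the walk is generic, the same function would cover JSON schemas as well as parameter lists, which is one reason a marker object might be preferable to string prefixes.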

Out of scope

Although Swagger UI is an open source project that we could contribute language support back to, we decided to tackle the Wikimedia use cases first. We may still consider contributing language support more directly through an open source library and/or direct contributions to the Swagger UI project. At minimum, we will highlight the work through tech announcements, and possibly a white paper that can be referenced outside of Wikimedia as an option for automated translation.

Event Timeline

Change #1078506 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] REST: Allow specifying param descriptions as MessageValue objects

https://gerrit.wikimedia.org/r/1078506

BPirkle changed the task status from Open to In Progress. Oct 10 2024, 3:15 PM

Change #1078506 merged by jenkins-bot:

[mediawiki/core@master] REST: Allow specifying param descriptions as MessageValue objects

https://gerrit.wikimedia.org/r/1078506

Change #1087857 had a related patch set uploaded (by Atieno; author: Atieno):

[mediawiki/core@master] REST: Allow specifying param descriptions as MessageValue objects

https://gerrit.wikimedia.org/r/1087857

Thanks, @Atieno , for posting that change! This is a good time to talk about exactly what conventions we want to establish going forward. I'm posting this here, on the phab task, rather than on the gerrit change, because this is probably an easier place for @KBach to share thoughts, and also a more visible decision record.

For some context for Kamil, what we're currently doing is making the "description" strings in the MediaWiki REST API spec translatable (and adding descriptions where they're missing). These descriptions will appear in the REST Sandbox (currently available only on test and beta wikis), powered by the actual spec. Here is a current change where we're doing this for a few of the parameters.

An open question is, what patterns should we use for naming message keys? This has some implications going forward.

Specifically, in the original change making translation possible, we used the message key "rest-param-desc-revision-id" for the one description in that change. In the (as of this writing unmerged) change under review, we used message keys like "rest-param-desc-html-output-flavor". The subtle difference is the "html-output" bit in the second change.

The more generic naming in the first change makes the message key (and associated translation strings) reusable in any other endpoint that has a revision id parameter. By including "html-output" in the message key for the second change, we're making it specific to that one API endpoint. With the first approach, we won't need as many message keys. Future developers are more likely to be able to reuse existing translated strings rather than always needing to define new ones for new endpoints they create. But that reusability forces us to use very generic translation strings, which don't communicate much meaning and end up being little more than a repetition of the parameter name. The second approach makes reuse unlikely, but allows us to use more specific description strings that may be more helpful to clients calling the API endpoints.

We can, of course, mix those conventions - use a generic message key for parameters that we don't have anything special to say about, and use a specific message key for parameters that might need a little more elaboration for a certain endpoint. Would that be flexible and helpful, or would it be inconsistent and confusing?
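To illustrate what mixing the conventions could mean in practice, here is a rough sketch of a lookup that prefers an endpoint-specific key and falls back to the generic one. The helper name `resolve_description_key` and the slug arguments are hypothetical, and the text for the flavor key is a placeholder rather than a real string from core.

```python
from typing import Mapping, Optional


def resolve_description_key(
    messages: Mapping[str, str], endpoint_slug: str, param_slug: str
) -> Optional[str]:
    """Pick the most specific message key that actually exists."""
    specific = f"rest-param-desc-{endpoint_slug}-{param_slug}"
    generic = f"rest-param-desc-{param_slug}"
    for key in (specific, generic):
        if key in messages:
            return key
    return None  # No description has been defined for this parameter yet.


messages = {
    # Generic key, reusable by any endpoint with a revision id parameter.
    "rest-param-desc-revision-id": "Revision ID",
    # Endpoint-specific key; the text here is a placeholder.
    "rest-param-desc-html-output-flavor": "(placeholder description)",
}

flavor_key = resolve_description_key(messages, "html-output", "flavor")
revision_key = resolve_description_key(messages, "html-output", "revision-id")
```

With this kind of fallback the mix is at least predictable: an endpoint gets the generic string until someone defines a more specific key for it.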

Last thought: we'll also be defining translation strings for request and response bodies. Should we be even more generic (ex. "rest-desc-revision-id") such that we can reuse message keys across all those use cases? Or is path/query vs request body vs response body a more useful granularity? Or maybe some combination of conventions that we use situationally?

I should also mention that there's no real performance consequence for having a bunch of message keys. Adding ten thousand might be worrisome. The more likely situation where we add a few dozen or even a few hundred shouldn't be a big deal.

Change #1087982 had a related patch set uploaded (by Fgoodwin; author: Fgoodwin):

[mediawiki/core@master] T376493: Specify REST parameters as MessageValue objects

https://gerrit.wikimedia.org/r/1087982

Following up on the conversation we had with @KBach this morning -- it sounded like we settled on using the more specific key definitions. Daniel also pointed out that we can make the values identical across keys using this type of formatting: {{Identical|Xyz}} in qqq.json. So, let's assume that we will be more specific in the key generation. Perhaps we follow a similar pattern to what extensions do to ensure unique definitions, where we add a distinct prefix for this API? We're kind of doing that with rest, but it might be beneficial to do something like rest-{module}-{endpoint}-{paramType}-{paramName} or something like that? I don't want it to be way over the top with how robust it is, but would like to make sure that we're at least internally consistent.
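As a sketch of the rest-{module}-{endpoint}-{paramType}-{paramName} idea, a small helper could assemble keys so that contributors do not hand-write them. The function names and the example inputs ("core", "revision", "path") are assumptions for illustration, not an agreed convention.

```python
import re


def slug(part: str) -> str:
    """Lowercase a name and collapse anything non-alphanumeric into hyphens."""
    cleaned = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
    if not cleaned:
        raise ValueError(f"cannot build a key segment from {part!r}")
    return cleaned


def build_message_key(
    module: str, endpoint: str, param_type: str, param_name: str
) -> str:
    """Assemble rest-{module}-{endpoint}-{paramType}-{paramName}."""
    segments = (module, endpoint, param_type, param_name)
    return "-".join(["rest", *(slug(s) for s in segments)])


key = build_message_key("core", "revision", "path", "revision_id")
```

One thing to note is that hyphens inside a segment make the key impossible to split back into its parts, so the pattern works for uniqueness and readability but not as a parseable identifier.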

We also discussed potentially creating some kind of tooling to ensure the naming conventions are consistent across contributors and APIs -- we can create a follow up research task for that.
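For the follow-up research task, the tooling could start as something as small as the check sketched below. It assumes the English and qqq message files have already been loaded into dictionaries, and the rule it enforces (lowercase, hyphen-separated keys under a "rest-" prefix, each with a qqq entry) is only one possible reading of "consistent". The function name `find_key_problems` is hypothetical.

```python
import re
from typing import List, Mapping

# Assumed rule: "rest" followed by one or more lowercase hyphenated segments.
KEY_PATTERN = re.compile(r"rest(-[a-z0-9]+)+")


def find_key_problems(
    en: Mapping[str, object], qqq: Mapping[str, object], prefix: str = "rest-"
) -> List[str]:
    """Report REST message keys that break the naming rule or lack docs."""
    problems = []
    for key in sorted(en):
        if not key.startswith(prefix):
            continue  # Skips "@metadata" and messages owned by other features.
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"{key}: not lowercase hyphen-separated")
        if key not in qqq:
            problems.append(f"{key}: missing qqq documentation")
    return problems


en = {
    "@metadata": {},
    "rest-param-desc-revision-id": "Revision ID",
    "rest-Param_Desc": "x",
    "unrelated-key": "y",
}
qqq = {"rest-param-desc-revision-id": "Description of the revision id parameter."}
problems = find_key_problems(en, qqq)
```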

In addition to the keys, I would also like to make sure we are using high quality description values. Looking through the PRs, I see that we are effectively repeating the name of the parameter as the description. For example, simply using the description of "Revision ID" for the parameter revision_id.

For the 'en' file values, let's be sure to reference the 'tech writer approved' strings that are currently present in the API portal: https://api.wikimedia.org/wiki/Core_REST_API

Change #1087857 merged by jenkins-bot:

[mediawiki/core@master] REST: Allow specifying param descriptions as MessageValue objects

https://gerrit.wikimedia.org/r/1087857

In addition to the keys, I would also like to make sure we are using high quality description values. Looking through the PRs, I see that we are effectively repeating the name of the parameter as the description. For example, simply using the description of "Revision ID" for the parameter revision_id.

For the 'en' file values, let's be sure to reference the 'tech writer approved' strings that are currently present in the API portal: https://api.wikimedia.org/wiki/Core_REST_API

Very much agree.

Also, we should start out with {{notranslate}} on all the qqq entries. This gives us the opportunity to make sure we're happy with this work as a whole before releasing the strings to translators. This includes giving us a chance to circle back and replace the current less helpful strings with the tech writer versions before using translator time/effort.
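A quick way to confirm that nothing slips out to translators early might look like the sketch below. The marker is passed in as a parameter because its exact spelling in qqq.json is an assumption here, and `keys_without_marker` is a hypothetical helper, not existing tooling.

```python
from typing import List, Mapping


def keys_without_marker(
    qqq: Mapping[str, object], marker: str, prefix: str = "rest-"
) -> List[str]:
    """List REST message keys whose qqq entry does not carry the marker."""
    return sorted(
        key
        for key, doc in qqq.items()
        if key.startswith(prefix) and isinstance(doc, str) and marker not in doc
    )


qqq = {
    "@metadata": {"authors": []},
    "rest-param-desc-revision-id": "{{notranslate}}\nDescription of a parameter.",
    "rest-param-desc-html-output-flavor": "Description of a parameter.",
    "unrelated-key": "Some other message.",
}
unmarked = keys_without_marker(qqq, "{{notranslate}}")
```

An empty result would mean every REST description is still held back, which is the state we want until the tech writer strings are in place.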

Change #1091913 had a related patch set uploaded (by Atieno; author: Atieno):

[mediawiki/core@master] REST: Allow specifying param descriptions as MessageValue objects

https://gerrit.wikimedia.org/r/1091913