
Provide a convenient way to obtain localized error messages in the JS REST API framework
Open, Needs Triage, Public

Description

The JavaScript REST API framework (rest.js) provides some methods to easily make requests, but it doesn't handle responses, and error messages in particular. If there's an error, the REST API returns all error messages already translated, in the messageTranslations key, for instance:

"messageTranslations": {
    "en": "A message in English",
    "fr": "Un message en Français"
}

Right now the languages are always English and the content language, but I assume this should change at some point (T269492). This means that if some code wants to display a localized error message, it has to do something like:

msg = apiResponse.messageTranslations[ mw.config.get( 'wgContentLanguage' ) ];

which is subpar because:

  1. It depends on an implementation detail (the REST API using the content language) that could change in the future
  2. It seems unnecessarily long
  3. It wouldn't scale if the list of languages is changed (e.g., to use the Accept-Language header or some MW-specific header or parameter)

Instead, I would like rest.js to choose the appropriate language, based on whatever criteria the REST API framework uses. This could happen either transparently by changing the return value of ajax() before the caller sees it, or it could be a helper method. It could also either just return the most appropriate translation, or a sorted list of translations.
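The "sorted list of translations" variant could look roughly like the sketch below. This is only an illustration: sortTranslations and its preferred parameter are hypothetical names, not part of rest.js, and the preference order is assumed to be supplied by whatever criteria the REST API framework ends up using.

```javascript
// Hypothetical helper, not part of rest.js.
// `preferred` is an ordered list of language codes, most desirable first.
function sortTranslations( messageTranslations, preferred ) {
	// Languages missing from `preferred` all get the same, lowest rank.
	const rank = ( lang ) => {
		const i = preferred.indexOf( lang );
		return i === -1 ? preferred.length : i;
	};
	// Array.prototype.sort is stable (ES2019+), so languages with equal
	// rank keep the order in which the server returned them.
	return Object.entries( messageTranslations )
		.sort( ( a, b ) => rank( a[ 0 ] ) - rank( b[ 0 ] ) )
		.map( ( [ lang, text ] ) => ( { lang, text } ) );
}

const response = {
	messageTranslations: {
		en: 'A message in English',
		fr: 'Un message en Français'
	}
};
const sorted = sortTranslations( response.messageTranslations, [ 'fr', 'en' ] );
console.log( sorted[ 0 ].text ); // Un message en Français
```

A caller that only wants the single most appropriate translation would take the first element of the returned list.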

Event Timeline

Hello @BPirkle. This is currently a blocker for the Campaigns team as we work on building a registration tool. Would it be possible for you or someone on the API Platform team to check the ticket out? Thank you in advance!

(Noting that this is also true for T269492)

Regarding priority, is there a specific time frame in which you need this resolved? Also, how hard a blocker is it for you? As mentioned in the task description, there is a workaround, although I agree it is non-optimal.

I'm not trying to avoid looking at this, just figuring out where to put it relative to other tasks.

@BPirkle Hello, and thank you for your response! To answer your question, we would ideally like T269492 (as the higher priority) and then this task (T311423) completed by September 1, 2022. Would that be possible?

As for the workaround, we agree that it's not optimal (but thanks for bringing it up!). Once the tasks mentioned above are resolved, the workaround will no longer be valid. For this reason, we look forward to hearing your thoughts on T269492 and T311423. Thanks in advance!

Ok, great. September 1 is a very comfortable date.

I'm out of office most of next week, so wanted to be sure I didn't need to jump on this before then (or be prepared to jump on it immediately after). If I haven't done something on this by the end of July, please remind me.

Thank you so much, @BPirkle -- that's great news!

Pinging @ldelench_wmf so we remember to check in on this task at the end of July.

tl;dr: the more I look at this, the harder it looks. I'm curious if we can reach an acceptable solution via relatively small changes on both the application and server side.

We discussed (to my recollection) the following possibilities in a synchronous call:

  1. return the message index, allowing the client to obtain its own error message in the language of its choice
  2. sort the messageTranslations array in the response, prioritizing the most desirable

Please correct me or add additional possibilities that I may have missed.

Regarding #1 (including the message index), @Daimona correctly points out in this comment on T269492 (Selecting user language in the REST API) that the index is of dubious value given parameter/formatting concerns. So on second thought, I'm disinclined to pursue that option.

Regarding #2 (sorting the messageTranslations array), it seems that, whether intentional or not, the current implementation always places the error message in the content language as the first item in the array, with the English version as the second item (if the content language is English, the array will have only one element). See EntryPoint::getTextFormatters(). To be clear, that means that part of the task description is technically incorrect. The task description says:

"messageTranslations": {
    "en": "A message in English",
    "fr": "Un message en Français"
}

However, it should actually say:

"messageTranslations": {
    "fr": "Un message en Français",
    "en": "A message in English"
}

You can see this at a URL like: https://de.wikipedia.org/w/rest.php/v1/page/as,djfasdkj/with_html

(Please correct me if I'm missing something and there are cases where "en" can come before a content language that is not "en".)
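For illustration, the ordering described above can be written down as a small function. This is a sketch of the observed behaviour as stated in this comment, not a port of EntryPoint::getTextFormatters(); the function name is made up.

```javascript
// Hypothetical illustration of the ordering described above:
// content language first, then English as the fallback, with no
// duplicate entry when the content language is already English.
function errorMessageLanguages( contentLanguage ) {
	return contentLanguage === 'en' ? [ 'en' ] : [ contentLanguage, 'en' ];
}

console.log( errorMessageLanguages( 'de' ) ); // [ 'de', 'en' ]
console.log( errorMessageLanguages( 'en' ) ); // [ 'en' ]
```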

We could make this an explicit part of the REST API contract (I guess just via code comments? Not sure there's an existing formal spec to modify...) and call that done. For bonus points, we could add a helper function to rest.js that returns a translated error message. By default (no language code specified) it would return the first message in the array. If the caller supplied a language code, it would return the error message by that code, or an error value if there were no such translation. That way, callers with no specific needs could just accept the default "best" translation, and callers with specific needs would have the option of more specificity.
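A minimal sketch of such a helper, under stated assumptions: the name getErrorMessage is hypothetical, null is picked arbitrarily as the "error value", and the function takes the already-parsed error body rather than hooking into rest.js itself.

```javascript
// Hypothetical helper; neither the name nor the `null` return value
// is an existing rest.js convention.
function getErrorMessage( errorBody, langCode ) {
	const translations = ( errorBody && errorBody.messageTranslations ) || {};
	if ( langCode !== undefined ) {
		// The caller asked for a specific language: return it or the error value.
		return Object.prototype.hasOwnProperty.call( translations, langCode ) ?
			translations[ langCode ] :
			null;
	}
	// No language given: return the first entry, which is assumed to be
	// the "best" translation according to the server-side ordering.
	const codes = Object.keys( translations );
	return codes.length > 0 ? translations[ codes[ 0 ] ] : null;
}

const body = {
	messageTranslations: {
		fr: 'Un message en Français',
		en: 'A message in English'
	}
};
console.log( getErrorMessage( body ) ); // Un message en Français
console.log( getErrorMessage( body, 'en' ) ); // A message in English
console.log( getErrorMessage( body, 'de' ) ); // null
```

The default case depends on JavaScript preserving the key order of the parsed JSON object. That holds for string keys that do not look like integers, which should cover language codes, but it is one more reason to make the ordering an explicit part of the contract.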

However, I see this issue description in a related Google Doc:

When the user is on the EventDetails page and they try to move a participant, the user may see an error (for example, a server error, connection issue, or a data-related error). In this case, a message occurs in a pop-up. The message will not be in the correct language because we get that information when we reload the page (and the error message will not require a reloading of the page). So the error message in this case is in English or the content language rather than the language preference of the user.

I'm not familiar with the application details. But if I'm understanding correctly, this means that, at least in this case, the desire is to have the error message in the user preference language rather than the content language. And right now, we're not guaranteed to supply a message translation in the user preference language.

Did I understand the bolded paragraph correctly?

If I did, this unfortunately runs counter to the cacheability recommendations being discussed in T269492: Selecting user language in the REST API, which recommend avoiding the response depending on the user preference. And as @Tgr mentioned in this comment, 4xx error responses may be cached. I wouldn't object to an individual endpoint depending on user preference (in practice, I suspect a lot of them probably will) but I'm very reluctant to add that dependence at a REST infrastructure level.

This leaves me struggling for a solution, and I'd be happy to hear more ideas.

Is there any possibility of an application-level solution? I'm curious about the application level (i.e. endpoint handler) usage of exceptions, what is being considered a 4xx/5xx level error vs an application-level error, and what the display requirements are. Some things might be communicated back to the caller as 200s, but with an error response in the body (thereby distinguishing an application-level error from a server-level exception.) You're probably already doing that, but I mention it for completeness. And the bolded paragraph above mentions things like "connection error" or "server error", which indeed sound like true exceptions. However, I hope they're also relatively rare. Is there a possibility of displaying something like this to the user:

Operation could not be completed
Details: <string from the server>

In this case, everything except <string from the server> would be in the UX language (which presumably the client is already dealing with for other parts of the UX) and the <string from the server> portion would be taken from the messageTranslations array by the aforementioned rest.js helper function. So only that part would appear in the content language. But I'm guessing other things are also being shown in the content language, so there's a fair chance the user would be able to understand it. And worst case, the user is still informed about the error, just not in an ideal way. I don't want to be dismissive about the user experience, but I also don't want to let perfect be the enemy of good.
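On the client, that two-part display could be assembled roughly as follows. The function name and parameters are hypothetical; the generic message and the "Details:" label are assumed to have been localized by the client already, and the server string is whatever the rest.js helper returned.

```javascript
// Hypothetical client-side formatting; all names are illustrative.
// `genericMessage` and `detailsLabel` are in the UX language,
// `serverMessage` is the "best" translation taken from messageTranslations.
function formatErrorForDisplay( genericMessage, detailsLabel, serverMessage ) {
	if ( !serverMessage ) {
		// No usable server string: the user still learns that something failed.
		return genericMessage;
	}
	return genericMessage + '\n' + detailsLabel + ' ' + serverMessage;
}

console.log( formatErrorForDisplay(
	'Operation could not be completed',
	'Details:',
	'Un message en Français'
) );
```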

So to summarize, my proposal is:

  1. make it part of the REST API's contract that the messageTranslations array is sorted with the content language as the first element
  2. add a helper function to rest.js to get the translated error message, defaulting to the first element but accepting an optional language parameter
  3. adjust client error display to always show a hard-coded generic message in whatever UX language the client is using, and also include the "best" translation from the server

This would not protect against future changes to the messageTranslations sorting, should anything being discussed in T269492: Selecting user language in the REST API impact that. However, even if this results in a change to client behavior, it would not be a breaking change (as in, the client would still continue to function and display a string, even if it is a different string).

Would this be acceptable? If not, what might be?

To be clear, that means that part of the task description is technically incorrect. The task description says: [...] However, it should actually say: [...] (Please correct me if I'm missing something and there are cases where "en" can come before a content language that is not "en".)

Oh, you are right, I guess I didn't use a real response. I also didn't notice the ordering, or maybe I just thought that the order was random, and not intentional.

We could make this an explicit part of the REST API contract (I guess just via code comments? Not sure there's an existing formal spec to modify...) and call that done. For bonus points, we could add a helper function to rest.js that returns a translated error message. By default (no language code specified) it would return the first message in the array. If the caller supplied a language code, it would return the error message by that code, or an error value if there were no such translation. That way, callers with no specific needs could just accept the default "best" translation, and callers with specific needs would have the option of more specificity.

I think this is pretty much what I was thinking of when I created this task. More broadly, on the server side you would change the contract so that each language is weighed (sorting is a way to accomplish that). On the rest.js side, you would just have a helper method that picks the most relevant language, and whose implementation depends on how the weights are implemented. Your proposal is a possible implementation of these ideas.

However, I see this issue description in a related Google Doc:

When the user is on the EventDetails page and they try to move a participant, the user may see an error (for example, a server error, connection issue, or a data-related error). In this case, a message occurs in a pop-up. The message will not be in the correct language because we get that information when we reload the page (and the error message will not require a reloading of the page). So the error message in this case is in English or the content language rather than the language preference of the user.

I'm not familiar with the application details. But if I'm understanding correctly, this means that, at least in this case, the desire is to have the error message in the user preference language rather than the content language. And right now, we're not guaranteed to supply a message translation in the user preference language.

Did I understand the bolded paragraph correctly?

I'm not sure where that paragraph is coming from, but to me that seems like a simplified non-technical explanation. If I were to reword the above in more technical terms, I could say something like: users can click a button to remove participants from an event. When they click the button, we make a REST API request, and if that request fails (for whatever reason, could be a 4xx or 5xx kind of error), we would like to report that error to the user in their preferred language. So yes, I believe you understood it correctly.

If I did, this unfortunately runs counter to the cacheability recommendations being discussed in T269492: Selecting user language in the REST API, which recommend avoiding the response depending on the user preference. And as @Tgr mentioned in this comment, 4xx error responses may be cached. I wouldn't object to an individual endpoint depending on user preference (in practice, I suspect a lot of them probably will) but I'm very reluctant to add that dependence at a REST infrastructure level.

I don't think this is necessarily an issue. As we were discussing in T269492, we don't have to add a "use the user's preferred language" option. We could make the API only accept language codes, and the client code could pass the appropriate language. In rest.js, this could be implemented directly in ajax() (maybe behind a boolean switch?), so that callers don't even need to worry about it.
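As a rough sketch of the "boolean switch" idea: the function below only builds the headers that a request would carry. Everything here is an assumption, including the function name, the option names, and the choice of Accept-Language as the transport, since T269492 has not settled on a mechanism.

```javascript
// Hypothetical: merges a language header into the caller's headers when the
// switch is on. Accept-Language is only one candidate mechanism; a custom
// header or a query parameter would follow the same pattern.
function buildRequestHeaders( headers, options ) {
	const result = Object.assign( {}, headers );
	if ( options && options.sendLanguage && options.language ) {
		result[ 'Accept-Language' ] = options.language;
	}
	return result;
}

// In MediaWiki the language would presumably come from something like
// mw.config.get( 'wgUserLanguage' ); a literal is used here so the
// example runs on its own.
console.log( buildRequestHeaders(
	{ Accept: 'application/json' },
	{ sendLanguage: true, language: 'it' }
) );
```

With the switch off, the headers are passed through unchanged, so existing callers would see no difference.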

Is there any possibility of an application-level solution? I'm curious about the application level (i.e. endpoint handler) usage of exceptions, what is being considered a 4xx/5xx level error vs an application-level error, and what the display requirements are. Some things might be communicated back to the caller as 200s, but with an error response in the body (thereby distinguishing an application-level error from a server-level exception.)

I'm not a fan of sending a 200 response with errors inside. Unless we're talking about warnings (or other kinds of errors that do not prevent the operation from completing), that seems conceptually wrong to me. And unfortunately, none of our endpoints has this kind of "soft error". You can find the API documentation here. As you can see, every error happens in a 40x response.

So to summarize, my proposal is:

  1. make it part of the REST API's contract that the messageTranslations array is sorted with the content language as the first element
  2. add a helper function to rest.js to get the translated error message, defaulting to the first element but accepting an optional language parameter

LGTM

  3. adjust client error display to always show a hard-coded generic message in whatever UX language the client is using, and also include the "best" translation from the server

I'm not sure if I understand this.

This would not protect against future changes to the messageTranslations sorting, should anything being discussed in T269492: Selecting user language in the REST API impact that. However, even if this results in a change to client behavior, it would not be a breaking change (as in, the client would still continue to function and display a string, even if it is a different string).

Would this be acceptable? If not, what might be?

Seems acceptable overall, and quite close to what I had in mind.

  3. adjust client error display to always show a hard-coded generic message in whatever UX language the client is using, and also include the "best" translation from the server

I'm not sure if I understand this.

That was my thought about how the client might handle error message display without a guarantee of an error in the user preference language. How the client actually handles it is, of course, up to your team.

To expand on that a bit for clarity: in EntryPoint::getTextFormatters(), the server-side code will know the content language, but not the user preference language. The user preference language would be obtainable, but doing anything with it would mean the response depends on the user, which we're trying to avoid at this level.

If individual handlers want/need to make the response dependent on user preferences, that's fine. That's what endpoints that use authentication usually do. But these endpoints can have knowledge of the fact that they're doing it, and can include appropriate cache control directives. And the caching implications of doing that are restricted to that particular endpoint. Code in EntryPoint isn't well-positioned to be that context-sensitive, and caching implications apply to all endpoints.

Rather than using the actual user preference (from the database), we could take it from Accept-Language, and Vary on that. As discussed in T269492: Selecting user language in the REST API, Accept-Language is problematic to Vary on, because it can cause cache fragmentation, but it might be possible to mitigate that by normalizing it at the edge cache level. However, that discussion is still ongoing and I'm not comfortable assuming where it will end up. In particular, I've had an initial conversation with Traffic, who has agreed to review the task and contribute any thoughts they may have.
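To make the normalization idea concrete, here is a sketch of how many different Accept-Language values could be collapsed into a small set before they are used as a cache key. It is purely illustrative: the function name, the supported list, and the fallback are assumptions, and nothing here reflects what the edge caches actually do or what T269492 will decide.

```javascript
// Hypothetical normalization: reduce an arbitrary Accept-Language value to
// one entry from a small, fixed list of supported codes, so that a cache
// varying on the normalized value has few variants.
function normalizeAcceptLanguage( headerValue, supported, fallback ) {
	const candidates = ( headerValue || '' )
		.split( ',' )
		.map( ( part ) => {
			const [ tag, ...params ] = part.trim().split( ';' );
			const qParam = params
				.map( ( p ) => p.trim() )
				.find( ( p ) => p.startsWith( 'q=' ) );
			// A missing q-value means q=1; an unparsable one is dropped below.
			const q = qParam ? parseFloat( qParam.slice( 2 ) ) : 1;
			return { tag: tag.trim().toLowerCase(), q: Number.isNaN( q ) ? 0 : q };
		} )
		.filter( ( c ) => c.tag !== '' && c.q > 0 )
		.sort( ( a, b ) => b.q - a.q );

	for ( const c of candidates ) {
		if ( supported.includes( c.tag ) ) {
			return c.tag;
		}
		// Fall back from a regional variant such as "fr-ch" to "fr".
		const primary = c.tag.split( '-' )[ 0 ];
		if ( supported.includes( primary ) ) {
			return primary;
		}
	}
	return fallback;
}

const supported = [ 'en', 'fr', 'de' ];
// Two different browser headers end up with the same normalized value.
console.log( normalizeAcceptLanguage( 'fr-CH, fr;q=0.9, en;q=0.8', supported, 'en' ) ); // fr
console.log( normalizeAcceptLanguage( 'fr', supported, 'en' ) ); // fr
```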

So for right now, the only two languages that EntryPoint::getTextFormatters() knows about are the content language and the hard-coded fallback of English. This means that if the client really wanted the error message translated into the user preference language (and the user preference language was different than either of those two), the client, at least for anything we implement this week, isn't going to get it.

With all that in mind, my suggestion was that the client show the user some generic message like "Operation could not be completed" (or whatever wording you prefer) in its normal UX language (which the user will be able to read), and also display the error string from the server (which the user will probably be able to read, but there's a chance they won't be able to).

That's a lot of words, probably too many. But I want to be clear about what the client will and won't be getting under this proposal, just so there are no surprises.

Did that help, or did it increase confusion?

Edit: I mistakenly pasted T264777 when I meant to paste T269492. I've corrected that above.

That was my thought about how the client might handle error message display without a guarantee of an error in the user preference language. How the client actually handles it is, of course, up to your team.

Got it, thank you. I think in that case it's fine to use English, just like it happens in the interface when a message has not been translated in your language.

To expand on that a bit for clarity: in EntryPoint::getTextFormatters(), the server-side code will know the content language, but not the user preference language. The user preference language would be obtainable, but doing anything with it would mean the response depends on the user, which we're trying to avoid at this level.

Rather than using the actual user preference (from the database), we could take it from Accept-Language, and Vary on that. As discussed in T264777: Include error message translations in the user language in the REST API's error response, Accept-Language is problematic to Vary on, because it can cause cache fragmentation, but it might be possible to mitigate that by normalizing it at the edge cache level. However, that discussion is still ongoing and I'm not comfortable assuming where it will end up. In particular, I've had an initial conversation with Traffic, who has agreed to review the task and contribute any thoughts they may have.

I'm not sure about the caching implications or the ongoing discussion, but I thought for now the user's preferred language could be treated the same as every other language? That is, instead of passing a special value to make it use the preference, you could just retrieve the preference on the client side and put it in Accept-Language (or whatever method we want to use). The server wouldn't even know that the value comes from the preferences, and could handle it like it would handle any language specified in Accept-Language.

With all that in mind, my suggestion was that the client show the user some generic message like "Operation could not be completed" (or whatever wording you prefer) in its normal UX language (which the user will be able to read), and also display the error string from the server (which the user will probably be able to read, but there's a chance they won't be able to).

I think we're already doing that as a temporary solution, this task was for the longer-term solution of actually translating the message.

Did that help, or did it increase confusion?

It definitely helps, thank you!

Rather than using the actual user preference (from the database), we could take it from Accept-Language, and Vary on that. As discussed in T269492: Selecting user language in the REST API, Accept-Language is problematic to Vary on, because it can cause cache fragmentation, but it might be possible to mitigate that by normalizing it at the edge cache level. However, that discussion is still ongoing and I'm not comfortable assuming where it will end up. In particular, I've had an initial conversation with Traffic, who has agreed to review the task and contribute any thoughts they may have.

I'm not sure about the caching implications or the ongoing discussion, but I thought for now the user's preferred language could be treated the same as every other language? That is, instead of passing a special value to make it use the preference, you could just retrieve the preference on the client side and put it in Accept-Language (or whatever method we want to use). The server wouldn't even know that the value comes from the preferences, and could handle it like it would handle any language specified in Accept-Language.

Yep. The only part of that I disagree with is "for now", because we don't yet have a solid decision on what mechanism we want to use on an API-wide basis.

If it were the individual handler pulling that preference from Accept-Language (or wherever), that'd be fine. But because we're talking about the REST infrastructure code needing to do this in EntryPoint, all endpoints would potentially be affected. I don't want to introduce Accept-Language (or any other API-wide mechanism) at that low of a level until the discussion in T269492: Selecting user language in the REST API is completed.

Unless someone has other suggestions, I think our options at the moment are:

  1. proceed with the proposal above, with the understanding that there may not be a translation for the user preference language
  2. wait for T269492: Selecting user language in the REST API to wrap up before we do anything

Finally, I just realized I pasted the wrong ticket number in the comment you quoted. I meant T269492, not T264777, sorry for any confusion that may have caused. I'll edit my other comment to avoid confusing other readers, and I've edited it in the quote above in this comment. But that was my mistake.

Rather than using the actual user preference (from the database), we could take it from Accept-Language, and Vary on that. As discussed in T269492: Selecting user language in the REST API, Accept-Language is problematic to Vary on, because it can cause cache fragmentation, but it might be possible to mitigate that by normalizing it at the edge cache level. However, that discussion is still ongoing and I'm not comfortable assuming where it will end up. In particular, I've had an initial conversation with Traffic, who has agreed to review the task and contribute any thoughts they may have.

I'm not sure about the caching implications or the ongoing discussion, but I thought for now the user's preferred language could be treated the same as every other language? That is, instead of passing a special value to make it use the preference, you could just retrieve the preference on the client side and put it in Accept-Language (or whatever method we want to use). The server wouldn't even know that the value comes from the preferences, and could handle it like it would handle any language specified in Accept-Language.

Yep. The only part of that I disagree with is "for now", because we don't yet have a solid decision on what mechanism we want to use on an API-wide basis.

If it were the individual handler pulling that preference from Accept-Language (or wherever), that'd be fine. But because we're talking about the REST infrastructure code needing to do this in EntryPoint, all endpoints would potentially be affected. I don't want to introduce Accept-Language (or any other API-wide mechanism) at that low of a level until the discussion in T269492: Selecting user language in the REST API is completed.

Ah yes, completing T269492 is a prerequisite for my proposal, which would only work once it's possible to specify the language (via Accept-Language or something else).

Unless someone has other suggestions, I think our options at the moment are:

  1. proceed with the proposal above, with the understanding that there may not be a translation for the user preference language
  2. wait for T269492: Selecting user language in the REST API to wrap up before we do anything

Sounds good. I think we may be fine with waiting for T269492 to be resolved, but that's just my personal opinion.

Okay. Let me know if it turns out to be otherwise.