
Possible parsercache corruption in Charts localization
Closed, Resolved · Public

Description

As pointed out by @cscott on a mastodon post:

@bvibber hm, be careful. The target language is 'zh' because it is going to be language converted. The *user* language is zh-whatever, because messages are pre-localized and don't go through language converter. You risk corrupting the parser cache if you mix those two together.

I think you should be using getUserLanguage(), which will also tag the parser output as being dependent on user language.

Fix here should be to call getUserLanguage() even though we actually want the target language, because we need to mark the page as being dependent on the user language's variant selection. This should probably only happen if the target language and the user language share a base language, to avoid extra cache fragmentation on content languages without variants.
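The selection rule described above can be sketched roughly as follows. This is a language-neutral model in Python, not the Chart extension's code: `FakeParserOptions`, `chart_language`, `base_language`, and the `"userlang"` marker are all hypothetical names, and the model assumes the base-language comparison can peek at the user language without recording a dependency.

```python
from dataclasses import dataclass, field


def base_language(code: str) -> str:
    """Return the part of a language code before the first hyphen ('zh-hant' -> 'zh')."""
    return code.split("-", 1)[0].lower()


@dataclass
class FakeParserOptions:
    """Hypothetical stand-in for parser options; not the MediaWiki class."""
    target_language: str
    user_language: str
    # Options the rendered output has been marked as depending on.
    used_options: set = field(default_factory=set)

    def get_user_language(self) -> str:
        # In this model, reading the user language through the accessor
        # also records that the output depends on it.
        self.used_options.add("userlang")
        return self.user_language


def chart_language(options: FakeParserOptions) -> str:
    """Pick the language for chart strings under the 'target language + variants' rule."""
    # Assumption of the model: this comparison reads the raw field and
    # therefore does not mark the output as user-language dependent.
    if base_language(options.target_language) == base_language(options.user_language):
        # Same base language: use the user's variant and record the dependency.
        return options.get_user_language()
    # Different base language: stay with the page's target language.
    return options.target_language
```

With this sketch, a `zh` page viewed by a `zh-hant` user resolves to `zh-hant` and is marked as user-language dependent, while an `en` page viewed by an `fr` user stays `en` with no extra dependency recorded.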

Worst case if we don't fix this: parser cache corruption, with the wrong variant saved and displayed on Chinese, Serbian, etc.

Event Timeline

Because the legacy parser does language conversion pre-cache, we might already be splitting the parser cache by user variant, but Parsoid expects to do language conversion post-cache. It is more technically correct to use user language here because user language strings are also pre-converted, excluded from language converter, etc. That said, your content might be hidden behind a strip marker and hidden from language converter anyway.

Probably just looking hard at the parser cache key emitted in the limit report comment will tell you if the cache key is varying properly with user variant.
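As a toy illustration of what "varying properly" means, here is a small Python model of a cache whose key either includes or omits the user language. It is not the MediaWiki parser cache; `ToyParserCache`, `render_chart`, and the key layout are hypothetical and only meant to show the failure mode.

```python
def render_chart(page: str, user_language: str) -> str:
    """Pretend renderer: the output embeds pre-localized strings for the user language."""
    return f"{page}: chart labels in {user_language}"


class ToyParserCache:
    """Hypothetical cache; the key includes the user language only when told to vary on it."""

    def __init__(self, vary_on_user_language: bool):
        self.vary_on_user_language = vary_on_user_language
        self.store = {}

    def key(self, page: str, user_language: str) -> tuple:
        if self.vary_on_user_language:
            return (page, user_language)
        return (page,)

    def get(self, page: str, user_language: str) -> str:
        k = self.key(page, user_language)
        if k not in self.store:
            # Cache miss: render for whoever asked first and save the result.
            self.store[k] = render_chart(page, user_language)
        return self.store[k]


# Key does not vary: the second reader gets the first reader's variant.
broken = ToyParserCache(vary_on_user_language=False)
first_broken = broken.get("Page", "zh-hans")
second_broken = broken.get("Page", "zh-hant")

# Key varies on user language: each reader gets their own variant.
fixed = ToyParserCache(vary_on_user_language=True)
first_fixed = fixed.get("Page", "zh-hans")
second_fixed = fixed.get("Page", "zh-hant")
```

In the non-varying cache, the `zh-hant` reader is served the entry rendered for `zh-hans`, which is the kind of corruption the task describes.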

Change #1159536 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/extensions/Chart@master] WIP Robustness for cache on Charts in languages with variants

https://gerrit.wikimedia.org/r/1159536

@CCiufo-WMF quick question: I can either make this match the previous behavior ("use page target language" with an exception for variants set by the user language), OR I can just make us *always* use the user language.

This would mean that ?uselang=foo would always translate the chart into the given language if the strings are present, and it might be more what folks want but we never made a firm decision on this. :D Cost is that every page with a chart on it gets bifurcated in the cache on user language (more space, more rendering time) even on English etc, as opposed to just on zh, sr, etc.
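To make the cost difference concrete, the two options can be compared by counting how many distinct renderings each would need for a given mix of readers. This is only a back-of-the-envelope model in Python: the function names and the sample request list are made up, and it assumes every user language produces a different rendering (an upper bound, since strings may be missing).

```python
def base_language(code: str) -> str:
    """Return the part of a language code before the first hyphen."""
    return code.split("-", 1)[0].lower()


def language_always_user(target: str, user: str) -> str:
    """Option B: always render chart strings in the user language."""
    return user


def language_target_plus_variants(target: str, user: str) -> str:
    """Option A: target language, except when the user picked a variant of it."""
    if base_language(target) == base_language(user):
        return user
    return target


def count_renderings(policy, requests) -> int:
    """Count distinct (page target language, chart language) pairs the policy produces."""
    return len({(target, policy(target, user)) for target, user in requests})


# Hypothetical traffic: (page target language, user language) pairs.
REQUESTS = [
    ("en", "en"), ("en", "fr"), ("en", "de"), ("en", "ja"), ("en", "es"),
    ("zh", "zh-hans"), ("zh", "zh-hant"), ("zh", "en"),
]

always_user_count = count_renderings(language_always_user, REQUESTS)
target_plus_variants_count = count_renderings(language_target_plus_variants, REQUESTS)
```

For this sample, always using the user language needs 8 renderings, while target language + variants needs 4: the English page stays at one entry and only the `zh` page is split.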

My 2 cents: I see charts as content, so I think they should be in the language that matches the content around them, not the language the UI is in.

*nod* I'll move ahead for now with target language + variants; seems to work. :D

> My 2 cents: I see charts as content, so I think they should be in the language that matches the content around them, not the language the UI is in.

Yeah I agree. I think it would be strange if charts were the only part of the content changing with the user interface lang. It should be as consistent with the content language as possible.

Change #1159536 merged by jenkins-bot:

[mediawiki/extensions/Chart@master] Robustness for cache on Charts in languages with variants

https://gerrit.wikimedia.org/r/1159536

Went out with 1.45.0-wmf.7, which is now the oldest version in production; closing as resolved.