
LanguageConverter should remember variant selection for anonymous users across different pages
Open, Needs Triage, Public

Description

Currently, if a user isn't logged in, their variant selection is lost as soon as they click a link to navigate to a different page, even if they explicitly selected a variant from the drop-down menu, and the page falls back to the default. This is a pretty poor user experience, especially since, e.g. on zhwiki, the default is a mix of unconverted Traditional and Simplified characters when a variant can't be inferred from browser settings.

This also happens on MobileFrontend, which may need to be fixed separately.

Possible mitigations

  • Save the variant selection in a session or regular cookie
  • If the user is anonymous, automatically modify the URLs of all internal links (wiki links, sidebar, search, etc.) to include variant information (e.g. if I'm viewing https://zh.wikipedia.org/zh-tw/維基百科, all link targets should be prefixed with https://zh.wikipedia.org/zh-tw/ rather than with https://zh.wikipedia.org/wiki/)
    • Note: Naively changing the parser output doesn't quite work, because the parser cache is shared between all viewers of a variant. If we just changed the parser output to always use variant-specific URLs, links shared externally (e.g. on social media) would always be variant specific, which hurts usability in cases where MediaWiki can actually infer a different preferred variant for the person clicking on the link.
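As a rough illustration of the second mitigation, here is a minimal TypeScript sketch that detects an explicit variant in the current path and rewrites internal `/wiki/` hrefs to use the same variant. The variant list and the function names are hypothetical, and the sketch only handles plain `/wiki/Title` hrefs, not `/w/index.php?...` URLs or forms such as search.

```typescript
// Assumed set of variant path prefixes on zhwiki (illustrative only; the real
// list comes from the wiki's variant configuration).
const VARIANTS: ReadonlySet<string> = new Set([
  "zh-hans", "zh-hant", "zh-cn", "zh-hk", "zh-mo", "zh-my", "zh-sg", "zh-tw",
]);

/** Returns the explicit variant in a path like "/zh-tw/Foo", or null for "/wiki/Foo". */
function getPathVariant(pathname: string): string | null {
  // Take the first path segment, e.g. "zh-tw" from "/zh-tw/Foo".
  const prefix = /^\/([^/]+)\//.exec(pathname)?.[1];
  return prefix !== undefined && VARIANTS.has(prefix) ? prefix : null;
}

/**
 * Rewrites an internal "/wiki/Title" href to "/<variant>/Title".
 * External links and any other href shapes are returned unchanged.
 */
function rewriteHref(href: string, variant: string | null): string {
  if (variant === null || !href.startsWith("/wiki/")) return href;
  return `/${variant}/` + href.slice("/wiki/".length);
}
```

In a client-side script, one would call `getPathVariant(location.pathname)` once and apply `rewriteHref` to each internal link. Because this happens in the browser, the cached parser output itself stays variant-agnostic.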

Event Timeline

Change 742677 had a related patch set uploaded (by Wctaiwan; author: Wctaiwan):

[mediawiki/core@master] Preserve language variant in wikilinks

https://gerrit.wikimedia.org/r/742677

Some discussion wctaiwan and I had about this:

  • The parser cache is already split by variant, so we're not going to cause any pollution issues. I will double-check that Varnish caches variants to the same extent, so pages stay cached.
    • Once the patch is ready to be merged, we do need to have a RejectParserCacheValue hook prepared in case a revert is needed.
  • Adding more global state to Title is bad, and putting this in LinkRenderer is better since it uses proper dependency injection and we have a better idea of the impact.
    • This also means it won't fix everything. Trying it out on patchdemo already shows it doesn't cover the sidebar, personal tools, and the logo. I think that's OK as a start.

I'm a monolingual English speaker so I'm not really sure what else to look out for. This seems like a pretty straightforward change with hopefully significant impact.

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/742677/ doesn't quite work because the parser cache is shared by everyone (logged-in and anonymous users alike) viewing the same variant. This means that e.g. a logged-in user viewing an article would fill the cache with non-variant-specific links, so an anonymous user who later visits the article would still have the same issue.

At the same time, we probably also don't want to just always use variant-specific URLs: This would mean that links shared externally (e.g. on social media) would always be variant specific, which hurts usability in cases where MediaWiki can actually infer a different preferred variant for the person clicking on the link.

Here are the high-level goals I think we want to accomplish:

  1. For logged-in users with a preferred variant, use that.
  2. Failing that, if a preferred variant can be inferred from the Accept-Language header, use that.
  3. If no preferred variant can be inferred, the user should be able to manually select one from the drop-down menu.
  4. If a variant has been manually selected, it should be preserved as the user navigates around the wiki (this task).
  5. Variant-agnostic URLs should be preferred over variant-specific ones where possible for better usability when sharing links.
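To make the precedence in these goals concrete, here is a hedged TypeScript sketch of variant resolution. It assumes that an explicit variant in the URL (which is how a manual drop-down selection is expressed) wins, then a logged-in user's preference, then Accept-Language, then the no-conversion default. The interface, constants, and function names are hypothetical; in MediaWiki this logic lives in LanguageConverter (PHP) and differs in detail.

```typescript
// Hypothetical inputs to the decision.
interface VariantContext {
  urlVariant: string | null;     // explicit variant from a /zh-xx/ path
  userPreference: string | null; // logged-in user's preferred variant
  acceptLanguage: string | null; // raw Accept-Language request header
}

// Assumed supported variants and fallback ("zh" = no conversion).
const SUPPORTED: string[] = ["zh-hans", "zh-hant", "zh-cn", "zh-hk", "zh-mo", "zh-my", "zh-sg", "zh-tw"];
const DEFAULT_VARIANT = "zh";

/** Parses Accept-Language into lowercase tags, highest q-value first. */
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam === undefined ? 1 : Number(qParam.trim().slice(2));
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.tag !== "" && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index) // ties keep header order
    .map((entry) => entry.tag);
}

function resolveVariant(ctx: VariantContext): string {
  if (ctx.urlVariant !== null && SUPPORTED.includes(ctx.urlVariant)) return ctx.urlVariant;
  if (ctx.userPreference !== null && SUPPORTED.includes(ctx.userPreference)) return ctx.userPreference;
  if (ctx.acceptLanguage !== null) {
    for (const tag of parseAcceptLanguage(ctx.acceptLanguage)) {
      if (SUPPORTED.includes(tag)) return tag;
    }
  }
  return DEFAULT_VARIANT;
}
```

This exact-match lookup is a simplification: real browsers may send tags such as `zh-Hant-TW`, which would need mapping to a supported variant.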

Given the above, I think any solution we come up with must not affect the parser output; otherwise the parser cache could contain variant-specific or variant-agnostic URLs depending on who happened to fill it for a given (article, variant), with the results shared across all viewers.

Possible options:

  • Store the preferred variant in a cookie when it's selected from the drop-down menu, and use it as an override in LanguageConverter::getPreferredVariant
  • Rewrite all wiki links with JavaScript to inject variant information if the current URL has an explicit variant

So we have two URL structures:

  • zh.wikipedia.org/wiki/: for logged-out users, use the canonical/Accept-Language variant; for logged-in users, use their preference
    • Varnish cache is split on Accept-Language; logged-in users bypass Varnish
  • zh.wikipedia.org/zh-$variant/: always use the specified variant.
    • no Varnish cache split, URL is unique
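The two URL structures can be modeled as a small cache decision. The following TypeScript sketch is illustrative only, not Wikimedia's actual VCL. It assumes logged-in requests always bypass the edge cache, and the type and function names are hypothetical.

```typescript
// Illustrative model of the two URL structures; not real Varnish configuration.
type CacheDecision =
  | { kind: "pass" }                 // go straight to MediaWiki
  | { kind: "lookup"; key: string }; // serve from / store in the edge cache

function edgeCacheDecision(path: string, acceptLanguage: string, loggedIn: boolean): CacheDecision {
  // Assumption: requests with a session (logged-in users) always bypass Varnish.
  if (loggedIn) return { kind: "pass" };
  if (path.startsWith("/wiki/")) {
    // Variant-neutral URL: the output depends on the negotiated variant,
    // so the cache has to vary on Accept-Language.
    return { kind: "lookup", key: `${path}|accept-language=${acceptLanguage}` };
  }
  // Variant-specific URL (/zh-$variant/...): the URL alone determines the output.
  return { kind: "lookup", key: path };
}
```

Keying on the raw header would fragment the cache across many equivalent header spellings; a real setup would normalize it first (for example, to the resolved variant).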

Is that right?

Regarding always using variant-specific URLs, which would make externally shared links variant-specific: this is basically the same issue we have with the .m. mobile domains. Mobile users share URLs to the mobile domain, but desktop users who follow those links don't get redirected to the desktop site.

Regarding keeping the parser output unaffected: to me it seems like we'd want to split the parser cache on the URL path used to access the article. If it was accessed via /wiki/, it continues with neutral URLs; if it was accessed via /zh-$variant/, you continue to get variant-specific URLs. That said, I think we want to strongly avoid any more parser cache splits.
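A split on the access path would add one two-valued dimension to the cache key. This TypeScript sketch uses a hypothetical key layout (MediaWiki's real ParserCache keys are built differently) purely to show the shape of the change.

```typescript
// Hypothetical key layout, for illustration only.
type LinkStyle = "neutral" | "variant";

function linkStyleFor(accessPath: string): LinkStyle {
  // /wiki/Foo keeps variant-neutral links; /zh-$variant/Foo gets variant-specific links.
  return accessPath.startsWith("/wiki/") ? "neutral" : "variant";
}

function parserCacheKey(pageId: number, revId: number, variant: string, accessPath: string): string {
  return `pcache:${pageId}:${revId}:${variant}:${linkStyleFor(accessPath)}`;
}
```

Because the new dimension has only two values, it at most doubles the entries per (page, revision, variant), and only for pages actually visited through /zh-*/ URLs. That is the cost to weigh against avoiding further parser cache splits.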

So we could also rewrite the URLs as a post-processing step (with Parsoid, we'd just have to adjust the single <base> tag, which is fairly trivial). We currently do this for action=render too, so there's some precedent, but it's not great.
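A post-processing pass over cached HTML could look roughly like the following TypeScript sketch. The function names are hypothetical, and regular expressions over raw HTML are a simplification (a real implementation would use an HTML tokenizer). The second function assumes Parsoid-style output, where relative links resolve against a single `<base href="//host/wiki/">`.

```typescript
/** Rewrites href="/wiki/..." attributes in rendered HTML to a variant path. */
function rewriteRenderedLinks(html: string, variant: string): string {
  // Requiring whitespace before "href" avoids matching attributes like data-href.
  return html.replace(/(\shref=")\/wiki\//g, `$1/${variant}/`);
}

/** Parsoid-style output: only the <base> tag's "/wiki/" path needs changing. */
function rewriteParsoidBase(html: string, variant: string): string {
  return html.replace(/(<base\s+href="[^"]*\/)wiki\/"/, `$1${variant}/"`);
}
```

Because this runs after the parser cache lookup, the cached entry stays variant-agnostic and the rewrite can depend on the request URL.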

Regarding the two options (a cookie used as an override in LanguageConverter::getPreferredVariant, or rewriting wiki links with JavaScript when the current URL has an explicit variant): there's some precedent for both, see https://commons.wikimedia.org/wiki/MediaWiki:Gadget-AnonymousI18N.js. The problem with using a cookie, AIUI, is that we need to teach Varnish to parse the cookie and split the cache on its value. Alternatively, the cookie forces users to bypass Varnish entirely so they hit MediaWiki directly, which isn't great either.
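If Varnish were taught to split on a variant cookie, one important detail is validating the value against a fixed whitelist so arbitrary cookie values can't fragment the cache. This TypeScript sketch shows the idea; the cookie name `zhVariant`, the whitelist, and the function names are all hypothetical (MediaWiki sets no such cookie today).

```typescript
// Hypothetical cookie name; not an existing MediaWiki cookie.
const VARIANT_COOKIE = "zhVariant";

// Assumed whitelist; any other value must not create a new cache entry.
const ALLOWED_VARIANTS: ReadonlySet<string> = new Set([
  "zh-hans", "zh-hant", "zh-cn", "zh-hk", "zh-mo", "zh-my", "zh-sg", "zh-tw",
]);

/** Splits a Cookie request header ("a=1; b=2") into name/value pairs. */
function parseCookieHeader(header: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const pair of header.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    const name = pair.slice(0, eq).trim();
    if (name !== "") cookies.set(name, pair.slice(eq + 1).trim());
  }
  return cookies;
}

/** Extra cache-key component: the validated variant, or "" if missing/unknown. */
function variantCacheDimension(cookieHeader: string): string {
  const value = parseCookieHeader(cookieHeader).get(VARIANT_COOKIE);
  return value !== undefined && ALLOWED_VARIANTS.has(value) ? value : "";
}
```

With the whitelist, the split is bounded to the number of variants plus one "no cookie" bucket.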

Another option is to focus specifically on the issue of *sharing URLs*. We could use history.replaceState() (https://developer.mozilla.org/en-US/docs/Web/API/History/replaceState) to rewrite the URL in the location bar from variant-specific to neutral. This wouldn't cover someone right-clicking a link and sharing it, though.
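The URL mapping for the replaceState approach is straightforward. This TypeScript sketch computes the neutral path for a variant-specific one; the prefix list and function name are hypothetical.

```typescript
// Assumed set of variant path prefixes (illustrative).
const VARIANT_PREFIXES: ReadonlySet<string> = new Set([
  "zh-hans", "zh-hant", "zh-cn", "zh-hk", "zh-mo", "zh-my", "zh-sg", "zh-tw",
]);

/** Maps "/zh-tw/Foo?x=1#s" to "/wiki/Foo?x=1#s"; returns null if already neutral. */
function toNeutralUrl(pathWithQuery: string): string | null {
  const match = /^\/([^/?#]+)\/(.*)$/.exec(pathWithQuery);
  const prefix = match?.[1];
  const rest = match?.[2];
  if (prefix === undefined || rest === undefined || !VARIANT_PREFIXES.has(prefix)) return null;
  return "/wiki/" + rest;
}
```

In the browser, one would pass `location.pathname + location.search + location.hash` and, if the result isn't null, call `history.replaceState(history.state, "", neutral)`. One trade-off: a reload then requests the neutral URL, so the explicit variant would be lost unless combined with another mechanism.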

Change 742677 abandoned by Wctaiwan:

[mediawiki/core@master] Preserve language variant in wikilinks

Reason:

This won't work because the parser cache is shared (see task for details)

https://gerrit.wikimedia.org/r/742677

So our options are probably:

  1. Add a cookie that can be used in lieu of Accept-Language, set on selecting a variant (requires changing Varnish to handle the cookie)
  2. Do some post-processing outside of the parser cache to rewrite links based on the URL variant (requires changing parser post-processing, might be hacky because at this point we're probably dealing with raw HTML rather than structured data)
  3. Rewrite the URLs using JavaScript based on the URL variant (hacky/brittle but doesn't require backend changes, and probably lowest risk)
  4. Split the parser cache on the URL variant as well (i.e. /wiki/Foo and /zh-*/Foo would have different cache keys, with the latter using variant-specific links throughout)

I wonder if #4 is less expensive than we think. For most users, /wiki/ would likely work just fine, since e.g. a browser with a Chinese UI would probably send a reasonable Accept-Language header by default (I haven't verified this, though), so we'd expect most articles not to be frequently accessed or cached via /zh-*/ URLs. It'd help to get a sense of how often people hit the fallback (no-conversion) variant.

Test wiki on Patch demo by Legoktm using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/d5cb291b81/w/

I put up a proof of concept for rewriting the links in JS at https://zh.wikipedia.org/wiki/User:Wctaiwan/rewritePathVariant.js and started a discussion at https://zh.wikipedia.org/wiki/Wikipedia:互助客栈/技术#在瀏覽過程中保留中文變體 (Village pump/Technical: "Preserving the Chinese variant while browsing") about potentially adding something like this to Common.js on zhwiki. It wasn't as complex as I'd feared, and anecdotally the performance seems acceptable.

I discussed the Varnish cookie approach with @BBlack on IRC (full logs).

The main takeaway is that there is nothing obviously wrong or flawed with the plan, but how Varnish/MW handles cookies is complex, and there has to be a good justification to add this proposed level of complexity to our stack.

Longer explanation: MediaWiki sets Vary: Cookie, but that's deceptive, because Varnish really only varies on the session cookie. Even that isn't exact: internally, Varnish does things like rewriting the cookie to Token=1, so the "this request is uncacheable" response is saved only once rather than per user. Varnish does this because it coalesces anonymous requests: if multiple anonymous users request /wiki/Foo, only one request is made to the backend for it. For logged-in users, however, you don't want to queue them up waiting, only to discover the request is uncacheable, and then repeat that for the next user's request, and so on.
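The normalization described above can be sketched as a function that collapses the Cookie header into a cache-friendly form. This TypeScript model is illustrative only; the session-cookie name pattern is an assumption (MediaWiki session cookie names depend on configuration, e.g. a `_session` suffix or names like `centralauth_Session`), and the function name is hypothetical.

```typescript
// Assumed session-cookie name pattern; real names depend on wiki configuration.
const SESSION_COOKIE = /(^|;\s*)[A-Za-z0-9_]*(Session|_session)=/;

/**
 * Collapses a Cookie header for cache purposes: every request with a session
 * becomes "Token=1" (one shared "uncacheable" marker, not one per user), and
 * every anonymous request becomes "" so identical URLs share one cache object.
 */
function normalizeCookieForCache(cookieHeader: string): string {
  return SESSION_COOKIE.test(cookieHeader) ? "Token=1" : "";
}
```

Since all anonymous requests normalize to the same value, concurrent anonymous misses for /wiki/Foo can be coalesced into a single backend fetch. A variant cookie would add a second, anonymous-visible dimension to this logic, which is the extra MW+Varnish coupling discussed here.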

So it's possible to split the cache based on an anonymous cookie, but it adds more "business logic tied together between MW+Varnish" that AIUI we want to keep to a minimum.

Based on this, my recommendation would be that we move forward with the JS gadget, and use it to collect user feedback and basic usage stats (mw.track). And if that's successful, we can make a better case for implementing this properly with a cookie and corresponding Varnish logic.

@Krinkle, @Legoktm mentioned you'd worked on the AnonymousI18N gadget on Commons, which I tried to imitate in https://zh.wikipedia.org/wiki/User:Wctaiwan/rewritePathVariant.js. Would you mind reviewing the code to see if the approach makes sense to you? Thanks!