
Sort out how page/rendering language related properties are used
Open, Needs Triage, Public

Description

There are currently six methods, plus two globals, for getting the language of some content displayed on a wiki page:

  • ContentHandler::getPageLanguage( Title, Content ) - documented as the language "in which the content of the given page is written", whatever that means. Can be customized via subclassing or the PageContentLanguage hook.
  • ContentHandler::getPageViewLanguage( Title, Content ) - documented as the language "in which the content of this page is written when viewed by user". In practice the difference seems to be variant support.
  • Title::getPageLanguage() - wraps ContentHandler::getPageLanguage but can also deal with special pages, and depending on wiki configuration might allow a per-page override, stored in the DB.
  • Title::getPageViewLanguage() - again, same as the corresponding getPageLanguage method, but with variant support.
  • ParserOptions::getTargetLanguage() - can be set manually, normally null.
  • Parser::getTargetLanguage() - basically the same as Title::getPageLanguage, with some added handling for situations when there is no title (such as interface messages); can be overridden via ParserOptions::getTargetLanguage.
  • there are also the $wgLang and $wgContLang globals.
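
As a rough illustration of how these layers seem to stack, here is a minimal Python model (not MediaWiki code). Every class and function name in it is a hypothetical stand-in, the "en" content language is an assumed value, and the no-title branch assumes that interface messages fall back to the interface language, which the list above does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

CONTENT_LANGUAGE = "en"  # stands in for $wgContLang; the value is an assumption


@dataclass
class Title:
    """Hypothetical stand-in for a MediaWiki Title."""
    text: str
    is_special: bool = False
    db_language: Optional[str] = None  # per-page override stored in the DB


@dataclass
class ParserOptions:
    """Hypothetical stand-in; the target language is normally unset."""
    interface_language: str = "en"  # stands in for $wgLang
    target_language: Optional[str] = None


def content_handler_page_language(title: Title) -> str:
    # Default behaviour: the wiki's content language. A subclass or the
    # PageContentLanguage hook could return something else here.
    return CONTENT_LANGUAGE


def title_page_language(title: Title, interface_language: str) -> str:
    # Models Title::getPageLanguage: special pages and DB overrides first.
    if title.is_special:
        return interface_language
    if title.db_language is not None:
        return title.db_language
    return content_handler_page_language(title)


def parser_target_language(title: Optional[Title], options: ParserOptions) -> str:
    # Models Parser::getTargetLanguage: an explicit ParserOptions value wins.
    if options.target_language is not None:
        return options.target_language
    if title is None:
        # Assumption: with no title (interface messages), use the interface language.
        return options.interface_language
    return title_page_language(title, options.interface_language)


print(parser_target_language(Title("Main Page"), ParserOptions()))  # en
print(parser_target_language(Title("Hauptseite", db_language="de"), ParserOptions()))  # de
print(parser_target_language(Title("Special:Version", is_special=True),
                             ParserOptions(interface_language="fr")))  # fr
print(parser_target_language(Title("Hauptseite", db_language="de"),
                             ParserOptions(target_language="ja")))  # ja
```

The model makes the problem visible: the most complete answer only exists inside the parser call, so nothing outside it can ask the same question.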

It's not at all clear which of these should be used when, and what little documentation there is doesn't really help. Parser::getTargetLanguage is the most complete, but there isn't any way to access it in most situations, so code that renders a piece of content can't tell what language it was rendered in and so can't use matching wrapper text. It's not clear when the page language vs. the page view language should be used. The whole system works on the page level, even though post-MCR it should probably work on the slot level. There is no way to mark content as multilingual or as not having a language at all. And most code just uses the content language, which is the worst choice as it completely ignores the possibility of a wiki having content in more than one language. So it all could use some rethinking and sorting out.

In current code, these seem to be the situations where Title::getPageLanguage does not simply return the content language:

  • $wgPageLanguageUseDB overrides: in whatever language was manually set in the DB
  • {{PAGELANGUAGE}} overrides (from the PageLanguage extension): in whatever language was manually set
  • CSS/JS pages: always in English
  • language subpages in the MediaWiki namespace: in the selected language
  • language subpages managed via Translate: in the selected language
  • WikimediaIncubator pages: in the language of that sub-wiki (I think? the code is hard to follow)
  • pages using the Semantic Interlanguage Links parser function {{interlanguagelink}}: in the selected language
  • LiquidThreads pages: in content or interface language, depending on whether they are deemed more content-like or interface-like (e.g. history view)
  • WikiLexicalData pages: in the interface language
  • special pages: in interface language

The page view language is usually the same, just in the user-selected variant (when that is a thing), except for Wikibase where it's the interface language.
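
The relationship between page language and page view language described above could be modelled roughly like this. It is a Python sketch, not MediaWiki code: the function name, its parameters, and the `is_wikibase` flag are hypothetical, and the variant table is only an illustrative subset.

```python
from typing import Optional

# Illustrative subset of languages that have variants; not a complete list.
VARIANTS = {
    "zh": {"zh-hans", "zh-hant"},
    "sr": {"sr-ec", "sr-el"},
}


def page_view_language(
    page_language: str,
    user_variant: Optional[str] = None,
    interface_language: str = "en",
    is_wikibase: bool = False,
) -> str:
    # Wikibase is the exception noted above: it uses the interface language.
    if is_wikibase:
        return interface_language
    # Use the user's variant only if it belongs to the page language.
    if user_variant in VARIANTS.get(page_language, set()):
        return user_variant
    # No variants for this language, or no matching preference.
    return page_language


print(page_view_language("zh", user_variant="zh-hant"))  # zh-hant
print(page_view_language("de", user_variant="zh-hant"))  # de
print(page_view_language("en", interface_language="fr", is_wikibase=True))  # fr
```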

Related tasks:

Event Timeline

From @Anomie in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/434544/28/includes/Revision/RevisionRenderer.php#227 :

  • The distinction between the target language and the interface language does still make sense. Special pages should still use the interface language, as should the skin chrome. Wikibase is weird, as are Commons file pages that make heavy use of {{int:}} to be pseudo-translated.
  • The "content language" should just be the default page/target language. Most code that currently uses $wgContLang should probably be using the target language instead.
  • ParserOptions should always have a target language set. To make that happen correctly, ParserOptions constructors would need to be passed a Title.[1] Then it can default to the page language[2] if one is set, falling back to the content language.
  • ParserOptions should probably split the cache on the target language when the language is used, like it already does for the interface language.

[1]: This should replace Parser::parse() taking a Title of its own.
[2]: Not the page-view language, ideally. Variant transformations for the page-view language should be done post-parse so as to not split the cache.
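
A minimal sketch of what the last two points might look like, written as a Python model rather than a patch. The class name, method names, and cache key format are all hypothetical, and the constructor takes an already-resolved page language instead of a Title to keep the example self-contained.

```python
from typing import Optional, Set


class ProposedParserOptions:
    """Hypothetical model: a target language is always set at construction."""

    def __init__(self, page_language: Optional[str], content_language: str,
                 interface_language: str) -> None:
        # Default to the page language if one is set, else the content language.
        self._target_language = page_language or content_language
        self._interface_language = interface_language
        self._used: Set[str] = set()  # options actually read during a parse

    def get_target_language(self) -> str:
        self._used.add("targetlanguage")
        return self._target_language

    def get_interface_language(self) -> str:
        self._used.add("userlang")
        return self._interface_language

    def cache_key(self) -> str:
        # Split the cache only on the options that were actually used.
        values = {
            "targetlanguage": self._target_language,
            "userlang": self._interface_language,
        }
        parts = [f"{name}={values[name]}" for name in sorted(self._used)]
        return "!".join(parts) if parts else "canonical"


opts = ProposedParserOptions(page_language=None, content_language="en",
                             interface_language="fr")
print(opts.cache_key())            # canonical
print(opts.get_target_language())  # en
print(opts.cache_key())            # targetlanguage=en
```

Tracking usage this way means a page that never asks for the target language keeps a single cache entry, which matches the concern in footnote [2] about not splitting the cache unnecessarily.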