Title::getPageLanguage() uses ContentHandler::getPageLanguage() and other special page logic to determine the effective page language.
Either PageRecord::getLanguage() (or PageStore) needs to do the same, or it needs to return null if the language is not set in the database, and leave it to the caller to determine the effective language.
Event Timeline
Change 677371 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):
[mediawiki/core@master] WIP: Make PageRecord::getLanguage() behave consistently with Title::getPageLanguage()
Change 677505 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):
[mediawiki/core@master] PageRecord: make language optional
Change 677371 abandoned by Daniel Kinzler:
[mediawiki/core@master] WIP: Make PageRecord::getLanguage() behave consistently with Title::getPageLanguage()
Reason:
For now, let's do Ic26f6f7690499b3dd87982e3822881fd473cfd68
The logic in Title::getPageLanguage() can't be ported to PageStoreRecord or PageStore without pulling ContentHandler logic into the new storage layer component. Worse, since ContentHandler::getPageLanguage() can be (and is) overridden by subclasses, we can't change its signature to no longer require a Title, which would mean binding PageStore to Title.
Some observations:
- Core never writes anything into page_lang in the database. Some extensions do, but the column is entirely empty, at least on enwiki.
- In core, the page language is the content language, except in two cases: in the MediaWiki namespace, the language is determined by a language code suffix on the title; in the Special namespace, the page language is the user's UI language.
- Some extensions override ContentHandler::getPageLanguage
- Some extensions implement the PageContentLanguage hook
- Conceptually, the content language is a property of the Content object.
- In the contexts in which Title::getPageLanguage is currently used, a Content object tends to be available.
- Conceptually, a page's content language should not depend on the user who views it. It's the language the page was written in (which could be 'und' or 'mul' as well).
- The effective output language however may depend on user preferences or the current request (variants).
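The observations above can be sketched as a small, language-neutral model. This is not MediaWiki code: `Page`, `KNOWN_LANGUAGES`, `content_language` and `output_language` are hypothetical names, and the precedence given to a stored page_lang value is an assumption made for illustration. The sketch keeps the content language independent of the viewing user and confines the user-dependent part to a separate output-language function. Language variants are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Namespace IDs as defined by MediaWiki core.
NS_SPECIAL = -1
NS_MEDIAWIKI = 8

# Hypothetical, deliberately tiny set of known language codes.
KNOWN_LANGUAGES = {"en", "de", "fr", "ja"}


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for a page identity (not a MediaWiki class)."""
    namespace: int
    dbkey: str
    stored_language: Optional[str] = None  # models page_lang; None when unset


def content_language(page: Page, site_language: str) -> str:
    """Language the page is written in; never depends on the viewing user."""
    # Assumption: an explicitly stored language wins over everything else.
    if page.stored_language is not None:
        return page.stored_language
    # MediaWiki namespace: a language code suffix on the title decides.
    if page.namespace == NS_MEDIAWIKI and "/" in page.dbkey:
        suffix = page.dbkey.rsplit("/", 1)[1]
        if suffix in KNOWN_LANGUAGES:
            return suffix
    # Everything else falls back to the site's content language.
    return site_language


def output_language(page: Page, site_language: str, user_language: str) -> str:
    """Effective language for rendering; may depend on the user."""
    # Special pages are rendered in the user's UI language.
    if page.namespace == NS_SPECIAL:
        return user_language
    return content_language(page, site_language)
```

Splitting the two functions this way means only `output_language` needs to know about the current user, so a storage-level component could call `content_language` without any request context.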
Proposal:
Introduce a PageContentLanguageLookup service, with a getPageContentLanguage( PageRecord $page ) method as a replacement for Title::getPageLanguage.
None of the deployed extensions actually write or read the page_lang field. The only wiki I managed to find that actually does have the field set is Commons, with 136 pages where page_lang is not null, and all of them are main pages.
The feature is enabled via wgPageLanguageUseDB on Wikisource (T175622) and on wikis using the Translate extension (T153209).
I guess that doesn't really mean much - even though the feature is not used a lot, it is used and we probably can't just remove it.
Introduce a PageContentLanguageLookup service, with a getPageContentLanguage( PageRecord $page ) method as a replacement for Title::getPageLanguage.
I'm a bit worried about the proliferation of one-method service objects. We now have ParserOutputAccess, you proposed PageContentAccess, now PageLanguageLookup. I don't quite know the answer to this concern, though.
Change 677505 merged by jenkins-bot:
[mediawiki/core@master] PageRecord: make language optional
I suppose both methods can be implemented by the same class, and may even be in the same interface. Or we make PageContent an entity-style object that has getCurrentContent() and getPageLanguage() methods.
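As a rough illustration of the first option, here is a language-neutral sketch in which one class satisfies two narrow interfaces, so callers can still depend on only the method they need. Nothing here is MediaWiki API: the protocol names echo the ones discussed in this thread, while `PageContentService`, `render_heading`, the integer page IDs and the dictionary-backed storage are all hypothetical.

```python
from typing import Dict, Optional, Protocol


class PageContentAccess(Protocol):
    """Narrow interface: access to a page's current content."""
    def get_current_content(self, page_id: int) -> Optional[str]: ...


class PageLanguageLookup(Protocol):
    """Narrow interface: lookup of a page's content language."""
    def get_page_language(self, page_id: int) -> str: ...


class PageContentService:
    """Hypothetical single class that satisfies both protocols."""

    def __init__(self, contents: Dict[int, str], languages: Dict[int, str],
                 default_language: str) -> None:
        self._contents = contents
        self._languages = languages
        self._default_language = default_language

    def get_current_content(self, page_id: int) -> Optional[str]:
        # None models a page with no current content.
        return self._contents.get(page_id)

    def get_page_language(self, page_id: int) -> str:
        # Fall back to the site default when no language is recorded.
        return self._languages.get(page_id, self._default_language)


def render_heading(lookup: PageLanguageLookup, page_id: int) -> str:
    """A caller that only needs the language depends on one interface."""
    return f'<h1 lang="{lookup.get_page_language(page_id)}">'
```

With this shape there is a single object to construct and wire up, while type hints on callers such as `render_heading` keep each dependency as small as a one-method service would.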