Page MenuHomePhabricator

Phase out Title::getPageViewLanguage in favour of ParserOutput metadata
Closed, ResolvedPublic

Description

Background

See T341244: ParserOptions and Title::getPageViewLanguage may disagree on the lang/dir. In a nut shell, it is (unfortunately) common today for extensions to try to "guess" the language and directionality of embedded content. The reason for this is largely that a better API didn't exist, and classes like mw-content-ltr are important to set for correct styling, and are difficult to set by any means other than Title::getPageViewLanguage. But, Title::getPageViewLanguage involves global state and a lot of guess work.

Scope

https://codesearch.wmcloud.org/deployed/?q=%3EgetPageViewLanguage

Title::getPageViewLanguage

ContentHandler::getPageViewLanguage

Outcome

Remove use of Title::getPageViewLanguage.

Example migration:

  • HTML workarounds: In cases where it is used to manually create CSS classes like mw-content-ltr and mw-parser-output, and the lang and dir attributes, this can probably removed given that nowadays ParserOutput will take care of setting these already. Be sure to test the end-user workflow and confirm that your workarounds are redundant before removing them. When not possible to remove or if not feasible in the short-term, at least use ParserOutput->getLanguage() to inform your current workarounds instead of the global guesswork from Title::getPageViewLanguage.
  • Language code. If using Title::getPageViewLanguage to associate page language outside a pageview perspective, it is likely incorrect. Consider using Title->getPageLanguage() instead.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 973276 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] SpecialDiscussionToolsDebug: Replace Title::getPageViewLanguage()

https://gerrit.wikimedia.org/r/973276

Change 973278 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/TemplateSandbox@master] Remove no-longer-needed uses of Title::getPageViewLanguage()

https://gerrit.wikimedia.org/r/973278

Change 973278 merged by jenkins-bot:

[mediawiki/extensions/TemplateSandbox@master] Remove no-longer-needed uses of Title::getPageViewLanguage()

https://gerrit.wikimedia.org/r/973278

Krinkle triaged this task as High priority.Nov 13 2023, 5:43 PM

Change 974269 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/Scribunto@master] ScribuntoContentHandler: Refactor highlight() to reduce dependencies

https://gerrit.wikimedia.org/r/974269

Change 974276 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/Scribunto@master] ScribuntoContentHandler: Remove redundant Title::getPageViewLanguage use

https://gerrit.wikimedia.org/r/974276

Change 973276 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] SpecialDiscussionToolsDebug: Replace Title::getPageViewLanguage()

https://gerrit.wikimedia.org/r/973276

Change 976877 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/EventLogging@master] ContextAttributesFactory: Use Title->getPageLanguage() to avoid global state

https://gerrit.wikimedia.org/r/976877

Change 974269 merged by jenkins-bot:

[mediawiki/extensions/Scribunto@master] ScribuntoContentHandler: Refactor fillParserOutput()

https://gerrit.wikimedia.org/r/974269

Change 976877 merged by jenkins-bot:

[mediawiki/extensions/EventLogging@master] ContextAttributesFactory: Use Title->getPageLanguage() to avoid global state

https://gerrit.wikimedia.org/r/976877

Change 974276 merged by jenkins-bot:

[mediawiki/extensions/Scribunto@master] ScribuntoContentHandler: Remove redundant Title::getPageViewLanguage use

https://gerrit.wikimedia.org/r/974276

Change 982176 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/Flow@master] Replace Title::getPageViewLanguage()

https://gerrit.wikimedia.org/r/982176

Change 982176 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] Replace Title::getPageViewLanguage()

https://gerrit.wikimedia.org/r/982176

  • Language code. If using Title::getPageViewLanguage to associate page language outside a pageview perspective, it is likely incorrect. Consider using Title->getPageLanguage() instead.

I'm not sure this is correct. I've just run into this in T353199: prop=description does not respect language variants properly which I think is caused by using Title::getPageLanguage() where Title::getPageViewLanguage() should have been used. Architecturally that method is terrible - it pretends that the language is derived from the title, when in reality it can depend on any number of things including site configuration, page configuration, user preferences, URL parameters or request headers (although to a lesser extent this is true of Title::getPageLanguage() as well) but functionally it's often going to be more correct.

(I guess it depends on your definition of "pageview perspective", but if you include anything that's connected to a web request then you are left with very little that can actually be deprecated.)

I think the more correct approach would be some kind of service that takes a title and a request context (or user identity + WebRequest) and looks up the language, so that the dependencies are made explicit.

Krinkle removed a project: User-xSavitar.
Krinkle moved this task from Inbox, needs triage to Soon on the MediaWiki-Platform-Team board.
Krinkle added a subscriber: xSavitar.

Change 982768 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/Wikibase@master] Content: Use `getPageLanguage()` not `getPageViewLanguage()`

https://gerrit.wikimedia.org/r/982768

I think the more correct approach would be some kind of service that takes a title and a request context (or user identity + WebRequest) and looks up the language, so that the dependencies are made explicit.

I think that's basically LanguageConverter::getPreferredVariant(), and see T346996 about making it a service instead of reading from the globals.

Change 982768 abandoned by D3r1ck01:

[mediawiki/extensions/Wikibase@master] Content: Use `getPageLanguage()` not `getPageViewLanguage()`

Reason:

https://gerrit.wikimedia.org/r/982768

As discovered above, Wikibase (and CommonsMetadata too) actually use the ContentHandler::getPageViewLanguage rather than Title::getPageViewLanguage. I'm not sure if it was intended to be in scope for this task? It's similarly bad and indirectly reads from global state in the same way.

As discovered above, Wikibase (and CommonsMetadata too) actually use the ContentHandler::getPageViewLanguage rather than Title::getPageViewLanguage. I'm not sure if it was intended to be in scope for this task? It's similarly bad and indirectly reads from global state in the same way.

I can confirm that CommonsMetadata and Wikibase are not in scope for this task. In fact, I did a patch first for Wikibase and we ended up realizing that it's not in scope and I just verified CommonsMetadata as well. It calls getPageViewLanguage() on ContentHandler instead of Title: https://gerrit.wikimedia.org/g/mediawiki/extensions/CommonsMetadata/+/92d3152feb3148322c41aef1187842082b967a92/src/HookHandler.php#155

@Krinkle, what would be a good next logical step here? Maybe double check callers again (just in case), then if none found, soft and hard deprecate Title::getPageViewLanguage()?

Change 983714 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/CommonsMetadata@master] Replace getPageViewLanguage() with ParserOutput->getLanguage()

https://gerrit.wikimedia.org/r/983714

@DAlangi_WMF The problematic hook that initiated this task (via the parent task) is onPageContentLanguage() which is called by Title->getPageViewLanguage(), but not directly. The hook call resides in ContentHandler::getPageViewLanguage() -> ContentHandler->getPageViewLanguage().

The reason Wikibase is out of scope, is that it implements a custom method called getPageViewLanguage in a subclass of ContentHandler, and shares no code with the base class in core. After we deprecate and remove this, the Wikibase continues to work on its own.

CommonsMetadata, however, does call into the affected core code. It makes the same kind of mistake as the other extensions we patched in this task, namely it deals with a ParserOutput, but instead of asking the ParserOutput object what the language of the HTML is, it tries to "guess" by invoking global state of ContentHandler::getPageViewLanguage.

@DAlangi_WMF The problematic hook that initiated this task (via the parent task) is onPageContentLanguage() which is called by Title->getPageViewLanguage(), but not directly. The hook call resides in ContentHandler::getPageViewLanguage() -> ContentHandler->getPageViewLanguage().

The reason Wikibase is out of scope, is that it implements a custom method called getPageViewLanguage in a subclass of ContentHandler, and shares no code with the base class in core. After we deprecate and remove this, the Wikibase continues to work on its own.

CommonsMetadata, however, does call into the affected core code. It makes the same kind of mistake as the other extensions we patched in this task, namely it deals with a ParserOutput, but instead of asking the ParserOutput object what the language of the HTML is, it tries to "guess" by invoking global state of ContentHandler::getPageViewLanguage.

Ack! I'll review and test the patch you made. Thanks for clarifying. 🙏🏽

Change 983714 merged by jenkins-bot:

[mediawiki/extensions/CommonsMetadata@master] Replace getPageViewLanguage() with ParserOutput->getLanguage()

https://gerrit.wikimedia.org/r/983714

@Krinkle, is this good to be resolved now or should we wait till end of week?

With the last patch merged, feel free to close it.