Page MenuHomePhabricator

Some wikitext features render incorrectly in l10n messages because they parse other messages themselves
Closed, ResolvedPublic

Description

It has long been known that you can't call $parser->parse() while within another $parser->parse() call, e.g. from a parser hook, because of shared internal parser state. This used to cause problems in MediaWiki all the time 10-ish years ago, until we just told everyone to use a fresh parser when they need to parse wikitext (currently this is done via ParserFactory::getInstance()).

However, this knowledge has not arrived in the MessageCache code. It only has one parser instance, and when you call $msg->parse() while within another $msg->parse(), it just returns HTML-escaped wikitext instead of parsing it (since 3a0ed7a04492 in 2011). This rarely happens, but it does happen, and causes extremely confusing bugs.

Event Timeline

Change #1064037 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] MessageCache: Support parsing another message while already parsing

https://gerrit.wikimedia.org/r/1064037

Is the intention to make mistakes look less obviously broken, and thus for them to be less urgent as bug reports and/or might work correctly in ways that would not have any obvious bug. Or is the expectation that this would generally work correctly as a reliable capability?

I ask because in my experience, when this "raw html" shows up, it tends to be due to pre-existing mistakes in the code. For example, if UI code is being triggered inside page content, that's likely to produce cache poisoning, use the "wrong" language target, or be missing parser output metadata. Or if some general utility function is called indirectly by a parser hook, and that uses wfMessage(). That is de-facto UI code as well.

The way this normally works is:

  • The message explicitly embeds other messages via {{int:}} which is handled in a single pass by the pre-processor and parses without awareness of individual messages.
  • The parser hook uses $parser->msg()->text() for "simple" messages.
  • In top-level context, e.g. special page or action UI, each message is parsed directly and added to the output.

It would be good for future reference to detail in a few words one or two examples where 1) alternatives are absent, complex, or unstable, and 2) would be able to use this feature in a "correct" way, and 3) is something we want to encourage and support.

I think supporting it is reasonable, but I can see it potentially masking a discovery mechanism for mistakes. Maybe it wouldn't, but an example would make that easier to verify.

I think this should generally work correctly. Currently there are some features that you can use in page content, but can't use in localisation messages (if you do, they appear as un-expanded wikitext or double-escaped HTML). I think that's very unexpected.

I've already added the examples that led me to discover this bug as parent tasks:

They seems like reasonable use cases to me, and would be difficult or impossible to solve in different ways.

However, thinking about it a bit more, maybe the support should be a bit more limited. My original patch would allow nesting messages within messages with no limit (only limited by other parser limits, like bracket expansion depth), but I can't think of any use case for more than maybe 2 levels of nesting. Unlimited nesting could lead to ugly failures like out-of-memory situations. There should probably be some limit, larger than 2 just in case other people are more imaginative, but still fairly low. I'll update the patch to show an error message after 5 levels.

Special page transclusion inside a localization message sounds pretty hairy to begin with! And I agree with your suggestion in T178694#10077696 to work around it.

And can we provide an alternative hook to HtmlPageLinkRendererEnd? I don't love what Wikifunctions is trying to do there, either. (It's also incompatible with Parsoid at the moment).

I'm mostly in favor of making this a more obvious error message (instead of "broken wikitext") rather than trying to make it work.

Change #1064037 merged by jenkins-bot:

[mediawiki/core@master] language: Support parsing in MessageCache while already parsing

https://gerrit.wikimedia.org/r/1064037