Page MenuHomePhabricator

Scary transcluding a Wikipedia page with inline math renders that inline math incorrectly
Open, Needs TriagePublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

{{wikipedia::Detection limit}}

, as found here: https://www.limswiki.org/index.php/Detection_limit.

What happens?:

Pure transcluded inline math is bungled on the HTML side. When you look at the code using the browser inspect tool:

  1. The semantics code is removed from between the math tags and out of the p tags, and its placed into its own set of pre tags.
  2. The fallback image code is placed after both closing span tags and after the pre'd semantics tags.
  3. See the following image:

MathInline2.png (865×921 px, 97 KB)

What should have happened instead?:

Pure transcluded inline math code should:

  1. Place the semantics code between the math tags, not in a new set of pre tags.
  2. The fallback image code should be placed after the first span tag closes but before the second span tag closes.
  3. See the following image:

MathInline1.png (763×993 px, 110 KB)

What else is noteworthy about this problem?:

Inline math is working fine on our MediaWiki instance. This can be confirmed by looking at native content with math (e.g., https://www.limswiki.org/index.php/User:Shawndouglas/sandbox/sublevel6) and by changing from pure transclusion to raw transclusion (e.g., https://www.limswiki.org/index.php?title=Detection_limit&oldid=49362).

At this point, we don't know what could be causing purely transcluded inline math to be rendered as under "What happened?"

Software version (skip for WMF-hosted wikis like Wikipedia):

Us: https://www.limswiki.org/index.php/Special:Version

MW 1.36.1 and Math extension code that is compatible with that MW version (3.0.0). Again, local Math is working fine.

Other information (browser name/version, screenshots, etc.):

Happens in all browsers, including Chrome-based and Safari. Have included screenshots above. Other useful info:

WP page: https://en.wikipedia.org/wiki/Detection_limit
Our wiki's transclusion of that page: https://www.limswiki.org/index.php/Detection_limit

Event Timeline

Thank you for the excellent bug report. I wonder if this is specific to the math or if there is a general problem when transcluding div elements. From the math extension code the main difference is that inline math is a span whereas displaystyle math is a div.
https://github.com/wikimedia/mediawiki-extensions-Math/blob/b08776747133a2d49fcac56f338abce046d0bf2b/src/MathMathML.php#L467-L471

I changed the second equation in https://github.com/wikimedia/mediawiki-extensions-Math/blob/b08776747133a2d49fcac56f338abce046d0bf2b/src/MathMathML.php#L467-L471 to real display math, and it also does not work. So it is probably not about span vs div. For formula that do render, the math is wrapped into <dl><dd> tags. So really the question is what does dl do to the math?

I changed the second equation in https://github.com/wikimedia/mediawiki-extensions-Math/blob/b08776747133a2d49fcac56f338abce046d0bf2b/src/MathMathML.php#L467-L471 to real display math, and it also does not work. So it is probably not about span vs div. For formula that do render, the math is wrapped into <dl><dd> tags. So really the question is what does dl do to the math?

Great question. Hopefully someone with more knowledge can speak to that. Also, good theory about span vs. div. Again, I wonder how the pre tags are getting added around the semantics code.

WelpThatWorked subscribed.

It looks like the extra <pre> is being added when the text is run through doBlockLevels in internalParseHalfParsed (Parser.php:1710). The interwiki content is pulled in as HTML as opposed to local transclusions pulled in as wikitext, so prepending raw: (e.g. {{raw:wikipedia::Detection limit}}) to pull the interwiki content as wikitext causes the issue to disappear. I think the interwiki HTML is already fully parsed when pulled, so it shouldn't be put through parsing a second time.

Change 853462 had a related patch set uploaded (by WelpThatWorked; author: WelpThatWorked):

[mediawiki/core@master] Improve HTML armoring by swapping General strip for NoWiki strip

https://gerrit.wikimedia.org/r/853462

Hey, all. It's been a little over a year here. I admit to being not-so knowledgeable about the back-end workings of trying to get a fix live in the MediaWiki code. That said, I see "Patch-For-Review," with a URL. When I click that URL, I see a page on Wikimedia Code Review, which I admit looks super cluttered and confusing to my untrained eyes. I can't fully tell what the status of this is, to be honest. Is this potential fix still in the "Ready for Review" state? If so, what's the next step to verify this fix works? If it's not in review, how do I decipher what was done with this potential fix? I'm really just an end user trying to get our wiki fixed. We continue to have the need to transclude pages with math, that don't display correctly. A year later, and we'd REALLY love to get this fixed. Help?

I assume if I want to test this myself to see if it will work on LIMSwiki.org, I'd download those includes/parser/BlockLevelPass.php and includes/parser/Parser.php, back up our existing files, and replace them with these? Then do a shift-refresh in the browser?

I think the patch is ready for review. Then someone with +2 rights for MediaWiki needs to merge change. Then the change gets merged to master and will be included in the next release. However, if you feel brave, you can change the respective lines in your instance and check if that does it solve your issue? Be sure that you know how to undo your chances in case anything goes wrong or try on a test install first, if you want to avoid downtime.

I think the patch is ready for review. Then someone with +2 rights for MediaWiki needs to merge change. Then the change gets merged to master and will be included in the next release. However, if you feel brave, you can change the respective lines in your instance and check if that does it solve your issue? Be sure that you know how to undo your chances in case anything goes wrong or try on a test install first, if you want to avoid downtime.

Went ahead and backed up BlockLevelPass.php and Parser.php, and then FTPed the new files. Ended up with an error trying to load a page: "Error: Class 'MediaWiki\MainConfigNames' not found". I re-uploaded the original files and the page loaded again without issue. So at least for me, simply replacing these two files results in an error.

Ok, then your version is probably too old or to new for the patch to work:-(

I wish this all weren't so obtuse. We're running MW 1.36.1. When I go to https://gerrit.wikimedia.org/r/c/mediawiki/core/+/853462, it's not obvious what version(s) these changes would support. They were made over a year ago, that much I can tell, but I can't tell what code version @WelpThatWorked modified back in November 2022 to get it to work. Supremely frustrating.

1.36 is unsupported according to https://www.mediawiki.org/wiki/Version_lifecycle/en you can upgrade to 1.39 that is an LTS version, so upgrading is recommended independently of this issue. After you upgraded, you can could try to modify the two files. My suggestion would be to just change the lines that were changed instead of copying the files, so a minimal change. Note that changing individual files or lines is nothing officially recommended, and should only be done if you feel confident at your own risk.

Understood, and thanks. I tried just updating those lines in 1.36.1, but it had no effect. I'll consult stakeholders to see if upgrading to 1.39 can get approved, and if so, retest with the complete file change.

Have a good holiday, if you celebrate. I'll report back with any updates.

I tested the changes on top of the latest MW core and they seem to still be effective/necessary. However, I notice the issue is absent on the limswiki page, unless you've already done the splice I'm not sure why that is.

@WelpThatWorked I think your patch needs to be rebased and get code review. Is there a strategy for that?

Physikerwelt moved this task from Inbox to Blocked: needs help on the Math board.
Physikerwelt added subscribers: Bmueller, Esanders.

I am adding MediaWiki-Engineering after a discussion with @Bmueller hoping that this helps to get the patch in the "review queue".

Checklist from https://www.mediawiki.org/wiki/Gerrit/Code_review/Getting_reviews#No_timely_feedback

  • You can check the latest activity in a code repository via git log in your local checkout. To take over an abandoned project and become its maintainer, follow these steps.

mw core is maintained

  • If you think that your patch has been gone unnoticed for a longer time, feel free to bring up the problem on the #wikimedia-tech connect IRC channel.

I don't feel comfortable to disturb people on IRC.

From https://www.mediawiki.org/wiki/MediaWiki_Product_Insights/Contributor_retention_and_growth#MediaWiki_core I learned that @Esanders 's team is the steward for bugs with the math extension. According to the investigation of the ticket, the underlying problem is not limited to math, but a bug in core.

I personally find scary transcluding too scary and would also assume that WMF does not officially support this feature. However, the patch to be reviewed has nothing to do with either Math nor scary transclusion, but claims to fix a bug with HTML tag output. In addition, I assume the patch does solve the problem of @Lostraven, but I would guess the patch will never be merged in the main product line. My private opinion is that we should move away from strip markers, and not add more complex processing rules for strip-markers. At least from the math perspective our goal is to get rid of them until the end of 2024, as they caused a lot of trouble and regressions in the past.

On the meta-level I understand the challanges on both sides

Hey, all. It's been a little over a year here. I admit to being not-so knowledgeable about the back-end workings of trying to get a fix live in the MediaWiki code. That said, I see "Patch-For-Review," with a URL. When I click that URL, I see a page on Wikimedia Code Review, which I admit looks super cluttered and confusing to my untrained eyes. I can't fully tell what the status of this is, to be honest. Is this potential fix still in the "Ready for Review" state? If so, what's the next step to verify this fix works? If it's not in review, how do I decipher what was done with this potential fix? I'm really just an end user trying to get our wiki fixed. We continue to have the need to transclude pages with math, that don't display correctly. A year later, and we'd REALLY love to get this fixed. Help?

I assume if I want to test this myself to see if it will work on LIMSwiki.org, I'd download those includes/parser/BlockLevelPass.php and includes/parser/Parser.php, back up our existing files, and replace them with these? Then do a shift-refresh in the browser?

and on the other hand a comparable small developer team within WMF and WMDE that has two different tasks (keep the software used for Wikipedia up to date and maintain the open source product MediaWiki that is used for a large variety of applications) needs to prioritize to cope with too few developers.

However, if I had a free wish I'd ask for a better communication from WMF (maybe in collaboration with the mwstake) if a issue will be processed or not.

I tested the changes on top of the latest MW core and they seem to still be effective/necessary. However, I notice the issue is absent on the limswiki page, unless you've already done the splice I'm not sure why that is.

I admit to being a little confused here. The issue is not absent on LIMSwiki, and in fact the pages I gave as examples in the original bug report still produce the stated error. I could never get the patch to work; see my comments prior from December 19.

I personally find scary transcluding too scary and would also assume that WMF does not officially support this feature. However, the patch to be reviewed has nothing to do with either Math nor scary transclusion, but claims to fix a bug with HTML tag output. In addition, I assume the patch does solve the problem of @Lostraven, but I would guess the patch will never be merged in the main product line. My private opinion is that we should move away from strip markers, and not add more complex processing rules for strip-markers. At least from the math perspective our goal is to get rid of them until the end of 2024, as they caused a lot of trouble and regressions in the past.

I wish I understood a bit more about all this. Yes, as a long-time MediaWiki user, I understand Scary Transclusion is called "Scary" for a reason, even if I don't fully understand the programming reasons for why it's scary. I'm also under the vague impression that scary transclusion is perhaps not used by many MediaWiki users, making this a fix of limited interest.

That being said, as an end user, all I know is that, as stated in my "What should have happened instead?", if I inspect the browser code in Chrome and modify it a certain way in the inspector, I can get the math to render properly. I just need a way to "fix" the code that's causing the HTML to render as such.

My client is onboard with upgrading from 1.36.1 to 1.39 if the proposed files will actually fix the problem for us. But will it? And is the proposed fix the proper fix, particularly if it won't ever make it to MW core?

There is a pretty thorough description of "scary transclusion" at https://www.mediawiki.org/wiki/Manual:$wgEnableScaryTranscluding. The primary reason it is "scary" is because cross-wiki page invalidation does not occur: changes to the transcluded page on the "other" wiki are not reflected on the local wiki, unless a variety of somewhat unpredictable things occur (an edit on the local wiki, expiration from the local parser cache, an edit to *some other* template which is used by the page on the local wiki, etc). The various modes (raw, pure, subst, other) are also described on that page. It appears that the "raw" solution suggested above is actually the Right Thing To Do in this situation:

The interwiki content is pulled in as HTML as opposed to local transclusions pulled in as wikitext, so prepending raw: (e.g. {{raw:wikipedia::Detection limit}}) to pull the interwiki content as wikitext causes the issue to disappear.

So I'd be inclined to close this as working-as-designed aka wont-fix: if you want "scary" transclusion *as wikitext*, prepend "raw". I think we can have a longer discussion about how transcluding arbitrary HTML from an external site "should" work, but there's probably not one universally correct answer to that, and the existing documentation already describes some known bugs, like how links to the external wiki will be rendered as "blue links" not as "external links" as they should. All the CSS applied will be that of the "local wiki" and not the "foreign wiki" as well, and since the transcluded HTML on the foreign wiki can include javascript and style modules, those will be missing in the transcluded HTML as well. "Just treating the HTML as stripped content" doesn't address any of these fundamental issues. There's a reason this is called scary.

So I'd be inclined to close this as working-as-designed aka wont-fix: if you want "scary" transclusion *as wikitext*, prepend "raw". I think we can have a longer discussion about how transcluding arbitrary HTML from an external site "should" work, but there's probably not one universally correct answer to that, and the existing documentation already describes some known bugs, like how links to the external wiki will be rendered as "blue links" not as "external links" as they should. All the CSS applied will be that of the "local wiki" and not the "foreign wiki" as well, and since the transcluded HTML on the foreign wiki can include javascript and style modules, those will be missing in the transcluded HTML as well. "Just treating the HTML as stripped content" doesn't address any of these fundamental issues. There's a reason this is called scary.

Transcluded page of Detection limit on LIMSwiki: https://www.limswiki.org/index.php/Detection_limit
It works almost brilliantly, with the tiny exception of that math bit.

Transcluded page of Detection limit on LIMSwiki, with raw: https://www.limswiki.org/index.php/User:Shawndouglas/sandbox/sublevel5
Sure, we fix the one broken thing, but then it's complete devastation. This is not a "solution" for us. We suddenly have to create transclusions or our own articles for tens of red-linked pages, add multiple templates we absolutely don't need, end up with categories that don't match our categories... it's a disaster.

Sounds like we're arguing about whether or not there really is a bug to fix. If that's the case, fine, whatever. After more than 15 years of making edits on Wikipedia and almost as many implementing MediaWiki software for clients, I'm soooo used to the opaque layer of idiosyncrasies of trying to gain solid agreement about much of anything regarding the codebase. But that's just jaded and half-defeated me, so just punt that aside.

That said "just use raw" is not a solution for us. We're sooooo close to it working exactly as we want, with the exception of some math (not all). I'm still working with my client to clone the site and allow me to upgrade to 1.39 and see if WelpThatWorked's proposed fix actually solves our problem. If it does, we'll probably just implement the fix and avoid upgrading in the foreseeable future, for fear the fix gets broken with a future update.

Whether or not something is broken and needs fixing ... well, I'll leave it to you all to battle that out.

The patch proposed above in T318060#8371827 by @WelpThatWorked to use a nowiki stripmarker instead of a general one seems to match the approach that the <html> tag extension does when $wgRawHtml is enabled,
https://github.com/wikimedia/mediawiki/blob/master/includes/parser/CoreTagHooks.php#L91-L127
so there's precedent, though I would suggest backing out the changes to the BlockLevelPass.php

In T279094, Parsoid is looking to add support for parser functions that return isHTML and the analysis from @ssastry in T279094#8740268 seems to want to shuttle the output in the same way that extension content would be. However, at present, as we've encountered above, the output is subject to the doBlockLevel pass and changing the behaviour might affect other parser functions that set isHTML, of which there are many,
https://codesearch.wmcloud.org/extensions/?q=%27isHTML%27+%3D%3E+true&files=&excludeFiles=&repos=
The proposed change seems like it would simplify things by not having to think about that further processing.

Although, Parsoid's treatment of extension tags isn't entirely consistent with the legacy parser as it is, since it always treats the tag content as if nowiki strip marker wrapped, whereas extensions that want that behaviour were forced to do it explicitly in the legacy parser,

// Use 'nowiki' strip marker to prevent list processing (also known as doBlockLevels()).
// However, leave the wrapping <div/> outside to prevent <p/>-wrapping.

from https://github.com/wikimedia/mediawiki-extensions-SyntaxHighlight_GeSHi/blob/master/includes/SyntaxHighlight.php#L466-L471

Note that the proposed changed can be applied incrementally by only patching interwiki transclusions that set isHTML, rather than all parser functions.

Note that the proposed changed can be applied incrementally by only patching interwiki transclusions that set isHTML, rather than all parser functions.

Yeah, I think that would probably be the "proper" semantics -- to treat "pure" scary transclusion just like we treat other transclusions (extensions etc) that return HTML rather than wikitext, which "raw" scary transclusion is treated just like other transclusions which return wikitext. There are still some fundamental questions here regarding interactions with the Sanitizer -- strip state bypasses the Sanitizer entirely I think, which would add an additional level of "scary" to the transclusion.

But this all comes with the big caveats (a) this code will probably never be tested in WMF production and I'm not sure it even has tests in CI, and (b) even with that fix, it's not clear that <math> is going to work right, since the JS and CSS modules that the math extension requires are going to be missing. Transclusion as wikitext should solve (b) at least.

(Veering off-topic: the basic types returned from transclusions currently are "html", and "wikitext", but with some oddball apparently-unintentional corner cases like "raw string", "strip state marker", and "partially-parsed wikitext". The interaction of the Sanitizer with these types is also a bit vague. Ideally these types would match up with the types accepted as *arguments* to transclusions, but they do not -- yet. I'd like to fix that eventually so that transclusions can be reliably composed; https://en.wikipedia.org/wiki/User:Cscott/Ideas/Some_thoughts_on_Global_Templates,_so_called#Types touches on this a bit.)

Change 990253 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Armor scary transclusions from doBlockLevels

https://gerrit.wikimedia.org/r/990253