Page MenuHomePhabricator

Strip marker form inconsistency between PHP & Lua?
Closed, InvalidPublic

Description

I don't know since when it is (but it is quite recent) and I didn't find any relevant documentation for it, so I assume it is a bug.

https://git.wikimedia.org/blob/mediawiki%2Fcore.git/master/includes%2Fparser%2FParser.php (lines 135, 136, 953) defines strip marker in form of

\127UNIQ--tagname-8_hex_digits-QINU\127

and so was the pattern usable in Scribunto modules.

However, Scribunto modules since some recent time receive different form:

\127'"`UNIQ--tagname-8_hex_digits-QINU`"'\127

(mind the triple quotes wrapping)

Modules which used to work counting on the former pattern no longer do what they are expected to do unless the pattern is changed to the latter.

If this is intended behavior, could somebody navigate me to relevant documentation and/or discussion and/or patches, please? Thanks.

Event Timeline

Erm, strip markers are internal implementation detail that should really never be exposed to something like lua script processing. Can you clarify when they're coming up and what you're doing with them? If you ever see them, that's probably a bug that needs fixing...

The format of strip markers was recently changed to fix a security issue with strip markers (T110143, currently private). You can see some recent public communication about it on enwiki, particularly this diff and this diff.

Scribunto currently doesn't expose any method for matching strip markers in text, only for removing them (mw.text.killMarkers) or in certain cases extracting the hidden text (mw.text.unstripNoWiki). It would be possible to expose a pattern to match markers generically, if such a thing would be useful; file a new feature-request task with use cases if you want to follow up on that.

In T133477#2234456, @brion wrote:

Erm, strip markers are internal implementation detail that should really never be exposed to something like lua script processing. Can you clarify when they're coming up and what you're doing with them? If you ever see them, that's probably a bug that needs fixing...

Whenever something does wikitext that comes down to {{#invoke:Module|func|<nowiki>...</nowiki>}}, or {{#invoke:Module|func|Something<ref>...</ref>}}, or the like, the wikitext of the parameter passed into Scribunto will include the strip marker. The same sort of thing can happen if you internally do something like local text = frame:preprocess( '<ref>...</ref>' ), the returned wikitext will have strip markers in it.

There's no way to avoid this unless you want to completely strip out the references and so on (which would break all infoboxes, among other stuff), or you want to cause all sorts of other double-parsing bugs, or you want to rather pointlessly replace MediaWiki's strip markers with some other sort of strip markers when passing wikitext into and out of Scribunto.

Whenever something does wikitext that comes down to {{#invoke:Module|func|<nowiki>...</nowiki>}}, or {{#invoke:Module|func|Something<ref>...</ref>}}, or the like, the wikitext of the parameter passed into Scribunto will include the strip marker. The same sort of thing can happen if you internally do something like local text = frame:preprocess( '<ref>...</ref>' ), the returned wikitext will have strip markers in it.

There's no way to avoid this unless you want to completely strip out the references and so on (which would break all infoboxes, among other stuff), or you want to cause all sorts of other double-parsing bugs, or you want to rather pointlessly replace MediaWiki's strip markers with some other sort of strip markers when passing wikitext into and out of Scrbunto

Strip markers in text that gets regexed etc is a bad API for MediaWiki internals and a worse one to expose to user-supplied code, but I guess fixing it is not going to be trivial. I'll think about opening a long-term epic bug for retooling this.

Note that someday the old internal parser will probably be removed; we should have a nice structured format to replace it. Parsoid provides an annotated html model, for instance, which would allow us to pass in a Dom model instead of a big string.

The format of strip markers was recently changed to fix a security issue with strip markers (T110143, currently private). You can see some recent public communication about it on enwiki, particularly this diff and this diff.

I am confused - the source code linked in description does not suggest the new format or am I just blind?

The patch changing to the new format isn't public yet. This is our standard procedure for security bugs that affect released versions, to avoid leaking too much information about the vulnerability until a security release is made.

I am confused again - while the patch and new source are not visible yet, the new version is already running on WMF servers. So how come sources are behind the production? That doesn't seem to me like a standard procedure no matter if it is or is not security issue...

Anyway - that "backspace-wrapped-triple-quoted" format is now the one which is going to be used? (I don't want to re-code things several times and I want to update the documentation too...)

Thanks.

@Danny_B, here's the process we use for fixing security bugs,

https://www.mediawiki.org/wiki/Reporting_security_bugs#What_happens_when_I_report_a_bug

hopefully Anomie's comments will make more sense in light of that.

Yes, the version including '"` characters will be used going forward.

@csteipp Thanks for the explanation. For me the most important thing ATM is to know for sure the stable form I can work with.

I assume I should update the documentation after the patch will be published?

Od1n reopened this task as Open.EditedMay 8 2016, 4:04 PM
Od1n triaged this task as High priority.
Od1n subscribed.

At least on fr.wikipedia.org, this new "triple-quoted" format is used, but mw.text.unstripNoWiki() appears to be noneffective, it doesn't replace the strip marker. Argument passed to the module is simply <nowiki> </nowiki>. I even tried manually removing the quotes before passing to unstripNoWiki(), no more success. Note that killMarkers() is effective though.

This reminds me - is the patch already published? So I can update the documentation...

This reminds me - is the patch already published? So I can update the documentation...

Not yet. It should be after the next security release.

Anomie lowered the priority of this task from High to Medium.

At least on fr.wikipedia.org, this new "triple-quoted" format is used, but mw.text.unstripNoWiki() appears to be noneffective, it doesn't replace the strip marker.

Works fine when I try it. Even if it didn't, you should have opened a new bug instead of reopening this one for an issue that is at best tangentially related.

Argument passed to the module is simply <nowiki> </nowiki>. I even tried manually removing the quotes before passing to unstripNoWiki(), no more success.

I assume you're referring to https://fr.wikipedia.org/wiki/Module:Utilisateur:Od1n/Debug? Your problem there seems to be that you're incorrectly assuming the output of unstripNoWiki() will be <nowiki> </nowiki>, rather than just .

If I place {{#invoke:Utilisateur:Od1n/Debug|main|sép=<nowiki>foo</nowiki>}} on a page and preview, it outputs @foo@ (length 34) false @foo@ (length 3) false @UNIQ--nowiki-00000001-QINU@ (length 28) false as expected: the first bit is outputting the strip marker unchanged, the second is the correct operation of unstripNoWiki(), and the third is the correct behavior when you mangle the strip marker to remove the quotes.

Thanks for your reply,

Still, I'm not getting the desired "true" debug output. Is there a bug for real or am I missing something?

On a related note, https://fr.wikipedia.org/wiki/Sp%C3%A9cial:ExpansionDesMod%C3%A8les and "edit page > preview" give different lengths, but that's certainly an expected thing. Just to point this out.

Interestingly enough, I get a "true" with:

{{#invoke:Utilisateur:Od1n/Debug|main|sép=<nowiki> </nowiki>}}

But not with:

{{#invoke:Utilisateur:Od1n/Debug|main|sép=<nowiki>foo</nowiki>}}

Still, I'm not getting the desired "true" debug output. Is there a bug for real or am I missing something?

No bug here. You're missing that unstripNoWiki() does not include the <nowiki> tags in its output, just the content of the <nowiki> tags.

On a related note, https://fr.wikipedia.org/wiki/Sp%C3%A9cial:ExpansionDesMod%C3%A8les and "edit page > preview" give different lengths, but that's certainly an expected thing. Just to point this out.

Special:ExpandTemplates does some weird stuff and can result in double-parsing.

Interestingly enough, I get a "true" with:

{{#invoke:Utilisateur:Od1n/Debug|main|sép=<nowiki> </nowiki>}}

But not with:

{{#invoke:Utilisateur:Od1n/Debug|main|sép=<nowiki>foo</nowiki>}}

I see you've now updated your test module to not expect the <nowiki> wrapper around the output. Obviously the second one won't return true when you're trying to see if 'foo' equals ' '.

<del>There is a bug. I only get "true" when manually stripping the quotes.</del>

No wonder, I'm getting tired with this code, just let me take a breath with this. It seems to work.

EDIT: Yep, it works. I'm really sorry for the spamming. Thank you for the support.

This reminds me - is the patch already published? So I can update the documentation...

Not yet. It should be after the next security release.

Is the date known yet? If not, can somebody please poke me when it is released?