
Parser does not correctly recognise what's transcludable and what's not in complex case of includeonly - onlyinclude
Open, Lowest, Public

Description

Quick description:

The parser does not understand that when <includeonly><onlyinclude>...</onlyinclude></includeonly> tags exist, the wikitext outside those tags is equivalent to wikitext inside a <noinclude>...</noinclude> tag, so changes to it must be ignored (i.e. must not trigger a re-render) on the pages that transclude this template. As a result, huge job queues are created whenever the wikitext outside the tags changes, when they shouldn't be.

Detailed explanation:

Recently I needed to assign Page Forms to templates to semantically handle template metadata, including template documentation. As we know, template documentation usually goes inside a <noinclude>...</noinclude> tag and the actual template code to be transcluded goes inside an <includeonly>...</includeonly> tag. That way, a change in the template documentation does not affect the pages that transclude the template.

But since Page Forms writes its wikitext at the top of the page by default, there was no way to make a page form write its wikitext inside a <noinclude>...</noinclude> tag.

Then I realised that, from a logical point of view,
<noinclude>A</noinclude><includeonly>B</includeonly>
is equivalent to
A<includeonly><onlyinclude>B</onlyinclude></includeonly>

The first part of the equation says that A must not be transcluded, and that B is transclude-only, i.e. it must not show on the template page. So A shows on the template page but is not transcluded, and B does not show on the template page but is transcluded to other pages.

The second part of the equation says that A shows on the template page and, on its own, could be transcluded. But <onlyinclude>B</onlyinclude> tells the parser that only B may be transcluded, which makes A non-transcludable after all. Wrapping all of that inside an <includeonly>...</includeonly> tag means that B (which is the only transcludable thing) is also transclude-only, i.e. it must not show on the template page itself.

So both parts of the equation say exactly the same thing.
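
To make the comparison concrete, here is a minimal Python sketch of the include-tag behaviour described above. It is a simplified model, not MediaWiki's preprocessor: the function names `render_on_template_page` and `render_when_transcluded` are hypothetical, it uses plain regular expressions, and it only covers the simple cases discussed in this task.

```python
import re

def render_on_template_page(wikitext: str) -> str:
    """Simplified model of what shows when the template page is viewed directly."""
    # <includeonly> blocks are hidden on the template page, contents included.
    text = re.sub(r"<includeonly>.*?</includeonly>", "", wikitext, flags=re.DOTALL)
    # <noinclude> and <onlyinclude> contents are shown; only the tags are dropped.
    return re.sub(r"</?(?:noinclude|onlyinclude)>", "", text)

def render_when_transcluded(wikitext: str) -> str:
    """Simplified model of what another page receives when it transcludes the template."""
    # If any <onlyinclude> section exists, nothing outside those sections is transcluded.
    sections = re.findall(r"<onlyinclude>(.*?)</onlyinclude>", wikitext, flags=re.DOTALL)
    text = "".join(sections) if sections else wikitext
    # <noinclude> blocks are never transcluded; <includeonly> tags are simply unwrapped.
    text = re.sub(r"<noinclude>.*?</noinclude>", "", text, flags=re.DOTALL)
    return re.sub(r"</?includeonly>", "", text)

classic = "<noinclude>A</noinclude><includeonly>B</includeonly>"
nested = "A<includeonly><onlyinclude>B</onlyinclude></includeonly>"

for name, source in (("classic", classic), ("nested", nested)):
    print(name, repr(render_on_template_page(source)), repr(render_when_transcluded(source)))
```

Under this model both forms show "A" on the template page and hand "B" to transcluding pages.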

I have already implemented this new idea in my templates and everything does work as described above.

But then I realised that for every change I make to the "A" part of a template (i.e. the non-transcludable part), a huge job queue is created, since some templates are transcluded in tens of thousands of pages.

So I believe that the parser does not recognise the logical equivalence described above and re-renders pages when it shouldn't, because there is no actual change in what the template transcludes when only the "A" part changes.
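
A rough sketch of the kind of check this would need, written in Python purely for illustration. It is not how MediaWiki schedules re-render jobs; `transcluded_part` and `needs_rerender` are hypothetical names, and the tag handling is a simplified regex model that ignores general nesting.

```python
import re

def transcluded_part(wikitext: str) -> str:
    """Simplified model: the wikitext that transcluding pages actually receive."""
    # When <onlyinclude> sections exist, only their contents are transcluded.
    sections = re.findall(r"<onlyinclude>(.*?)</onlyinclude>", wikitext, flags=re.DOTALL)
    text = "".join(sections) if sections else wikitext
    # <noinclude> blocks are dropped and <includeonly> tags are unwrapped.
    text = re.sub(r"<noinclude>.*?</noinclude>", "", text, flags=re.DOTALL)
    return re.sub(r"</?includeonly>", "", text)

def needs_rerender(old_wikitext: str, new_wikitext: str) -> bool:
    """Hypothetical check: re-render dependants only if the transcluded part changed."""
    return transcluded_part(old_wikitext) != transcluded_part(new_wikitext)

old = "A<includeonly><onlyinclude>B</onlyinclude></includeonly>"
docs_edit = "A, with updated docs<includeonly><onlyinclude>B</onlyinclude></includeonly>"
code_edit = "A<includeonly><onlyinclude>C</onlyinclude></includeonly>"

print(needs_rerender(old, docs_edit))  # only the "A" part changed
print(needs_rerender(old, code_edit))  # the "B" part changed
```

In this model an edit that touches only the "A" part would queue nothing, while an edit to "B" would still refresh every transcluding page.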

Event Timeline

Restricted Application added a subscriber: Aklapper. Jan 8 2017, 2:46 PM

I forgot to mention that I am using MW 1.26.2 and cannot upgrade because Media Temple, my hosting provider, still uses an old PHP version. But I did a thorough search about this issue and I believe that it still exists in the latest MW.

Is there a public testcase for a supported version of MediaWiki available?
(As 1.26 is unsupported software with security issues.)

Aklapper triaged this task as Lowest priority. Feb 21 2017, 3:22 PM

I'm pretty sure that

A<includeonly><onlyinclude>B</onlyinclude></includeonly>

is equivalent to simply

A<onlyinclude>B</onlyinclude>

Does it work correctly when you try that? I don't think the various include-related tags are supposed to support being nested.

protonotarios added a comment (edited). Feb 21 2017, 9:32 PM

I'm pretty sure that

A<includeonly><onlyinclude>B</onlyinclude></includeonly>

is equivalent to simply

A<onlyinclude>B</onlyinclude>

No, it's not.
They are equivalent as to what is being transcluded (which is "B" in both cases) but they are not equivalent as to what is being shown on the template page itself.
The first one shows "A" on the template page.
The second one shows "AB" on the template page.

As I said in my initial comment:

A<includeonly><onlyinclude>B</onlyinclude></includeonly>

is only equivalent to the classic syntax:

<noinclude>A</noinclude><includeonly>B</includeonly>

Does it work correctly when you try that? I don't think the various include-related tags are supposed to support being nested.

Yes, nesting works exactly as expected. I don't see why they are not supposed to support being nested.

They are equivalent as to what is being transcluded (which is "B" in both cases) but they are not equivalent as to what is being shown on the template page itself.
The first one shows "A" on the template page.
The second one shows "AB" on the template page.

Ah, OK, I see.

Does it work correctly when you try that? I don't think the various include-related tags are supposed to support being nested.

Yes, nesting works exactly as expected. I don't see why they are not supposed to support being nested.

It totally doesn't. It kind of works for this specific case, but not in general. For example, <includeonly><includeonly>A</includeonly></includeonly> will not act like you'd expect. I'm not sure if it's documented somewhere, but this is just how it works.

This isn't parsed as XML, but rather by a custom parser (see /includes/parser/Preprocessor_Hash.php and /includes/parser/Preprocessor_DOM.php, these are two alternative implementations of the same algorithm) and it most definitely doesn't handle nesting of XML-ish tags.

I'm not saying that it couldn't handle this specific case, perhaps it could (especially if you submit a patch :) ). But currently we're not even aspiring to handle it (it only kind of works because <onlyinclude> is special-cased in a funny way). So this is really a feature request, not a bug.

It totally doesn't. It kind of works for this specific case, but not in general.

Are you sure? Because in https://en.wikipedia.org/wiki/Wikipedia:Transclusion#Markup it says:

There can be several such sections. Also, they can be nested. All possible differences between here and there are achievable.