Page MenuHomePhabricator

ContentHandler / #REDIRECT missing when using $content->getParserOutput( ...)
Closed, ResolvedPublic

Description

During our SMW\Test\SimplePageRedirectRegressionTest::testDataImport MW 1.23alpha (5b8215c) failed due to #REDIRECT being missing from the $parserOutput->
getText() when using $content->getParserOutput( ... ).

The test passes on MW versions that don't use the ContentHandler and instead use $this->parser->parse( ... ).

Code used in MW 1.23

$revision = Revision::newFromTitle( $this->getTitle(), false, Revision::READ_NORMAL );

$content = $revision->getContent( Revision::RAW );

$this->parserOutput = $content->getParserOutput(
$this->getTitle(),
$revision->getId(),
null,
true
);

    1. Test output using var_dump for $content->getParserOutput( ...)
  1. SMW\Test\SimplePageRedirectRegressionTest::testDataImport

This test printed output: string(18) "[[Has type::Page]]"
string(18) "[[Has type::Page]]"
string(33) "[[Category:Simple redirect test]]"
string(33) "[[Category:Simple redirect test]]"
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(33) "[[Category:Simple redirect test]]"
string(40) "Content of NewPageRedirectRegressionTest"
string(40) "Content of NewPageRedirectRegressionTest"
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "

Code used prior the ContentHandler

$revision = Revision::newFromTitle( $this->getTitle() );

$this->parserOutput = $this->parser->parse(
$revision->getText(),
$this->getTitle(),
$this->makeParserOptions(),
true,
true,
$revision->getID()
);

    1. Test output using var_dump using $this->parser->parse( ... )
  1. SMW\Test\SimplePageRedirectRegressionTest::testDataImport

This test printed output: string(18) "[[Has type::Page]]"
string(18) "[[Has type::Page]]"
string(33) "[[Category:Simple redirect test]]"
string(33) "[[Category:Simple redirect test]]"
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(80) "#REDIRECT [[SimplePageRedirectRegressionTest]] [[Category:Simple redirect test]]"
string(40) "Content of NewPageRedirectRegressionTest"
string(40) "Content of NewPageRedirectRegressionTest"
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "
string(141) "This is part of the [[PageRedirectRegressionTest]] [[Category:Regression test]] [[Category:Redirect test]] [[Category:Simple redirect test]] "

"#REDIRECT [[SimplePageRedirectRegressionTest]] is missing from the ContentHandler generated ParserOuptut text object.


Version: unspecified
Severity: blocker

Details

Reference
bz62856

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 3:04 AM
bzimport set Reference to bz62856.
bzimport added a subscriber: Unknown Object (MLST).
mwjames created this task.Mar 19 2014, 11:15 PM
Unknown Object (User) added a comment.Mar 19 2014, 11:40 PM

As noted on [0] (and confirmed by manual testing), the ContentHandler implementation works for versions other than MW 1.23.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/212

A last change there is gerrit 105829, but I have no idea, if that is related.

This does not seem to be an issue in ContentHandler, as the following produces expected output (using 1.23wmf17 as deployed on enwiki):

$title = Title::newFromText("Foo");
$revision = Revision::newFromTitle( $title, false, Revision::READ_NORMAL );
$content = $revision->getContent( Revision::RAW );
$parserOutput = $content->getParserOutput( $title, $revision->getId(), null, true );
echo $parserOutput->getText();

When var_dumping the $parserOutput object, I see nothing that remotely resembles the output that you are claiming your test returns (no field in the object contains wikitext), so I further doubt this has anything to do with MediaWiki core.

I don't see "SMW\Test\SimplePageRedirectRegressionTest" anywhere, so I can't comment on what it or the code it is testing might be doing. Although if Gerrit change 105829 is related, I can guess that you may be using some hook inside $wgParser->parse() to try to grab the wikitext being parsed for use later.

Absent further details, this bug should probably be reassigned to the appropriate extension.

Note that getParserOutput()->getText() will return the rendered HTML of the page.

There is no reason we should assume or insist that for a redirect, it will contain #REDIRECT. In fact, it would be perfectly sensible for it to return the HTML you are seeing when viewing a redirect with redirect=no: basically, a link to the redirect target, and an icon.

Also, for content models other than wikitext, redirects my be defined in some other way; #REDIRECT is used for wikitext only.

Marking as invalid since the expectation that rendered HTML contain #REDIRECT is unfounded.

(In reply to Daniel Kinzler from comment #4)

Marking as invalid since the expectation that rendered HTML contain
#REDIRECT is unfounded.

I don't think they're expecting the rendered HTML to contain "#REDIRECT". Looking at the quoted unit test results, they're apparently getting wikitext (not HTML) from somewhere. And since there's nothing in core that allows for getting the wikitext back from a ParserOutput, it must be something in their extension.

But absent further details, RESOLVED INVALID if it stays assigned as-is works for me. Or reopen and reassign to SMW or whatever extension is involved here.

This is a blocker for the release of 1.23. I'll dig into this a little and see what I can find.

Anomie added a comment.May 2 2014, 2:53 PM

The bug is in SMW, not core. But we don't seem to have an SMW component in Bugzilla anymore, so I guess "MediaWiki extensions / [other]" it is.

Unknown Object (User) added a comment.May 2 2014, 2:56 PM

The bug is in SMW, not core.

And we are quick in blaming others.

For those who are familiar with unit testing[0], I wrote a unit test that should even satisfy WMF employees to see that something doesn't match up when running the test on MW 1.23 and to avoid being blamed again for insufficient details.

I tried to write the unit test in a manner that should be expressive enough to compare results among standard text and text that includes "#REDIRECT".

If you run the test on MW 1.12/MW 1.22 all test passes while on MW 1.23/MW 1.24 it will fail.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/295/files

(In reply to Brad Jorsch from comment #7)

The bug is in SMW, not core. But we don't seem to have an SMW component in
Bugzilla anymore, so I guess "MediaWiki extensions / [other]" it is.

That's due to the will of SMW developers- see bug 62114. They prefer GitHub but as nobody can easily find out due to outdated bugtracker links I'll move this to the (otherwise closed) component.

Unknown Object (User) added a comment.May 2 2014, 3:08 PM

That's due to the will of SMW developers- see bug 62114

This bug has nothing to do with SMW.

The test is SMW independent and relies solely on MW provided functionality to verify that the ContentHandler has an issue with #REDIRECT and InternalParseBeforeLinks on MW 1.23/MW1.24 while passing on MW 1.21/MW 1.22.

In comparison, the test passes for both Parser & ContentHandler on all MW versions when using a "normal" text (e.g '[[Lorem ipsum]] dolor sit amet ...').

You can run the test [0] in a normal Jenkins or local phpunit environment.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/295/files

Anomie added a comment.May 2 2014, 3:12 PM

(In reply to MWJames from comment #8)

The bug is in SMW, not core.

And we are quick in blaming others.

Please explain why SMW is expecting the #REDIRECT line to be passed to the parser and breaking when it is not. As was asked back in March.

Chances are that whatever SMW code is breaking should be doing whatever it does in a different way. For example, perhaps it should be looking at the redirect table or checking the Content object's getRedirectTarget(). But without knowing what your code is actually trying to do, I can't do more than guess.

If you run the test on MW 1.12/MW 1.22 all test passes while on MW 1.23/MW
1.24 it will fail.

Just because you can write a unit test that tests for some behavior doesn't mean the behavior was actually correct.

Unknown Object (User) added a comment.May 2 2014, 3:33 PM

Just because you can write a unit test that tests for some behavior doesn't mean the behavior was actually correct.

Now here is the thing, the test does not make any assumption about SMW or any other extension, it solely tests behaviour (if you would have looked at the test you would recognize the difference in behaviour the test asserts) that is expected to work.

Because when a standard text is involved the expected text is delivered to the InternalParseBeforeLinks hook in either case (on any MW version) but when something like '#REDIRECT ...' is involved the InternalParseBeforeLinks behaves differently.

Chances are that whatever SMW code is breaking should be doing whatever it does in a different way. For example, perhaps it should be looking at the redirect

It is right, the difference in behaviour came to light during SMW testing on MW 1.23 but this does not implicate that it is SMW's fault just because you find this convenient.

Anomie added a comment.May 2 2014, 4:56 PM

MWJames, my point is that SMW was depending on buggy behavior, and just because you can write a unit test showing that the bug was fixed in 1.23 doesn't mean we should unfix the bug.

If you continue to just blindly insist that you're right and everyone else is wrong, we may as well close this bug as WONTFIX (or reassign it to MediaWiki core and mark it INVALID again).

Or we could move forward by you telling us what exactly SMW is doing by looking for the #REDIRECT in the wikitext passed to the parser and why it can't look in the redirect table or Content::getRedirectTarget() to fetch this information instead. As has been requested of you several times now.

Unknown Object (User) added a comment.May 2 2014, 5:48 PM

just because you can write a unit test showing that the bug was fixed in 1.23 doesn't mean we should unfix the bug.

Let me explain what the test does before you claim that the test is wrong.

When I have a page with '[[Lorem ipsum]] dolor sit amet ...' text and try to generate a ParserOutput object using either $this->parser->parse( ... ) or $content->getParserOutput( ... ) at some point the InternalParseBeforeLinks will be called and the hook interface ( &$parser, &$text ) where $text will contain '[[Lorem ipsum]] dolor sit amet ...' (for both cases ).

Now, when I have a page with '#REDIRECT [[...]]' text and try to generate a ParserOutput object using $this->parser->parse( ... ) where at some point the InternalParseBeforeLinks will be called, the hook interface (&$parser, &$text) with $text will contain '#REDIRECT [[...]]' but when $content->getParserOutput( ... ) is used the $text is empty (which should contain '#REDIRECT [[...]]').

The in behaviour is tested for a standard text and a text containing '#REDIRECT ...'.

Of course, if you are telling me that this is the expected behaviour then naturally you can argue that MW 1.21/MW 1.22 were wrong in the first place and that MW 1.23 has been fixed.

If you continue to just blindly insist that you're right and everyone else is wrong

Interesting, I'm not insisting on anything the only thing I'm asserting (actually the test is asserting) is a difference in behaviour for InternalParseBeforeLinks hook on the usage of $this->parser->parse(...) to $content->getParserOutput( ... ) for when #REDIRECT is involved nothing more and nothing less.

we may as well close this bug as WONTFIX (or reassign it to MediaWiki core and mark it INVALID again).

If this is the WMF approach nothing left for me to do.

by you telling us what exactly SMW is doing by looking for the #REDIRECT in the wikitext passed to the parser and why it can't look in the redirect table or Content::getRedirectTarget()

As for SMW, InternalParseBeforeLinks is used to create annotations.

If the text component delivered by the InternalParseBeforeLinks hook carries a "#REDIRECT target" (which works of course if we use $this->parser->parse(...) and worked for $content->getParserOutput( .. ) on MW 1.21/MW 1.22 ) then something like
ContentHandler::makeContent( $text, null, CONTENT_MODEL_WIKITEXT )->getRedirectTarget() or Title::newFromRedirect( $text ) will work as expected.

The problem is that text component retrieved through the InternalParseBeforeLinks hook no longer carries the "#REDIRECT ...".

Anomie added a comment.May 2 2014, 6:23 PM

(In reply to MWJames from comment #14)

Of course, if you are telling me that this is the expected behaviour then
naturally you can argue that MW 1.21/MW 1.22 were wrong in the first place
and that MW 1.23 has been fixed.

Yes, that's exactly what I'm telling you.

As for SMW, InternalParseBeforeLinks is used to create annotations.
If the text component delivered by the InternalParseBeforeLinks hook carries
a "#REDIRECT target" (which works of course if we use
$this->parser->parse(...) and worked for $content->getParserOutput( .. ) on
MW 1.21/MW 1.22 ) then something like
ContentHandler::makeContent( $text, null, CONTENT_MODEL_WIKITEXT
)->getRedirectTarget() or Title::newFromRedirect( $text ) will work as
expected.
The problem is that text component retrieved through the
InternalParseBeforeLinks hook no longer carries the "#REDIRECT ...".

I see, SMW has its own wikitext parser. That seems fragile.

I still believe that since the #REDIRECT line shouldn't be being parsed it shouldn't be included in the wikitext passed to the parser; if it were being passed to Parser::parse(), that function should still be stripping it off before it gets to the point of calling InternalParseBeforeLinks. It would make more sense to me that the redirect target be included as metadata somehow, which probably means storing it in the ParserOptions. Would that work for you?

(In reply to Brad Jorsch from comment #13)

MWJames, my point is that SMW was depending on buggy behavior, and just
because you can write a unit test showing that the bug was fixed in 1.23
doesn't mean we should unfix the bug.

From my point of view as a part of release management, it appears that SMW depends on some undocumented behavior of MW. It looks like Brad was unaware of SMW's dependency on this behavior, saw the bug and decided to fix it.

Keeping this "fix" will hurt around 8.5% of MediaWiki installations[1] while only abou 30 or 40 people have complained about the old behavior in the past. In the meantime, they've been able to cope with that behavior.

It would be good if the SMW devs could go back and look at the previous bugs claiming to be fixed here and see if the arguments presented there show that fix is merited despite their concerns.

In the meantime, I'm slating https://gerrit.wikimedia.org/r/#/c/105829/ for a revert. Could one of you verify that reverting that code fixes this problem?

[1] https://wikiapiary.com/wiki/Extension:Semantic_MediaWiki

Change 131109 had a related patch set uploaded by MarkAHershberger:
This reverts commit d8b1b79ea423ef3391c34f63aa382fd6bad6597e.

https://gerrit.wikimedia.org/r/131109

(In reply to Brad Jorsch from comment #15)

It would
make more sense to me that the redirect target be included as metadata
somehow, which probably means storing it in the ParserOptions. Would that
work for you?

If this is what you want to do, I suggest you make this change in 1.24 and work with the SMW devs to get it implemented.

Thank you Mark for recognizing this as an API break, loose from the discussion if the change itself is good or bad.

I'm saddened that the amount of finger pointing here so often outweighs the amount of listening being done. In the end, we all lose because of this.

Anomie added a comment.May 2 2014, 7:32 PM

(In reply to Mark A. Hershberger from comment #18)

If this is what you want to do, I suggest you make this change in 1.24 and
work with the SMW devs to get it implemented.

I'm hoping that we can resolve this in time that you could still include it in 1.23, if MWJames likes the proposal.

(In reply to Brad Jorsch from comment #20)

I'm hoping that we can resolve this in time that you could still include it
in 1.23, if MWJames likes the proposal.

I understand your desire to do that, but we have to be more conservative with LTS releases. This particular issue has only become apparent in the past week and we plan to make an LTS release in less than a month.

Unknown Object (User) added a comment.May 2 2014, 7:48 PM

In the meantime, I'm slating https://gerrit.wikimedia.org/r/#/c/105829/ for a revert. Could one of you verify that reverting that code fixes this problem?

Thanks, finally I was able to track the relevant change that introduced the changed behaviour.

Could one of you verify that reverting that code fixes this problem?

Yes, running the mentioned integration test passes again on MW 1.23 after reverting the change.

it appears that SMW depends on some undocumented behavior of MW.

Whether documented or undocumented, it still doesn't explain the difference in behaviour of Parser::parse() and ContentHandler::getParserOutput() which at least should behave consistently.

I still believe that since the #REDIRECT line shouldn't be being parsed it shouldn't be included in the wikitext passed to the parser; if it were being passed to Parser::parse()

If this is the intend behaviour for Parser::parse() and ContentHandler::getParserOutput().

It would make more sense to me that the redirect target be included as metadata somehow, which probably means storing it in the ParserOptions. Would that work for you?

If Parser::getOptions() by the time InternalParseBeforeLinks hook is called does contain information such as being a redirect page/has target page for the selected revision, I would see no problem to use that instead of doing ContentHandler::makeContent( $text, null, CONTENT_MODEL_WIKITEXT )->getRedirectTarget() or Title::newFromRedirect( $text ) manually during the InternalParseBeforeLinks hook.

Change 131109 abandoned by MarkAHershberger:
This reverts commit d8b1b79ea423ef3391c34f63aa382fd6bad6597e.

Reason:
oops! You guys are right. Moving to release branch.

https://gerrit.wikimedia.org/r/131109

Change 131209 had a related patch set uploaded by MarkAHershberger:
This reverts commit d8b1b79ea423ef3391c34f63aa382fd6bad6597e.

https://gerrit.wikimedia.org/r/131209

Change 131210 had a related patch set uploaded by Anomie:
Record redirect target in ParserOptions

https://gerrit.wikimedia.org/r/131210

Closing since patch has been merged.

Change 131210 merged by jenkins-bot:
Record redirect target in ParserOptions

https://gerrit.wikimedia.org/r/131210