Page MenuHomePhabricator

Some section links in search results are redlinks
Open, MediumPublic

Description

I’ve noticed that in some search results on mediawiki.org, the link to the section containing the result is a redlink.

standalone group:

Screenshot 2021-07-01 at 15-51-37 Search results for standalone group - MediaWiki.png (456×718 px, 65 KB)

phpunit testing list groups:

Screenshot 2021-07-01 at 15-52-12 Search results for phpunit testing list groups - MediaWiki.png (815×544 px, 135 KB)

(Note that the first result has a blue section link.)

The links point to the target page without the section, e.g. https://www.mediawiki.org/w/index.php?title=Manual:PHP_unit_testing/Writing_unit_tests_for_extensions&action=edit&redlink=1 for the first example.

So far, I haven’t been able to reproduce this on another wiki.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Managed to reproduce it on Test Wikipedia:

confirmed first death (sorry for the morbid topic – I needed pages with some content and COVID stats were in Special:LongPages)

Screenshot 2021-07-01 at 15-57-00 Search results for confirmed first death - Test Wikipedia.png (628×729 px, 92 KB)

Still haven’t reproduced it on English Wikipedia, but was successful on Catalan Wikipedia (a group1 wiki):

hospitals identificat casos coronavirus

Screenshot 2021-07-01 at 16-01-43 Resultats de la cerca de «hospitals identificat casos coronavirus» - Viquipèdia, l'encicl[...].png (564×547 px, 95 KB)

brennen triaged this task as Unbreak Now! priority.Jul 1 2021, 3:05 PM
brennen added a project: User-brennen.
brennen moved this task from Backlog to Logs/Train on the User-brennen board.

In all the cases I've been able to find, the section in the red link actually exists, so I think this may be a problem with the link checking rather than out-of-date search indexes.

Bisected and looks like it's caused by 9e49260fc9586c962f8c4da64ff4c0f355c19a07

That was my suspicion as well. I'll look into it asap.

@dcausse is there a way for me to test this locally without having to set up the full Cirrus stack?

@daniel this rendering should be wholy outside cirrus, should be testable with bare mediawiki. The link rendering calls all happen from includes/search/searchwidgets in core

@daniel this rendering should be wholy outside cirrus, should be testable with bare mediawiki. The link rendering calls all happen from includes/search/searchwidgets in core

Hmm, actually i misspoke. While the rendering is in core, core doesn't provide sections in the results. Hmm, in that case you pretty much have to have an MWV instance arround, or hack something up to provide a section anyways.

Hmm, actually i misspoke. While the rendering is in core, core doesn't provide sections in the results. Hmm, in that case you pretty much have to have an MWV instance arround, or hack something up to provide a section anyways.

I will try to cook up a test case. It's not clear to me what's happening here.

If this helps I'm seeing:

2021-07-01 16:16:50 wiki LinkCache WARNING: non-proper page reference {"page-reference":{"Title":"Test"}}

in my logs.

It does seem to be some interaction between the search results page and the link checking. I can't find any instances on affected wikis of incorrect red section links on regular wiki pages.

If this helps I'm seeing:

2021-07-01 16:16:50 wiki LinkCache WARNING: non-proper page reference {"page-reference":{"Title":"Test"}}

in my logs.

@dcausse can you insert a var_dump( $page) in LinkCache:L153, right before this warning, and maybe a stack trace?

If this helps I'm seeing:

2021-07-01 16:16:50 wiki LinkCache WARNING: non-proper page reference {"page-reference":{"Title":"Test"}}

in my logs.

That's curious... Is this for a hit on a page called "Test" in the main namespace?

2021-07-01 16:32:19 wiki LinkCache WARNING: non-proper page reference {"page-reference":{"Title":"Test"}}
2021-07-01 16:32:19 wiki wfDebug DEBUG: <pre class='xdebug-var-dump' dir='ltr'>
<small>/vagrant/mediawiki/includes/cache/LinkCache.php:159:</small>
<b>object</b>(<i>Title</i>)[<i>876</i>]
  <i>public</i> 'mTextform' <font color='#888a85'>=&gt;</font> <small>string</small> <font color='#cc0000'>'Test'</font> <i>(length=4)</i>
  <i>public</i> 'mUrlform' <font color='#888a85'>=&gt;</font> <small>string</small> <font color='#cc0000'>'Test'</font> <i>(length=4)</i>
  <i>public</i> 'mDbkeyform' <font color='#888a85'>=&gt;</font> <small>string</small> <font color='#cc0000'>'Test'</font> <i>(length=4)</i>
  <i>public</i> 'mNamespace' <font color='#888a85'>=&gt;</font> <small>int</small> <font color='#4e9a06'>0</font>
  <i>public</i> 'mInterwiki' <font color='#888a85'>=&gt;</font> <small>string</small> <font color='#cc0000'>''</font> <i>(length=0)</i>
  <i>private</i> 'mLocalInterwiki' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>public</i> 'mFragment' <font color='#888a85'>=&gt;</font> <small>string</small> <font color='#cc0000'>'Test_test_hop'</font> <i>(length=13)</i>
  <i>public</i> 'mArticleID' <font color='#888a85'>=&gt;</font> <small>int</small> <font color='#4e9a06'>-1</font>
  <i>protected</i> 'mLatestID' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>private</i> 'mContentModel' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>private</i> 'mForcedContentModel' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>private</i> 'mEstimateRevisions' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>public</i> 'mRestrictions' <font color='#888a85'>=&gt;</font>
    <b>array</b> <i>(size=0)</i>
      <i><font color='#888a85'>empty</font></i>
  <i>protected</i> 'mOldRestrictions' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>public</i> 'mCascadeRestriction' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>public</i> 'mCascadingRestrictions' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>protected</i> 'mRestrictionsExpiry' <font color='#888a85'>=&gt;</font>
    <b>array</b> <i>(size=0)</i>
      <i><font color='#888a85'>empty</font></i>
  <i>protected</i> 'mHasCascadingRestrictions' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>public</i> 'mCascadeSources' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>public</i> 'mRestrictionsLoaded' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>public</i> 'prefixedText' <font color='#888a85'>=&gt;</font> <small>string</small> <font color='#cc0000'>'Test'</font> <i>(length=4)</i>
  <i>public</i> 'mTitleProtection' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>public</i> 'mDefaultNamespace' <font color='#888a85'>=&gt;</font> <small>int</small> <font color='#4e9a06'>0</font>
  <i>protected</i> 'mLength' <font color='#888a85'>=&gt;</font> <small>int</small> <font color='#4e9a06'>-1</font>
  <i>public</i> 'mRedirect' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>private</i> 'mHasSubpages' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>private</i> 'mPageLanguage' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>private</i> 'mDbPageLanguage' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>private</i> 'mTitleValue' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>private</i> 'mIsBigDeletion' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
  <i>private</i> 'mIsValid' <font color='#888a85'>=&gt;</font> <small>boolean</small> <font color='#75507b'>false</font>
  <i>private</i> 'mInstanceCacheKey' <font color='#888a85'>=&gt;</font> <font color='#3465a4'>null</font>
</pre>
2021-07-01 16:32:19 wiki wfDebug DEBUG: Exception: boom in /vagrant/mediawiki/includes/cache/LinkCache.php:165
Stack trace:
#0 /vagrant/mediawiki/includes/cache/LinkCache.php(396): LinkCache->getCacheKey(Object(Title))
#1 /vagrant/mediawiki/includes/Title.php(3189): LinkCache->addLinkObj(Object(Title))
#2 /vagrant/mediawiki/includes/Title.php(3863): Title->getArticleID(0)
#3 /vagrant/mediawiki/includes/Title.php(3939): Title->exists()
#4 /vagrant/mediawiki/includes/linker/LinkRenderer.php(166): Title->isKnown()
#5 /vagrant/mediawiki/includes/search/searchwidgets/FullSearchResultWidget.php(171): MediaWiki\Linker\LinkRenderer->makeLink(Object(Title), Object(HtmlArmor))
#6 /vagrant/mediawiki/includes/search/searchwidgets/FullSearchResultWidget.php(197): MediaWiki\Search\SearchWidgets\FullSearchResultWidget->generateAltTitleHtml('search-section', Object(Title), 'Test test <span...')
#7 /vagrant/mediawiki/includes/search/searchwidgets/FullSearchResultWidget.php(61): MediaWiki\Search\SearchWidgets\FullSearchResultWidget->generateSectionHtml(Object(CirrusSearch\Search\ArrayCirrusSearchResult))
#8 /vagrant/mediawiki/includes/search/searchwidgets/BasicSearchResultSetWidget.php(124): MediaWiki\Search\SearchWidgets\FullSearchResultWidget->render(Object(CirrusSearch\Search\ArrayCirrusSearchResult), 1)
#9 /vagrant/mediawiki/includes/search/searchwidgets/BasicSearchResultSetWidget.php(70): MediaWiki\Search\SearchWidgets\BasicSearchResultSetWidget->renderResultSet(Object(class@anonymous), 2)
#10 /vagrant/mediawiki/includes/specials/SpecialSearch.php(554): MediaWiki\Search\SearchWidgets\BasicSearchResultSetWidget->render('~hop', 0, NULL, Object(class@anonymous))
#11 /vagrant/mediawiki/includes/specials/SpecialSearch.php(228): SpecialSearch->showResults('~hop')
#12 /vagrant/mediawiki/includes/specialpage/SpecialPage.php(646): SpecialSearch->execute(NULL)
#13 /vagrant/mediawiki/includes/specialpage/SpecialPageFactory.php(1362): SpecialPage->run(NULL)
#14 /vagrant/mediawiki/includes/MediaWiki.php(314): MediaWiki\SpecialPage\SpecialPageFactory->executePath('Search', Object(RequestContext))
#15 /vagrant/mediawiki/includes/MediaWiki.php(925): MediaWiki->performRequest()
#16 /vagrant/mediawiki/includes/MediaWiki.php(559): MediaWiki->main()
#17 /vagrant/mediawiki/index.php(53): MediaWiki->run()
#18 /vagrant/mediawiki/index.php(46): wfIndexMain()
#19 /var/www/w/index.php(5): require('/vagrant/mediaw...')
#20 {main}

As a workaround, what if we always render the link as a known one?

diff --git a/includes/search/searchwidgets/FullSearchResultWidget.php b/includes/search/searchwidgets/FullSearchResultWidget.php
index 7b79e1861a..9905781581 100644
--- a/includes/search/searchwidgets/FullSearchResultWidget.php
+++ b/includes/search/searchwidgets/FullSearchResultWidget.php
@@ -168,7 +168,7 @@ protected function generateMainLinkHtml( SearchResult $result, $position ) {
 	protected function generateAltTitleHtml( $msgKey, ?Title $title, $text ) {
 		$inner = $title === null
 			? $text
-			: $this->linkRenderer->makeLink( $title, $text ? new HtmlArmor( $text ) : null );
+			: $this->linkRenderer->makeKnownLink( $title, $text ? new HtmlArmor( $text ) : null );
 
 		return "<span class='searchalttitle'>" .
 				$this->specialPage->msg( $msgKey )->rawParams( $inner )->parse()

The function is apparently called for redirects, sections, and (with $title = null) categories; I assume the link target should exist in each of those cases.

If this helps I'm seeing:

2021-07-01 16:16:50 wiki LinkCache WARNING: non-proper page reference {"page-reference":{"Title":"Test"}}

in my logs.

That's curious... Is this for a hit on a page called "Test" in the main namespace?

No it's on my local wiki.

Ok, so LinkRenderer is looking whether the Title exists or not to decide whether the link is blue or red. After Daniel's patch, we short-circuit this check on Title::canExist - if the title can not exist, it's certainly doesn't exist.

Title::canExist checks if Title is valid - invalid title can not exist. But from the var_dump @dcausse has posted, Title::mIsValid is false! - thus the link is red. Now to figure out why is this title invalid.

But from the var_dump @dcausse has posted, Title::mIsValid is false! - thus the link is red. Now to figure out why is this title invalid.

Oh, good find, I didn't spot that! This is rather strange.

The title decides that it's invalid because mFragment is 'Test_test_hop' and MWTitleCodec returns fragment as 'Test test hop'. Title::isValid expects exact string match of re-parsed title parts. Boom!

That's curious... Is this for a hit on a page called "Test" in the main namespace?

No it's on my local wiki.

Yea, is it a hit for a page called "Test" in the main namespace or your wiki?

Also... could you hack it to get a stack trace?

The title decides that it's invalid because mFragment is 'Test_test_hop' and MWTitleCodec returns fragment as 'Test test hop'. Title::isValid expects exact string match of re-parsed title parts. Boom!

Ohh... and how does that mFragment get in there? Via Title::makeTitle?

That's the only place I can find where we don't normalize fragment. Unless cirrus does something like saving serialized titles and unserializing them..

Change 702699 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/core@master] Consistently normalize Title::mFragment before setting

https://gerrit.wikimedia.org/r/702699

The titles should generaly come from TitleHelper::makeTitle. The namespace and title values passed through are the same as those found in ?action=cirrusdump for any page. The fragment is not yet included there though, it is added from HighlightingTrait::findSectionTitle using Title::createFragmentTarget

Could someone with Cirrus setup test the patch above - I believe it will take care of the problem, but since it's urgent and I don't have cirrus setup locally, so it will probably be faster to ask for help here.

Could someone with Cirrus setup test the patch above - I believe it will take care of the problem, but since it's urgent and I don't have cirrus setup locally, so it will probably be faster to ask for help here.

Seems to work, section is no longer red.

Change 702702 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Improve logging in LinkCache.

https://gerrit.wikimedia.org/r/702702

Change 702699 merged by jenkins-bot:

[mediawiki/core@master] Consistently normalize Title::mFragment before setting

https://gerrit.wikimedia.org/r/702699

Change 702711 had a related patch set uploaded (by Brennen Bearnes; author: Ppchelko):

[mediawiki/core@wmf/1.37.0-wmf.12] Consistently normalize Title::mFragment before setting

https://gerrit.wikimedia.org/r/702711

Change 702711 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.12] Consistently normalize Title::mFragment before setting

https://gerrit.wikimedia.org/r/702711

Confirmed fixed on mwdebug2001; synching.

Mentioned in SAL (#wikimedia-operations) [2021-07-01T19:34:14Z] <brennen@deploy1002> Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711|Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)

Mentioned in SAL (#wikimedia-operations) [2021-07-01T19:35:57Z] <brennen@deploy1002> Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711|Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)

brennen lowered the priority of this task from Unbreak Now! to Needs Triage.Jul 1 2021, 7:57 PM

Change 702702 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Improve logging in LinkCache.

https://gerrit.wikimedia.org/r/702702

This patch hasn’t been merged yet. Are we still interested in it? Otherwise I think we can close this task.

daniel claimed this task.

Change 702702 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Improve logging in LinkCache.

https://gerrit.wikimedia.org/r/702702

This patch hasn’t been merged yet. Are we still interested in it? Otherwise I think we can close this task.

I'm still interested in improving logging, but that shouldn't block this task. Closing.

Change 702702 merged by jenkins-bot:

[mediawiki/core@master] Improve logging in LinkCache.

https://gerrit.wikimedia.org/r/702702

@brennen , it looks like this ticket was reopened. I believe you(r team) worked on this last. Please let us know if this needs to be looked at by the Search team

@brennen , it looks like this ticket was reopened. I believe you(r team) worked on this last. Please let us know if this needs to be looked at by the Search team

RelEng involvement here was limited to train coordination and backport deployment. This is probably a question for @daniel et al.

WDoranWMF triaged this task as Medium priority.Mon, Sep 20, 6:46 PM
WDoranWMF added a subscriber: WDoranWMF.

@daniel can you take a look at this during the week of 27th September - when you have time.

I'm going to triage this as medium, if the priority is incorrect please let me know.

I just looked at it for 15 minutes, and I can't see why this is broken, but other section titles with spaces and non-ascii chars work fine. I'll see if I can find some time to dig deeper next week.

thanks @daniel, if we are not able to get further next week, it will have to go into the backlog. Given it's priority it will likely take sometime to get to this issue.