Page MenuHomePhabricator

MW 1.35 'noindex,nofollow' robot tag on every page
Open, Needs TriagePublic

Description

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).

Every page on the wiki has a robots tag containing
<meta name="robots" content="noindex,nofollow">

Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).

I've narrowed it down to line 1019 in Article.php.

...
if ( $this->mPage->getId() === 0 || $this->getOldID() ) {
...

If the getOldID() part is removed the correct behaviour is restored.

Alternatively I've expanded the if clause as following, which seems to produce the correct behaviour:

...
$isOldRevision = false;
# getOldID sometimes returns strings
$oldId = (int)$this->getOldID();

$record = $this->mPage->getRevisionRecord();

if ( $record !== null && $oldId !== 0) {
    $isOldRevision = (int)$record->getId() !== $oldId;
}

if ( $this->mPage->getId() === 0 || $isOldRevision ) {
...

Trace:

setIndexPolicy: noindex
#0 /var/www/html/includes/page/Article.php(889): OutputPage->setIndexPolicy(\'noindex\')
#1 /var/www/html/includes/actions/ViewAction.php(74): Article->view()
#2 /var/www/html/includes/MediaWiki.php(527): ViewAction->show()
#3 /var/www/html/includes/MediaWiki.php(313): MediaWiki->performAction(Object(Article), Object(Title))
#4 /var/www/html/includes/MediaWiki.php(940): MediaWiki->performRequest()
#5 /var/www/html/includes/MediaWiki.php(543): MediaWiki->main()
#6 /var/www/html/index.php(53): MediaWiki->run()
#7 /var/www/html/index.php(46): wfIndexMain()
#8 {main}

Event Timeline

Octfx created this task.Oct 13 2020, 1:03 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 13 2020, 1:03 PM
Octfx updated the task description. (Show Details)Oct 13 2020, 1:11 PM
Octfx updated the task description. (Show Details)Oct 13 2020, 1:23 PM

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).
Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).
...I've narrowed it down to line 1019 in Article.php.

If it's not happening in 'Timeless', then how are you sure the issue is not from the skin where you're seeing it?. It seems strange also to be skin specific.

You should try another skin, Vector or Monobook to see. And you should also check what's the value of $wgDefaultRobotPolicy in your LocalSettings.php as well as other related variables. It'd be good to also check for NOINDEX magic word usages there.

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).
Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).
...I've narrowed it down to line 1019 in Article.php.

If it's not happening in 'Timeless', then how are you sure the issue is not from the skin where you're seeing it?. It seems strange also to be skin specific.

You should try another skin, Vector or Monobook to see. And you should also check what's the value of $wgDefaultRobotPolicy in your LocalSettings.php as well as other related variables. It'd be good to also check for NOINDEX magic word usages there.

Skin author here. I am unaware of any part of the skin code that is related to setting noindex meta in the head element so my guess is that it might be related to the core.

Octfx added a comment.Oct 14 2020, 8:13 AM

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).
Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).
...I've narrowed it down to line 1019 in Article.php.

If it's not happening in 'Timeless', then how are you sure the issue is not from the skin where you're seeing it?. It seems strange also to be skin specific.

You should try another skin, Vector or Monobook to see. And you should also check what's the value of $wgDefaultRobotPolicy in your LocalSettings.php as well as other related variables. It'd be good to also check for NOINDEX magic word usages there.

I've checked the Citizen source and there really should be no way it is responsible for setting the robot tags.

Both Timeless and Vector do not set the tag but they do generate the same trace as posted.
As a test I've set the default policy to 'index,follow' but the problem persists. The magic word is not set anywhere on the site.

As the trace I've posted is not skin specific I'm guessing that the skins overwrite it somewhere after the Article view action has run.

Furthermore git blame shows that line 1019 in Article.php has last been modified 5 years ago.
The return value of getOldID() sometimes returns strings or integers, so the if check relies on the truthiness of the value which could be an issue.

I've deployed the patch mentioned in my first post and so far everything is working normally.
Special pages and old revisions both get a 'noindex,nofollow' policy as commented here.

Both Timeless and Vector do not set the tag but they do generate the same trace as posted

How do they generate the trace?, give clear steps to reproduce

Octfx added a comment.EditedOct 15 2020, 10:50 AM

Both Timeless and Vector do not set the tag but they do generate the same trace as posted

How do they generate the trace?, give clear steps to reproduce

The trace was generated manually by instanciating an exception in OutputPage->setIndexPolicy() and printing the stack trace.

$e = new Exception;
wfVarDump("<br><h1>setIndexPolicy: $policy</h1>".$e->getTraceAsString());
Octfx updated the task description. (Show Details)Oct 21 2020, 8:58 AM

Update: The 'patch' had an error on file pages.

...
$isOldRevision = false;
# getOldID sometimes returns strings
$oldId = (int)$this->getOldID();

$record = $this->mPage->getRevisionRecord();

if ( $record !== null && $oldId !== 0) {
    $isOldRevision = (int)$record->getId() !== $oldId;
}

if ( $this->mPage->getId() === 0 || $isOldRevision ) {
...

fixes the problem

Octfx claimed this task.Oct 27 2020, 9:42 AM

Change 636623 had a related patch set uploaded (by Aklapper; owner: Octfx):
[mediawiki/core@master] Fix noindex,nofollow robot tag on all pages

https://gerrit.wikimedia.org/r/636623