Page MenuHomePhabricator

MW 1.35 'noindex,nofollow' robot tag on every page
Open, Needs TriagePublic

Description

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).

Every page on the wiki has a robots tag containing
<meta name="robots" content="noindex,nofollow">

Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).

I've narrowed it down to line 1019 in Article.php.

...
if ( $this->mPage->getId() === 0 || $this->getOldID() ) {
...

If the getOldID() part is removed the correct behaviour is restored.

Alternatively I've expanded the if clause as following, which seems to produce the correct behaviour:

...
$isOldRevision = false;
# getOldID sometimes returns strings
$oldId = (int)$this->getOldID();

$record = $this->mPage->getRevisionRecord();

if ( $record !== null && $oldId !== 0) {
    $isOldRevision = (int)$record->getId() !== $oldId;
}

if ( $this->mPage->getId() === 0 || $isOldRevision ) {
...

Trace:

setIndexPolicy: noindex
#0 /var/www/html/includes/page/Article.php(889): OutputPage->setIndexPolicy(\'noindex\')
#1 /var/www/html/includes/actions/ViewAction.php(74): Article->view()
#2 /var/www/html/includes/MediaWiki.php(527): ViewAction->show()
#3 /var/www/html/includes/MediaWiki.php(313): MediaWiki->performAction(Object(Article), Object(Title))
#4 /var/www/html/includes/MediaWiki.php(940): MediaWiki->performRequest()
#5 /var/www/html/includes/MediaWiki.php(543): MediaWiki->main()
#6 /var/www/html/index.php(53): MediaWiki->run()
#7 /var/www/html/index.php(46): wfIndexMain()
#8 {main}

Event Timeline

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).
Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).
...I've narrowed it down to line 1019 in Article.php.

If it's not happening in 'Timeless', then how are you sure the issue is not from the skin where you're seeing it?. It seems strange also to be skin specific.

You should try another skin, Vector or Monobook to see. And you should also check what's the value of $wgDefaultRobotPolicy in your LocalSettings.php as well as other related variables. It'd be good to also check for NOINDEX magic word usages there.

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).
Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).
...I've narrowed it down to line 1019 in Article.php.

If it's not happening in 'Timeless', then how are you sure the issue is not from the skin where you're seeing it?. It seems strange also to be skin specific.

You should try another skin, Vector or Monobook to see. And you should also check what's the value of $wgDefaultRobotPolicy in your LocalSettings.php as well as other related variables. It'd be good to also check for NOINDEX magic word usages there.

Skin author here. I am unaware of any part of the skin code that is related to setting noindex meta in the head element so my guess is that it might be related to the core.

I'm seeing a very weird bug with MW 1.35 and the Citizen Skin (uses mustache).
Using the Timeless with ?useskin=timeless the robot tag disappears (correct behaviour).
...I've narrowed it down to line 1019 in Article.php.

If it's not happening in 'Timeless', then how are you sure the issue is not from the skin where you're seeing it?. It seems strange also to be skin specific.

You should try another skin, Vector or Monobook to see. And you should also check what's the value of $wgDefaultRobotPolicy in your LocalSettings.php as well as other related variables. It'd be good to also check for NOINDEX magic word usages there.

I've checked the Citizen source and there really should be no way it is responsible for setting the robot tags.

Both Timeless and Vector do not set the tag but they do generate the same trace as posted.
As a test I've set the default policy to 'index,follow' but the problem persists. The magic word is not set anywhere on the site.

As the trace I've posted is not skin specific I'm guessing that the skins overwrite it somewhere after the Article view action has run.

Furthermore git blame shows that line 1019 in Article.php has last been modified 5 years ago.
The return value of getOldID() sometimes returns strings or integers, so the if check relies on the truthiness of the value which could be an issue.

I've deployed the patch mentioned in my first post and so far everything is working normally.
Special pages and old revisions both get a 'noindex,nofollow' policy as commented here.

Both Timeless and Vector do not set the tag but they do generate the same trace as posted

How do they generate the trace?, give clear steps to reproduce

Both Timeless and Vector do not set the tag but they do generate the same trace as posted

How do they generate the trace?, give clear steps to reproduce

The trace was generated manually by instanciating an exception in OutputPage->setIndexPolicy() and printing the stack trace.

$e = new Exception;
wfVarDump("<br><h1>setIndexPolicy: $policy</h1>".$e->getTraceAsString());

Update: The 'patch' had an error on file pages.

...
$isOldRevision = false;
# getOldID sometimes returns strings
$oldId = (int)$this->getOldID();

$record = $this->mPage->getRevisionRecord();

if ( $record !== null && $oldId !== 0) {
    $isOldRevision = (int)$record->getId() !== $oldId;
}

if ( $this->mPage->getId() === 0 || $isOldRevision ) {
...

fixes the problem

Change 636623 had a related patch set uploaded (by Aklapper; owner: Octfx):
[mediawiki/core@master] Fix noindex,nofollow robot tag on all pages

https://gerrit.wikimedia.org/r/636623

Octfx removed Octfx as the assignee of this task.Mar 25 2021, 11:09 AM