
__INDEX__ and __NOINDEX__ should not override $wgArticleRobotPolicies
Closed, ResolvedPublic


Author: ayg

$wgArticleRobotPolicies allows site owners to set robot policies on a per-article basis. __INDEX__ and __NOINDEX__ allow arbitrary users to do the same. The owner's setting should win here -- currently __INDEX__/__NOINDEX__ do, because of the order in which the code is executed. The config setting is applied in Article::view() -- in the same place as the namespace setting, which the magic words *should* override, since they're more specific (page-specific). The magic words are handled in OutputPage::addParserOutputNoText(), which is lower-level and runs later.
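For concreteness, the site-owner setting in question is configured like this in LocalSettings.php (the page title below is a hypothetical example, not from this report):

```php
// Site owner forces a per-article robot policy. Per this bug, a user
// adding __INDEX__ to the wikitext of this page should NOT be able
// to override the owner's 'noindex,nofollow' setting.
$wgArticleRobotPolicies = array(
    'User:Example/Sandbox' => 'noindex,nofollow',
);
```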

I guess the best way to handle this would be to make OutputPage have a hierarchy of robot policies, so that Article could register a higher-priority robots setting that would override any later low-priority settings. But that seems awfully inelegant for such a minor detail.

Version: unspecified
Severity: minor



Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:13 PM
bzimport set Reference to bz14900.

Assigning to the developer of the feature, who is accidentally also the reporter :).

happy_melon wrote:

Patch to resolve indexing conflicts, on r45695

This adds an optional parameter to OutputPage::setIndexPolicy, the 'precedence' of the method that is trying to change the configuration. The hierarchy is set up as

  • 5 = unset (initialised defaults as below)
  • 4 = set by $wgDefaultRobotPolicy
  • 3 = set by $wgNamespaceRobotPolicies
  • 2 = set by __INDEX__ or __NOINDEX__ magic words (where allowed by $wgExemptFromUserRobotsControl)
  • 1 = set by $wgArticleRobotPolicies
  • 0 = set 'on-the-fly' to hide things like special pages, old revisions, etc

Also rewrites OutputPage::setRobotsPolicy as a wrapper around the new functions, and redefines all three to return bool: whether the attempt to change the settings was successful. That should make it easier to resolve bug16979 cleanly (or at least *more* cleanly).
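A minimal sketch of the precedence mechanism described above (member and parameter names are illustrative, not necessarily those in the patch):

```php
class OutputPage {
	/** Precedence of the caller that last set the policy; 5 = unset. */
	private $mIndexPolicyPrecedence = 5;
	private $mIndexPolicy = 'index';

	/**
	 * Attempt to set the index policy at the given precedence level.
	 * Lower numbers are stronger; a weaker (higher-numbered) caller
	 * cannot override a policy set by a stronger one.
	 * @return bool whether the change was applied
	 */
	public function setIndexPolicy( $policy, $precedence = 0 ) {
		if ( $precedence > $this->mIndexPolicyPrecedence ) {
			// A stronger setting already applies; refuse the change.
			return false;
		}
		$this->mIndexPolicy = $policy;
		$this->mIndexPolicyPrecedence = $precedence;
		return true;
	}
}
```

With this in place, $wgArticleRobotPolicies (precedence 1) registered from Article::view() survives a later __INDEX__/__NOINDEX__ attempt at precedence 2, regardless of execution order.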

Patch needs review and is UNTESTED on a live MediaWiki installation.

attachment bug14900.txt ignored as private

happy_melon wrote:

Updated patch, against r53416

Updated patch. This moves all robots handling to a new Article::setRobotPolicyForView() (reborn from Article::getRobotPolicyForView()), which uses array_merge() to build a single policy from the various layers of config. This has the nice freebie that you can now say something like:

$wgDefaultRobotPolicy = 'index, nofollow';
$wgNamespaceRobotPolicies[NS_USER] = 'noindex';

And the 'nofollow' attribute will be inherited from the default policy, which is the expected behaviour. Currently the policy would be lost and the hardcoded default of 'follow' would be used... :(
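The merge behaviour can be sketched like this (the helper name and exact parsing are illustrative; the patch's real code may differ). The trick is to key each policy component by its kind, so array_merge() overrides per-component rather than wholesale:

```php
// Split a policy string like 'index, nofollow' into
// array( 'index' => 'index', 'follow' => 'nofollow' ),
// so that merging overrides only the components actually specified.
function formatRobotPolicy( $policy ) {
	$arr = array();
	foreach ( explode( ',', $policy ) as $var ) {
		$var = trim( $var );
		if ( $var == 'index' || $var == 'noindex' ) {
			$arr['index'] = $var;
		} elseif ( $var == 'follow' || $var == 'nofollow' ) {
			$arr['follow'] = $var;
		}
	}
	return $arr;
}

$policy = array_merge(
	formatRobotPolicy( 'index, nofollow' ), // $wgDefaultRobotPolicy
	formatRobotPolicy( 'noindex' )          // $wgNamespaceRobotPolicies[NS_USER]
);
// The namespace layer overrides only 'index'; 'nofollow' is inherited.
echo implode( ',', $policy ); // noindex,nofollow
```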

The patch also cleans up Article::view() a little, to avoid the fourfold duplication of the do-this-when-the-body-has-been-constructed section; prompted because the call to setRobotPolicyForView() is moved there, so it has access to the parser output (which is now stored as a member variable $mParserOutput, rather than discarded) to check for NOINDEX tags.

I've also encapsulated the "do we allow NOINDEX tags in this namespace" logic in Title::canUseNoindex(), so it can be called from both Article::setRobotPolicyForView() and the Parser. This makes it trivial to resolve bug16979, which I've done in this patch; the modifications to Parser.php aren't strictly necessary to resolve *this* bug, but it's quite a nice (and useful) change.
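The encapsulated check might look roughly like this (a sketch, assuming $wgExemptFromUserRobotsControl defaults to null, meaning "the content namespaces"; the patch's actual code may differ):

```php
// In Title:
/**
 * Can __NOINDEX__ be used on this page's namespace?
 * @return bool
 */
public function canUseNoindex() {
	global $wgExemptFromUserRobotsControl, $wgContentNamespaces;

	// Namespaces in which users may NOT control indexing: either an
	// explicit list, or (by default) the content namespaces.
	$bannedNamespaces = is_null( $wgExemptFromUserRobotsControl )
		? $wgContentNamespaces
		: $wgExemptFromUserRobotsControl;

	return !in_array( $this->getNamespace(), $bannedNamespaces );
}
```

Both Article::setRobotPolicyForView() and the Parser can then call this one method, so the two code paths cannot drift apart.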

My first efforts at the change involved using the page_props table, which might still be a good idea. Along the way I improved the documentation for $wgPagePropLinkInvalidations, which I've left in the patch because the current documentation is totally rubbish.


Fixed in r55700