
Tracking category for __NOINDEX__
Closed, ResolvedPublic


Author: happy_melon

For transparency, pages using the NOINDEX and INDEX behavior switches should be auto-categorised into a tracking category, à la [[Category:Hidden categories]] for HIDDENCAT. Ideally, this should only occur when the switch is actually having an *effect*, i.e. only where the switch is allowed by $wgNamespaceRobotPolicies and $wgArticleRobotPolicies. This would serve a double purpose: users could see whether the switch is having an effect, and use of the switches could be monitored.

Version: unspecified
Severity: minor



Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:29 PM
bzimport set Reference to bz16979.
bzimport added a subscriber: Unknown Object (MLST).

jopiswezggzmw wrote:

Proposed patch

I've taken a stab at trying to do this. INDEX includes the page in [[Category:Indexed pages]] and NOINDEX includes the page in [[Category:Non-indexed pages]].

attachment Parser.patch ignored as obsolete

jopiswezggzmw wrote:

Define category names

attachment enMessage.patch ignored as obsolete

happy_melon wrote:

This categorises even if the action of __INDEX__/__NOINDEX__ is disabled by $wgExemptFromUserRobotsControl, doesn't it? From the fixmes in OutputPage.php, it looks like the whole thing could do with an overhaul.

jopiswezggzmw wrote:

Proposed patch

The new patch checks whether the namespace is listed in $wgExemptFromUserRobotsControl.

If it is, Parser.php neither adds the category nor calls setIndexPolicy, which (I think) makes the check in OutputPage.php redundant.

attachment new.patch ignored as obsolete

jopiswezggzmw wrote:

Factoring in ArticleRobotPolicies

I could kill two birds with one stone here.

It should now work like this:

  1. Check if the page has a policy defined in $wgArticleRobotPolicies. If it does, no code will be executed, so the page will not be added to the category and the new Index/Noindex policy will not be set.
  2. If not, check $wgExemptFromUserRobotsControl. If the namespace is listed there (i.e. has a local policy), the new policy will not be set.
  3. If not, check if NOINDEX/INDEX tags are in use.
  4. If so, add the page to the appropriate category and set the policy.
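
The four steps above can be sketched as a small decision function. This is an illustrative, language-neutral sketch in Python, not the attached patch (which changes MediaWiki's PHP code). The configuration values and the helper name `resolve_switch` are hypothetical; only the category names come from the earlier comment in this thread.

```python
from typing import Optional, Tuple

# Hypothetical stand-ins for the wiki configuration. The names mirror
# $wgArticleRobotPolicies and $wgExemptFromUserRobotsControl; the values are invented.
ARTICLE_ROBOT_POLICIES = {"Main Page": "index,follow"}
EXEMPT_NAMESPACES = {0}  # namespace IDs where user switches are ignored

# Category names as proposed in the first patch on this task.
CATEGORY_FOR_SWITCH = {
    "INDEX": "Indexed pages",
    "NOINDEX": "Non-indexed pages",
}


def resolve_switch(
    title: str, namespace: int, switch: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (policy, category), or (None, None) if the switch has no effect."""
    # Step 1: a per-article policy takes precedence, so nothing is done.
    if title in ARTICLE_ROBOT_POLICIES:
        return None, None
    # Step 2: the namespace is exempt from user control, so nothing is done.
    if namespace in EXEMPT_NAMESPACES:
        return None, None
    # Step 3: no INDEX/NOINDEX switch on the page.
    if switch not in CATEGORY_FOR_SWITCH:
        return None, None
    # Step 4: the switch is effective; set the policy and the tracking category.
    return switch.lower(), CATEGORY_FOR_SWITCH[switch]
```

With these assumed settings, `resolve_switch("Sandbox", 2, "NOINDEX")` yields a policy and a category, while the same switch on "Main Page" or on any page in namespace 0 yields neither.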

This is the first time I've really played around with MediaWiki's code, so I don't know if it will work as intended, but this should also solve the problem of NOINDEX/INDEX overriding a policy set in $wgArticleRobotPolicies.

attachment tryingagain.patch ignored as obsolete

jopiswezggzmw wrote:

Proposed patch v4

Tidied up the code a little.

attachment tc_noindex_v4.patch ignored as obsolete

jopiswezggzmw wrote:

New patch

Some improvements


Wouldn't it be a better idea to track this stuff in the page_props table, like we do with HIDDENCAT?

jopiswezggzmw wrote:

HIDDENCAT also adds the page to [[Category:Hidden categories]].

wrote:

*YES*. This is the *perfect* solution. The situation is very similar: it's a 'property' that applies to individual pages and can be stored coherently in the page_props table, and the db query can be done in OutputPage.php rather than in the parser. Is [[Category:Hidden categories]] populated 'normally', with links in the categorylinks table? Or is it generated entirely from page_props? There's probably no reason why a [[Category:Noindexed pages]] can't be dynamically generated; that would additionally allow the categorisation to distinguish NOINDEX tags that are functional (are suppressing indexing) from those that are not (i.e. are being overridden by other policies). This would make resolving bug 14900 very much easier, as well. Great idea, Roan!
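
To illustrate the dynamically generated listing described above, here is a minimal sketch using an in-memory SQLite database. The table and column names follow MediaWiki's page and page_props schema, but the property name 'noindex', the sample rows, the exempt-namespace set and the helper `noindexed_pages` are assumptions for illustration. It does not reflect what r56688 actually implemented.

```python
import sqlite3


def noindexed_pages(conn, exempt_namespaces):
    """Split pages carrying a 'noindex' property into effective and overridden."""
    rows = conn.execute(
        "SELECT page.page_title, page.page_namespace "
        "FROM page_props JOIN page ON page.page_id = page_props.pp_page "
        "WHERE page_props.pp_propname = 'noindex' "
        "ORDER BY page.page_title"
    ).fetchall()
    # A tag is treated as overridden when its namespace ignores user switches.
    effective = [title for title, ns in rows if ns not in exempt_namespaces]
    overridden = [title for title, ns in rows if ns in exempt_namespaces]
    return effective, overridden


# Invented sample data: two pages tagged noindex, one tagged hiddencat.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);
INSERT INTO page VALUES (1, 0, 'Article'), (2, 2, 'Draft'), (3, 2, 'Notes');
INSERT INTO page_props VALUES (1, 'noindex', ''), (2, 'noindex', ''), (3, 'hiddencat', '');
""")

effective, overridden = noindexed_pages(conn, exempt_namespaces={0})
```

Here 'Draft' ends up in the effective list and 'Article' in the overridden list, because namespace 0 is assumed to be exempt from user control.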

jopiswezggzmw wrote:

[[Category:Hidden categories]] is populated using the categorylinks table. My patch resolves bug 14900 anyway (if the page is in $wgArticleRobotPolicies then NOINDEX/INDEX have no effect), but perhaps using page_props would be better.

wrote:

Done in r56688.