Links with `?oldformat=true` or escaped slashes bypass NOINDEX
Open, Needs Triage, Public

Description

Articles for Deletion discussions on the English Wikipedia are, in theory, NOINDEX-tagged so that they don't show up in search engines. However, the same page reached through a URL with ?oldformat=true appended still gets indexed, which partly defeats the NOINDEX tag.

For example, a search for foo will _not_ return https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Foo but will return https://en.wikipedia.org/wiki/Wikipedia%3AArticles_for_deletion%2FFoo?oldformat=true. Note that the second URL also has the colon and slash percent-escaped (%3A and %2F).
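To make the relationship between the two URLs concrete, here is a minimal Python sketch showing that they differ as strings but decode to the same page title. The helper name `page_title` is hypothetical and only for illustration; it assumes the usual `/wiki/` article path and says nothing about how MediaWiki or Google actually normalize URLs.

```python
from urllib.parse import urlsplit, unquote


def page_title(url: str) -> str:
    """Return the decoded page title from a /wiki/ URL, ignoring any query string.

    Hypothetical helper for illustration; not MediaWiki's own routing logic.
    """
    path = urlsplit(url).path  # the query string (?oldformat=true) is not part of the path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /wiki/ URL: {url}")
    # Percent-decode so that %3A becomes ':' and %2F becomes '/'
    return unquote(path[len(prefix):])


plain = "https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Foo"
escaped = (
    "https://en.wikipedia.org/wiki/"
    "Wikipedia%3AArticles_for_deletion%2FFoo?oldformat=true"
)

# The raw URLs are different strings...
print(plain == escaped)
# ...but both decode to the same title.
print(page_title(plain) == page_title(escaped))
```

A crawler that treats the raw URL as the page identity would see these as two separate documents, which fits the behaviour described above.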

See, for example, the second page of Google search results for "Todd Claydon".

This may well be unfixable on our side (it appears that Google is picking up on the URL itself).

Event Timeline

Just asked for a new disallow rule at https://en.wikipedia.org/wiki/MediaWiki_talk:Robots.txt#Google_thinks_it.27s_cute.2C_we_need_to_blacklist_Wikipedia.253AArticles_for_deletion.252F - seemed relevant to this discussion.
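As a rough illustration of why a separate disallow rule might be needed, the Python sketch below does a literal prefix comparison between a URL path and a list of rule prefixes. Both rule strings and the `is_disallowed` helper are assumptions made for this example; they are not the actual contents of Wikipedia's robots.txt, and real crawlers may normalize percent-encoding before matching.

```python
from urllib.parse import urlsplit

# Hypothetical rule prefixes, for illustration only.
RULE_PLAIN = "/wiki/Wikipedia:Articles_for_deletion/"
RULE_ESCAPED = "/wiki/Wikipedia%3AArticles_for_deletion%2F"


def is_disallowed(url: str, rules: list[str]) -> bool:
    """Literal prefix match of the raw (still percent-encoded) path against each rule."""
    path = urlsplit(url).path
    return any(path.startswith(rule) for rule in rules)


escaped_url = (
    "https://en.wikipedia.org/wiki/"
    "Wikipedia%3AArticles_for_deletion%2FFoo?oldformat=true"
)

# With only the unescaped rule, the escaped URL is not matched.
print(is_disallowed(escaped_url, [RULE_PLAIN]))
# Adding a rule for the escaped form matches it.
print(is_disallowed(escaped_url, [RULE_PLAIN, RULE_ESCAPED]))
```

Under that literal-matching assumption, the unescaped rule alone would not cover the escaped URL, which is the gap the requested rule is meant to close.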

Should we perhaps consider not parsing article paths as escaped query string parameters, if the Location header has no query string?

jrbs renamed this task from Links with `?oldformat=true` bypass NOINDEX somehow to Links with `?oldformat=true` or escaped slashes bypass NOINDEX. Dec 8 2016, 6:31 PM