robots.txt Last-Modified header is wrong
Closed, Resolved | Public

Description

Yesterday we discovered that Google was showing banner ads, instead of actual wiki content, as the blurb below search results. This was caused partly by a change in the banner URL and by the fact that we did not update robots.txt for the new URL.

Although the content of http://en.wikipedia.org/robots.txt is fixed, the Last-Modified header still reports 11/2013, the date of the last wiki edit. Details from Brad Jorsch:

That's the timestamp of the last edit to enwiki's MediaWiki:Robots.txt, see
https://en.wikipedia.org/w/index.php?title=MediaWiki:Robots.txt&action=history
(and also w/robots.php in the operations/mediawiki-config repo).
 
Probably we should use the max of that or /srv/mediawiki/robots.txt to really get the
last-modified timestamp.
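
The suggestion above can be sketched roughly as follows. This is an illustration in Python only, not the patch that was merged (the real script is w/robots.php in operations/mediawiki-config). The function names `robots_last_modified` and `http_date` are hypothetical, and the sketch assumes the on-wiki edit time is already available as a timezone-aware UTC datetime.

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime


def robots_last_modified(wiki_edit_time, static_path):
    """Return the newer of the on-wiki edit time and the static file's mtime.

    wiki_edit_time: timezone-aware UTC datetime of the last edit to the
                    on-wiki robots page (assumed to be looked up elsewhere).
    static_path:    path to the static robots.txt on disk.
    """
    candidates = [wiki_edit_time]
    try:
        mtime = os.path.getmtime(static_path)
        candidates.append(datetime.fromtimestamp(mtime, tz=timezone.utc))
    except OSError:
        # Static file missing or unreadable: use the wiki timestamp alone.
        pass
    return max(candidates)


def http_date(moment):
    """Format a UTC datetime as an HTTP date for a Last-Modified header."""
    return format_datetime(moment, usegmt=True)
```

A caller would then send `"Last-Modified: " + http_date(robots_last_modified(edit_time, "/srv/mediawiki/robots.txt"))`, so that a change to either source moves the header forward.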

As a temporary workaround we're going to do a no-op wiki edit to bump that timestamp.

Event Timeline

Jgreen raised the priority of this task to Unbreak Now!.
Jgreen updated the task description.
Jgreen added a project: MediaWiki-Core-Team.
Jgreen changed Security from none to None.
Jgreen subscribed.

Change 177940 had a related patch set uploaded (by BryanDavis):
robots.txt: Use max lastmod time

https://gerrit.wikimedia.org/r/177940

Patch-For-Review

Change 177940 merged by jenkins-bot:
robots.txt: Use max lastmod time

https://gerrit.wikimedia.org/r/177940

bd808 moved this task from Backlog to Done on the MediaWiki-Core-Team board.
bd808 removed a project: Patch-For-Review.

The change has been merged and deployed.