
Articles for deletion on it.wp get indexed on Google despite robots.txt?
Closed, Invalid
Public

Description

It may be pointless, but I made this edit https://it.wikipedia.org/w/index.php?diff=68704385 because searching Google for that person's name + username + "wikipedia" returned the deletion-discussion page for her article on it.wp among the top results. This can sometimes happen with mirrors, but it is not supposed to happen with the actual site, since those pages are excluded in robots.txt. What am I missing?


Version: wmf-deployment
Severity: normal

Details

Reference
bz72195

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:46 AM
bzimport set Reference to bz72195.
bzimport added a subscriber: Unknown Object (MLST).
Elitre renamed this task from "Articles for deletions on it.wp get indexed on Google despite robots.txt?" to "Articles for deletion on it.wp get indexed on Google despite robots.txt?". Dec 11 2014, 4:00 PM
Elitre set Security to None.

Huh, that's weird :/. I'd suggest reaching out to Chad or one of the other devops people (Ariel?).

I see the entry for it in robots.txt, but I don't know why Google wouldn't respect it here.

I'm not sure what search you're running, but I did notice that URLs like http://it.wikipedia.org/?title=Wikipedia:Pagine_da_cancellare&useFormat=mobile show up in Google search results, so that may be the issue.
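The URL-variant problem can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch, assuming a simplified, hypothetical robots.txt excerpt with a single `Disallow` rule for `/wiki/Wikipedia:Pagine_da_cancellare` (not the actual it.wikipedia.org file) and a hypothetical subpage name `Esempio`. `urllib.robotparser` does plain prefix matching, which is simpler than Google's own parser, but it shows why a rule written against the `/wiki/...` path does not cover the same page reached via `/?title=...`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified robots.txt excerpt (NOT the real it.wikipedia.org file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Wikipedia:Pagine_da_cancellare
"""


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules above allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# Article-style URL (hypothetical subpage "Esempio"): matches the Disallow prefix.
canonical = "https://it.wikipedia.org/wiki/Wikipedia:Pagine_da_cancellare/Esempio"
# The same page requested through the root path with a query string: the path
# is just "/", so the "/wiki/..." prefix rule never matches it.
query_variant = ("http://it.wikipedia.org/?title=Wikipedia:Pagine_da_cancellare"
                 "&useFormat=mobile")

if __name__ == "__main__":
    print(is_crawlable(canonical))      # False
    print(is_crawlable(query_variant))  # True
```

Under these assumptions, a crawler following only the `/wiki/` rule would be free to fetch and index the `/?title=` form.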

MediaWiki should add noindex to 404 pages and use robots.txt to allow Google to crawl them, since Google can only see a noindex tag on a page it is allowed to fetch. By the way, that's a Google issue.
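The interaction between robots.txt and noindex can be sketched as follows: a crawler only reads a page's `<meta name="robots" content="noindex">` tag if robots.txt lets it fetch the page, so a `Disallow` rule can stop the noindex from ever being seen. This minimal Python sketch uses only the standard library (`urllib.robotparser`, `html.parser`); the robots.txt rules, the HTML snippet, the URL `/wiki/Pagina_cancellata`, and the helper names are all hypothetical, and real search engines apply more rules than this simplified model.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def noindex_is_seen(robots_txt: str, url: str, html: str,
                    agent: str = "Googlebot") -> bool:
    """True only if the crawler may fetch `url` AND the page carries noindex.

    Hypothetical simplified model: if robots.txt disallows the URL, the HTML
    is never downloaded, so any noindex tag in it cannot take effect.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        return False  # page body is never fetched, so the tag is never read
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in meta.directives


PAGE = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
URL = "https://it.wikipedia.org/wiki/Pagina_cancellata"  # hypothetical URL

BLOCKING = "User-agent: *\nDisallow: /wiki/Pagina_cancellata\n"
ALLOWING = "User-agent: *\nDisallow:\n"  # empty Disallow allows everything

if __name__ == "__main__":
    print(noindex_is_seen(BLOCKING, URL, PAGE))  # False: blocked, tag never read
    print(noindex_is_seen(ALLOWING, URL, PAGE))  # True: crawled, noindex honored
```

The design choice mirrors the comment above: to get a page dropped, it has to be crawlable so that its noindex can be read, rather than blocked outright.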

Ironholds subscribed.

Except deleted pages are not 404s.

Elitre claimed this task.

As of today, this no longer seems to happen; when I checked several times over the past months, it was still happening.