
Interwiki redirects via URL and other 301s are indexed by Google search
Open, Medium, Public

Description

Searching for Wikimedia Commons on Google.nl ( http://www.google.com/search?client=safari&rls=en&q=wikimedia+commons / language: Dutch ), the top result is this:

  • Wikimedia Commons

The IUCN Red List has agreed to license distribution maps derived from the data present on their website with a Creative Commons By-SA 2.0. ...
nl.wikipedia.org/wiki/Commons: - Cached

That's an interwiki link, probably indexed via a double interwiki (i.e. en.wikipedia linking to nl:commons: ).

Perhaps make these kinds of URLs, for interwikis that are not limited to local (such as bugzilla: and mw:), a 301 redirect instead?
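The proposal above could be sketched roughly as follows. This is only an illustration, not MediaWiki's actual code: `INTERWIKI_MAP` is a hypothetical two-entry table and `interwiki_redirect` is a made-up helper name. It maps a title with a known interwiki prefix to a 301 status and a target URL.

```python
from typing import Optional, Tuple

# Hypothetical, hand-written subset of an interwiki table (an assumption for
# this sketch); a real wiki keeps this mapping in its own configuration.
INTERWIKI_MAP = {
    "commons": "https://commons.wikimedia.org/wiki/$1",
    "mw": "https://www.mediawiki.org/wiki/$1",
}


def interwiki_redirect(title: str) -> Optional[Tuple[int, str]]:
    """Return (301, target_url) if the title starts with a known prefix."""
    # A leading colon (as in ":en:Ohio") is ignored before splitting.
    prefix, sep, rest = title.lstrip(":").partition(":")
    if not sep:
        return None  # no colon at all, so no interwiki prefix
    pattern = INTERWIKI_MAP.get(prefix.lower())
    if pattern is None:
        return None  # unknown prefix, e.g. an ordinary namespace
    return 301, pattern.replace("$1", rest)
```

For example, `interwiki_redirect("Mw:Extension:Proofread_Page")` yields a 301 pointing at the page on www.mediawiki.org, while a plain title such as `Main_Page` yields `None`.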


Version: 1.21.x
Severity: normal
See Also: T48424, T33838, T38174, T67402.

Details

Reference
bz26115

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:20 PM
bzimport set Reference to bz26115.
bzimport added a subscriber: Unknown Object (MLST).

Re-opening.

https://www.google.com/search?q=proofread+extension

  1. Extension:Proofread Page - MediaWiki

    url: en.wikipedia.org/wiki/Mw:Extension:Proofread_Page

    The Proofread Page extension can render a book either as a column of OCR text beside a column of scanned images, or broken into its logical organization ...
*** Bug 28242 has been marked as a duplicate of this bug. ***

Clarifying subject.

Bug was previously known as:
" nl.wikipedia.org/wiki/commons:Main_Page indexed by Google "

As of r84820 MediaWiki has been sending 301s, so changing bug title accordingly.

$ wget -S http://en.wikipedia.org/wiki/Mw:Extension:Proofread_Page

(...)
HTTP/1.0 301 Moved Permanently

Tim, any idea why this bug might be (re-)appearing?

Apparently not just for interwiki redirects

q=wikispecies

  1. Wikispecies, free species directory

    url: species.wikipedia.org/Wikimedia

    Foundation project that aims to catalog all species.

Is Google just being stupid and treating all 301s as 302s now?

This is not only about interwikis: I just found
MediaWiki-commits Info Page
mail.wikipedia.org/mailman/.../mediawiki-cvs

$ curl -i http://mail.wikipedia.org/mailman/listinfo/mediawiki-cvs
HTTP/1.1 301 Moved Permanently
Location: https://lists.wikimedia.org/mailman/listinfo/mediawiki-cvs
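For anyone re-checking many URLs, the manual inspection of `curl -i` output above could be automated along these lines. This is a minimal sketch that only parses header text already captured; `parse_redirect` is a hypothetical helper name and nothing here makes a network request.

```python
from typing import Optional, Tuple


def parse_redirect(raw_headers: str) -> Tuple[int, Optional[str]]:
    """Extract the status code and Location header from raw response headers."""
    lines = raw_headers.strip().splitlines()
    # Status line looks like "HTTP/1.1 301 Moved Permanently".
    status = int(lines[0].split()[1])
    location = None
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status, location
```

Fed the two header lines quoted above, it returns status 301 together with the lists.wikimedia.org URL, which is what a permanent redirect should look like.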

(In reply to comment #6)

Is Google just being stupid and treating all 301s as 302s now?

Yes?

I've sent an inquiry to our contacts at Google about this.

They're looking into it as of a few days ago; haven't heard anything more specific yet.

Is canonical url a related issue?
For instance:
<link rel="canonical" href="//commons.wikimedia.org/wiki/File:Harshad_GNUnify_2013.jpg" />
But first result on Google for "Harshad GNUnify" is the file description on en.wikipedia.org.
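To compare what a page declares as canonical with what the search engine shows, something like the following could pull the `rel="canonical"` target out of saved HTML. It is a sketch using only the standard library; `CanonicalFinder` and `find_canonical` are names invented here.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<link ... />) also arrive here via the
        # default handle_startendtag implementation.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

On the `<link>` element quoted above this returns the protocol-relative commons.wikimedia.org URL, which can then be compared by hand with the host of the search result.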

All examples in the bug now return the correct URLs for me. Resolving as fixed for now.

(In reply to comment #11)

All examples in the bug now return the correct URLs for me. Resolving as fixed for now.

The search
https://www.google.com/search?q=proofread+extension
still returns
http://en.wikipedia.org/wiki/Oldwikisource:Wikisource:ProofreadPage
which redirects to
http://wikisource.org/wiki/Wikisource:ProofreadPage

There are lots of similar examples here:
https://www.google.com/search?q=intitle%3AWikibooks+inurl%3Awikipedia.org%2Fwiki%2Fb%3A

Entering

proofread extension inurl:oldwikisource

in Google, the only results I get are

  • en.wikipedia.org/wiki/Oldwikisource:Wikisource:Scriptorium
  • en.wikipedia.org/wiki/Oldwikisource:Wikisource:2007

Helder: Do you still see the problem in comment 12?
As I wrote in comment 13, I cannot reproduce.

I can confirm comment 15 on google.com.

I checked this bug a few weeks ago and it was still current; now Erik says, "Actually none of the examples in the bug still return the reported results for me" (thanks).
http://thread.gmane.org/gmane.org.wikimedia.foundation/66428/focus=66464
Indeed they all seem resolved now, with a single, probably irrelevant, exception below. But why the oscillation? :/

(In reply to comment #10)

But first result on Google for "Harshad GNUnify" is the file description on en.wikipedia.org.

That's fixed; the second result is now en.zero.wikipedia.org/.../File:Harshad_GNUnify, which is a different bug.

(In reply to comment #12)

There are lots of similar examples here:
https://www.google.com/search?q=intitle%3AWikibooks+inurl%3Awikipedia.org%2Fwiki%2Fb%3A

Now I only see fr.wikipedia.org/wiki/b:Utilisateur:Airridi here.

I don't see the Airridi example on the last one either. All of the results are about copying to Wikibooks, etc.

Reopen if you have an example that still occurs.

It may or may not still be happening. But I don't see any examples at https://archive.today/ysbIZ .

(In reply to Matthew Flaschen from comment #23)

It may or may not still be happening. But I don't see any examples at https://archive.today/ysbIZ .

Don't you see a list of meta.wikipedia.org URLs? Perhaps you didn't notice the "p"?

As of January 2015 this is still happening.

https://www.google.co.uk/search?q=tablesorter+site%3Amediawiki.org

Result #3: Help:Sorting - Meta - Meta-Wiki - Wikimedia
https://www.mediawiki.org/wiki/meta:Help:Sorting
The JavaScript code jquery.tablesorter.js (source) of the tablesorter is loaded by the ResourceLoader. Some sites may have a page MediaWiki:Common.js ...

A slight variation: I am seeing :en: links in Google results.

https://www.google.com.au/search?q=Elections+in+South+America+site:commons.wikimedia.org

results:

commons.wikimedia.org/wiki/:en:Argentina
commons.wikimedia.org/wiki/:en:José%20Luis%20Rodríguez%20Zapatero
commons.wikimedia.org/wiki/:en:Ohio
commons.wikimedia.org/wiki/:en:North%20Carolina
commons.wikimedia.org/wiki/:en:KwaZulu-Natal
commons.wikimedia.org/wiki/:en:Gibraltar
etc
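When going through a long list of results like the one above, a small filter can flag the URLs whose title still carries an interwiki or language prefix. This is a rough sketch: `KNOWN_PREFIXES` is an illustrative set assembled from prefixes mentioned in this report (not a complete interwiki table), and `leading_interwiki_prefix` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import unquote

# Illustrative prefixes taken from examples in this report (an assumption);
# a real check would use the wiki's full interwiki table.
KNOWN_PREFIXES = {"en", "w", "b", "meta", "mw", "commons", "oldwikisource"}


def leading_interwiki_prefix(url: str) -> Optional[str]:
    """Return the lower-cased interwiki prefix of a /wiki/ URL, if any."""
    _, sep, title = url.partition("/wiki/")
    if not sep:
        return None  # not an article-path URL
    # Decode %20 and similar, then ignore a leading colon (":en:Ohio").
    title = unquote(title).lstrip(":")
    prefix, colon, _ = title.partition(":")
    if colon and prefix.lower() in KNOWN_PREFIXES:
        return prefix.lower()
    return None
```

With this, `commons.wikimedia.org/wiki/:en:Argentina` is flagged with the prefix `en`, while an ordinary page such as `https://en.wikipedia.org/wiki/Main_Page` is not flagged.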

Even for en.wikipedia.org, pointing to itself via en: and w:en:.

https://www.google.co.uk/search?q=special+export+delhi+wikipedia

w:en:Special:Export/Delhi - Wikipedia
https://en.wikipedia.org/wiki/en:Special:Export/Delhi
Wikipedia enwiki https://en.wikipedia.org/wiki/Main_Page MediaWiki first-letter Media Special Talk User User talk Wikipedia Wikipedia talk File File ...

See T91363 for the issue with Special:Export in general (even when not behind interwiki redirect).

Nemo_bis added a project: Traffic.
Nemo_bis set Security to None.