
MediaWiki.php: Redirect non-standard title urls to canonical

Description

Urls that identify the page by its title and carry no other query parameters
(apart from an optional action=view) now redirect to the standard url format.

Previously we only did this for variations of the title value (e.g. "Foo%20Bar"),
not for variations of the overall url structure (like title=Foo -> /wiki/Foo).

Existing redirects (unchanged):
/wiki/Foo%20Bar
/w/index.php?title=Foo%20Bar

New redirects:
/wiki/Foo_Bar?action=view
/w/index.php?title=Foo_Bar
/w/index.php?title=Foo_Bar&action=view

Urls that reach the same page through an intentional (or unintentional) server
rewrite, such as "/?title=Foo_Bar" in the case of Wikimedia, are redirected as well.
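The redirect rule above can be sketched in Python. This is a simplified model under stated assumptions, not the PHP code in MediaWiki.php: the `/wiki/` article path, the hypothetical helpers `title_key`, `canonical_url` and `redirect_target`, and treating the request as a path plus query string (no host) are all illustrative choices, and title normalisation is reduced to decoding percent-escapes and turning spaces into underscores.

```python
from typing import Optional
from urllib.parse import parse_qsl, unquote, urlsplit

# Hypothetical site layout (assumption): pages are viewed at /wiki/<Title>.
ARTICLE_PATH = "/wiki/"


def title_key(title: str) -> str:
    """Simplified title normalisation: spaces become underscores."""
    return title.replace(" ", "_")


def canonical_url(dbkey: str) -> str:
    """Canonical view url for a title key such as 'Foo_Bar'."""
    return ARTICLE_PATH + dbkey


def redirect_target(request_url: str) -> Optional[str]:
    """Return the canonical url to redirect to, or None to serve as-is.

    request_url is a path plus optional query, e.g. '/w/index.php?title=Foo'.
    """
    parts = urlsplit(request_url)
    # parse_qsl percent-decodes names and values ('Foo%20Bar' -> 'Foo Bar').
    params = parse_qsl(parts.query, keep_blank_values=True)
    names = [name for name, _ in params]
    query = dict(params)

    # Only plain page views qualify: no parameters besides 'title' and
    # 'action', no repeated parameters, and action (if given) must be 'view'.
    if any(name not in ("title", "action") for name in names):
        return None
    if len(names) != len(query):
        return None
    if query.get("action", "view") != "view":
        return None

    if "title" in query:
        title = query["title"]
    elif parts.path.startswith(ARTICLE_PATH):
        # urlsplit does not decode the path, so decode it here.
        title = unquote(parts.path[len(ARTICLE_PATH):])
    else:
        return None

    target = canonical_url(title_key(title))
    # Compare the entire url rather than only the title key, so that
    # '/wiki/Foo_Bar?action=view' and '/?title=Foo_Bar' are redirected too.
    return None if target == request_url else target
```

For example, `redirect_target("/w/index.php?title=Foo_Bar&action=view")` returns `"/wiki/Foo_Bar"`, while `redirect_target("/wiki/Foo_Bar")` returns `None` because the request is already canonical. Requests with other parameters (such as `action=edit` or `oldid`) are left alone in this model.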

While this has been a problem for many years, it went unnoticed until
recently, when Google started indexing significantly more results of
the "/?title=<name>" form. This query returns "About 3,220,000 results":
https://google.com/search?q=site:en.wikipedia.org+inurl:title+-intitle:title

The only change in logic is that the titlekey comparison is no longer a
factor in deciding whether to redirect. Instead, the existing comparison of the
entire url covers this case.

However, I kept the titlekey comparison in the redirect-loop check, as otherwise
that check would throw on every canonical page view (where the target url equals
the requested url, so no redirect can be made).
I also added a comment explaining how this redirect loop could occur.
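The decision order described in the last two paragraphs can be sketched as follows. This is again a hypothetical Python model, not the code from the commit: `decide`, `MisconfiguredServerError`, and the convention that `title_param` is the title value the application received after any server rewrite are assumptions made for illustration.

```python
from typing import Optional


class MisconfiguredServerError(Exception):
    """Hypothetical error: a redirect would send the client to the same url."""


def decide(request_url: str, title_param: Optional[str], dbkey: str,
           canonical: str) -> str:
    """Return 'redirect' or 'serve'; raise if redirecting would loop.

    request_url: url the client requested, e.g. '/wiki/Foo_Bar'.
    title_param: title value the application received after server rewrites.
    dbkey: normalised title key, e.g. 'Foo_Bar'.
    canonical: canonical url for dbkey, e.g. '/wiki/Foo_Bar'.
    """
    # Step 1: the whole-url comparison alone decides whether to redirect.
    if canonical != request_url:
        return "redirect"
    # Step 2 (redirect-loop check): the url is already canonical. If the title
    # the application received is missing or differs from the title key, the
    # server turned a standard url into a non-standard title, and redirecting
    # would only send the client back to this same url.
    if title_param is None or title_param != dbkey:
        raise MisconfiguredServerError(request_url)
    # Canonical url and matching title key: serve the page normally.
    return "serve"
```

In this model, dropping the `title_param != dbkey` test from step 2 would make it raise on a plain request for `/wiki/Foo_Bar`, which is exactly the canonical page view the description warns about.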

Bug: T67402
Change-Id: I88ed3525141c765910e66188427b9aab36b958a9

Event Timeline

I monitored the Google results using the link in the description; it is now two months after the commit.

At first sight this looks strange: I get 4,560,000 results, and 81,600,000 with the more general query https://www.google.fr/search?q=site:wikipedia.org+inurl:title+-intitle:title. But when I page through the results, the last page for en.wikipedia.org is the 29th, reporting 281 results, so the commit seems to be effective (although we should have checked which page was the last one before the commit). For wikipedia.org as a whole, the last page is the 35th, with 344 results. I don't know why Google reports such high numbers on the first page.