Page titles could cause problems for some HTML editors that add a trailing slash to URLs
Open, Low, Public, Feature

Description

Author: igor

Description:
As a developer and WordPress user I find this a potential bug, because some WYSIWYG editors will change "dir" to "dir/", making the URL fail when it is requested as an HTML page.

backslash "/" should redirect to directory!
so dir = dir/ server header should be 200 okay.

So request to "dir/" should return "dir" 200 okay!

http://en.wikipedia.org/wiki/User:Durova/ '''bug'''
does not redirect to
http://en.wikipedia.org/wiki/User:Durova

Now using a redirect hack

http://en.wikipedia.org/wiki/User:Igorberger/
redirects to
http://en.wikipedia.org/wiki/User:Igorberger

This can easily be fixed as a url rewrite in .htaccess or apache config file.

Thank you,
Igor Berger


Version: 1.12.x
Severity: enhancement

Details

Reference
bz12703

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:06 PM
bzimport set Reference to bz12703.
bzimport added a subscriber: Unknown Object (MLST).

MediaWiki wrote:

I know exactly what Igor's talking about here; MediaWiki's page title handling (couldn't find a component for that, BTW) has always treated titles like [[User:Voyagerfan5761]] and [[User:Voyagerfan5761/]] as different pages. Some WYSIWYG editors (for blogs, etc.) will add a trailing slash to URLs added to the text, which breaks any links to a MediaWiki site using pretty URLs like Wikipedia, Wikiquote, Meta, etc.

The proposal here is to have trailing slashes stripped from the page title, so things like this don't matter. I know some users, including myself, also will habitually add a slash at the end of a URL that looks like a directory (as most Wikipedia addresses do, .js and .css notwithstanding). If this is implemented, linking to http://en.wikipedia.org/wiki/User:Voyagerfan5761 and http://en.wikipedia.org/wiki/User:Voyagerfan5761/ would be functionally equivalent. The latter address would send a 301 redirect (ideally) to the former page, to tell search engines, browsers, etc. that the page's real address has no slash.

This should be fixable in Title.php, I think. I'm not that familiar with MediaWiki's core modules, but... It's the ''logical'' location.

igor wrote:

"backslash" should be slash or forward slash! My error in original bug reporting.

It can't be done as an Apache configuration, as it's only relevant for namespaces that have the subpage option enabled.

igor wrote:

This is the solution!
I have not tested it yet, but it should work.

If the requested URL does not resolve to an existing directory:

RewriteCond %{REQUEST_FILENAME} !-d

Externally redirect to remove the trailing slash:

RewriteRule ^(.+)/$ http://www.example.co.uk/$1 [R=301,L]

MediaWiki wrote:

Igor, there are no directories inside /wiki/. Actually, there is no /wiki/ directory. All of that is an Apache rewrite changing /wiki/Title to /w/index.php?title=Title. It will never resolve to an existing directory, even if the URL does not end in a slash.
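To illustrate the point above, here is a minimal Python sketch of the mapping from /wiki/Title to /w/index.php?title=Title. The function name `rewrite_path` is hypothetical, and real Apache rewrite rules also handle escaping and query strings, which this sketch leaves out. It only shows that a trailing slash ends up inside the title parameter and never touches the filesystem.

```python
import re

def rewrite_path(path: str) -> str:
    """Map a pretty URL path to the index.php form (simplified, no escaping)."""
    match = re.match(r"^/wiki/(.*)$", path)
    if match is None:
        # Not a pretty URL; leave it unchanged.
        return path
    # Everything after /wiki/ becomes the title, trailing slash included.
    return "/w/index.php?title=" + match.group(1)

print(rewrite_path("/wiki/User:Durova"))   # /w/index.php?title=User:Durova
print(rewrite_path("/wiki/User:Durova/"))  # /w/index.php?title=User:Durova/
```

Under these assumptions, "User:Durova" and "User:Durova/" reach MediaWiki as two different title strings, so a directory test such as `!-d` tells the server nothing useful.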

This is something MediaWiki should handle, I think. If no page with a slash exists, but the same title minus the ending slash is in the database, redirect to the title without a slash.
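The fallback described above could look roughly like the following Python sketch. It is not MediaWiki code: `resolve_title` is a hypothetical name, and a plain set of titles stands in for the page table in the database.

```python
def resolve_title(requested: str, existing_pages: set[str]) -> tuple[int, str]:
    """Return (HTTP status, title to serve or redirect to)."""
    if requested in existing_pages:
        # A page with exactly this title exists, slash or not: serve it.
        return 200, requested
    if requested.endswith("/"):
        stripped = requested[:-1]  # drop only the final slash
        if stripped in existing_pages:
            # Permanent redirect to the slash-less title.
            return 301, stripped
    return 404, requested

pages = {"User:Durova", "Foo", "Foo/"}
print(resolve_title("User:Durova/", pages))  # (301, 'User:Durova')
print(resolve_title("Foo/", pages))          # (200, 'Foo/')
print(resolve_title("Bar/", pages))          # (404, 'Bar/')
```

Because the exact title is checked first, an existing page whose title really ends in a slash keeps working, and the second lookup only happens for requests that would otherwise miss.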

igor wrote:

Yes, it can be done with a database query, but checking the database every time a page is called adds much more load.

So just implement the rewrite rule for all; that should fix the bug.

Externally redirect to remove the trailing slash:

RewriteRule ^(.+)/$ http://www.example.co.uk/$1 [R=301,L]

A rewrite rule would be inadequate, as it would require custom manual setup for every MediaWiki site the world over, and would not be feasible on all of them. Much easier on everyone for the software to process things correctly, if that's what we want it to do.

Issues to consider:

  1. Existing pages with '/' suffix (can be killed if necessary, we make half-broken titles illegal all the time)
  2. Special page parameters where you *want* the / as it's partial input, e.g.:

http://en.wikipedia.org/wiki/Special:Prefixindex/User:Brion_VIBBER/

MediaWiki wrote:

Regarding #2, I think we can just ignore slashes in special page titles. We should trust the user to know what they're putting in there. A prefixindex of User:Foo/ shouldn't be turned into a prefixindex of User:Foo, just because it would be the software modifying user input for no apparent reason.
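A rough Python sketch of that exemption, assuming the namespace is whatever precedes the first colon (the function name `strip_trailing_slash` is hypothetical, and real MediaWiki namespace handling, including localized aliases, is more involved):

```python
def strip_trailing_slash(title: str) -> str:
    """Drop one trailing slash, except for titles in the Special namespace."""
    namespace, colon, _rest = title.partition(":")
    if colon and namespace.lower() == "special":
        # Special page parameters are user input; leave them untouched.
        return title
    if len(title) > 1 and title.endswith("/"):
        return title[:-1]
    return title

print(strip_trailing_slash("Special:Prefixindex/User:Brion_VIBBER/"))
# Special:Prefixindex/User:Brion_VIBBER/
print(strip_trailing_slash("User:Foo/"))  # User:Foo
```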

Brion, would there be an issue with making MW redirect to the slash-less page only if one with a slash doesn't exist?

igor wrote:

I would do it across the board, because it is a canonical URL issue and causes a loss of PageRank, as here:
http://en.wikipedia.org/wiki/Main_Page/
http://en.wikipedia.org/wiki/Main_Page

Both point to the same page, but the redirect is a 302 rather than a 301, which is the wrong way to preserve Google PageRank.

igor wrote:

Brion, you can do both. Write a script for the world, and put a rewrite rule in place for Wikipedia to save database resources.

The use of /wiki/ is only a common tradition. There is nothing statically defined in MediaWiki that says that links will be in the traditional format.

$wgArticlePath, which is used to create this tradition, is a string with a $1 substitution and is commonly set to "/wiki/$1", but that doesn't mean every wiki follows that pattern. For all we know, the user could set it to "/wiki/$1/otherstuff" and then set up their rewrites according to that, and it would be perfectly valid.
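A small Python sketch of the $1 substitution makes the problem concrete. The helper `article_url` is hypothetical and skips URL encoding. It only shows that with a path such as "/wiki/$1/otherstuff", a generic "strip the trailing slash" pattern no longer lines up with the end of the title.

```python
import re

def article_url(article_path: str, title: str) -> str:
    """Expand an article path pattern by substituting the title for $1."""
    return article_path.replace("$1", title)

# The same pattern as the rewrite rule proposed earlier in this thread.
TRAILING_SLASH = re.compile(r"^(.+)/$")

default_url = article_url("/wiki/$1", "Foo/")
custom_url = article_url("/wiki/$1/otherstuff", "Foo/")

print(default_url)  # /wiki/Foo/
print(custom_url)   # /wiki/Foo//otherstuff
print(bool(TRAILING_SLASH.match(default_url)))  # True
print(bool(TRAILING_SLASH.match(custom_url)))   # False
```

With the custom path, the slash that belongs to the title sits in the middle of the URL, so only code that knows the configured pattern can tell where the title ends.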

I've also seen wikis that use a trailing slash in some titles as part of the title. No links off the top of my head, but it's possible that a wiki may already be using a page Foo, and putting a list of subpages at Foo/.

Then there is the real default URL to consider, which in truth is /index.php?title=Pagetitle&action=view, in which a / is perfectly valid and doesn't run into the conflict. So this kind of thing primarily happens only when the person setting up the wiki has already done the work of setting up rewrites, because wikis without this kind of configuration will be using the long format exclusively.

So I'd argue that while the option to do this would be a good addition (and perhaps the default when short URLs are enabled), it should not be forced on older wikis just because they upgraded.

There's also another note I should make. Trailing /'s are also used by a number of spambots. On Wikia we've in general blacklisted talk pages and Forum pages with a trailing / because it blocks a number of spambots. If the redirection were added automatically, the piles of spambot attacks aimed at [[Forum:Index/]] would suddenly hit [[Forum:Index]] instead. So there is a little harm to consider in enabling it. It should definitely be optional, and titles with a trailing / should not be deemed illegal.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM