Subpages are case-sensitive including first character
Open, Public

Description

Author: firstpeterfourten+mwbugs

Description:
When I visit a nonexistent subpage, for example
https://secure.wikimedia.org/wikipedia/en/wiki/Talk:Washington,_D.C./archive_1
it reports that the page does not exist and that "Titles on Wikipedia are case sensitive except for the first character."
However, the page
https://secure.wikimedia.org/wikipedia/en/wiki/Talk:Washington,_D.C./Archive_1
does exist, and the only difference is the capital A in Archive.

The expected behavior is that the first character after a slash in a subpage name should not be case sensitive.

Before changing this, have a look at how many pages exist where there ARE two different pages differing only by the case of the first letter after a slash (or multiple case-changes-after-slashes, for nested subpages). Usually one will be a redirect to the other one. Please be sure to keep access to the one that is not a redirect.
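A rough way to run that check, sketched in Python. This is only an illustration: `fold_subpage_case` and `find_collisions` are hypothetical helpers, the sample titles are made up, and a real survey would read the title list from a database dump rather than a hard-coded list.

```python
from collections import defaultdict


def fold_subpage_case(title: str) -> str:
    """Uppercase the first character of every slash-separated segment.

    Hypothetical model of the requested behavior, not MediaWiki code.
    """
    return "/".join(seg[:1].upper() + seg[1:] for seg in title.split("/"))


def find_collisions(titles):
    """Group titles that would become identical after folding."""
    groups = defaultdict(list)
    for title in titles:
        groups[fold_subpage_case(title)].append(title)
    # Keep only the groups where two or more existing titles collide.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Made-up sample; assume both archive pages exist for the sake of the example.
sample_titles = [
    "Washington,_D.C./Archive_1",
    "Washington,_D.C./archive_1",
    "Foo/doc",
    "Foo/sandbox",
]
collisions = find_collisions(sample_titles)
```

Each colliding group would then need a manual check to see which member is the redirect and which holds the real content.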

If you think my intuitive expectation about the behavior is not at all grounded in reasonableness, we should at least update the "page not found" documentation to note that ALL characters, even the first, in subpage names are case sensitive. I'd rather just have the intuitive behavior, though.


Version: unspecified
Severity: enhancement

bzimport added a project: MediaWiki-Redirects. Via Conduit, Nov 21 2014, 11:27 PM
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz27963.
bzimport created this task. Via Legacy, Mar 9 2011, 6:27 PM

MaxSem added a comment. Via Conduit, Mar 9 2011, 6:37 PM

This would lead to a number of unobvious breakages. For example, if we remove a namespace from $wgNamespacesWithSubpages - what will happen to: a) page titles, b) links to subpages?

Nikerabbit added a comment. Via Conduit, Mar 9 2011, 6:44 PM

It should be all (case insensitive) or nothing.

bzimport added a comment. Via Conduit, Mar 9 2011, 8:50 PM

firstpeterfourten+mwbugs wrote:

(In reply to comment #1)

> This would lead to a number of unobvious breakages. For example, if we remove a
> namespace from $wgNamespacesWithSubpages - what will happen to: a) page titles,
> b) links to subpages?

I'm not sure that fixing this bug would actually make a difference in the answer to these questions. Removing the subpage feature of a namespace would have the same effect whether the first letter was case-sensitive or case-insensitive. What happens now? There might possibly be a few letters of case sensitivity added in the event that the subpage feature is removed, thereby breaking a few links, although a little redirect maker script (using What Links Here) could address that if needed.

Please post more of those possible "unobvious breakages" that you noted - as many of them as you can think of.

(In reply to comment #2)

> It should be all (case insensitive) or nothing.

The documentation, including what's shown on the nonexistent page message, states that the first letter is case insensitive, and that's what I think most users expect. There are reasons for that decision, which you can probably find in the discussion where it was set (although I don't personally know where that is). If you want to change that policy, it's a bigger issue probably out of scope of this bug. I'm just looking for consistent behavior. The first character after the slash should be case-insensitive. I would not oppose making the whole string case-insensitive, but I think others would.

MZMcBride added a comment. Via Conduit, Mar 10 2011, 3:47 AM

(In reply to comment #0)

> When I visit a nonexistent subpage, for example
> https://secure.wikimedia.org/wikipedia/en/wiki/Talk:Washington,_D.C./archive_1
> it reports that the page does not exist and that "Titles on Wikipedia are case
> sensitive except for the first character."

There are two issues that are being conflated here. The first issue is that there aren't better suggestions for very close, but inexact page titles. This crops up in all sorts of places (capitalization changes, accent mark changes, etc.), but is outside the scope of this bug. The second issue is that the documentation is a bit misleading (at least inasmuch as it misled you). Titles _are_ case sensitive except for the first letter. The fact that the content is at "/A" but not at "/a" fairly clearly demonstrates this. The message could be rephrased, but that's an issue for the English Wikipedia, not an issue for MediaWiki development. The default "no page exists" text does not include this passage. If you would like to change the English Wikipedia's text, you can do so at http://en.wikipedia.org/wiki/MediaWiki:Noarticletext.

> However, the page
> https://secure.wikimedia.org/wikipedia/en/wiki/Talk:Washington,_D.C./Archive_1
> does exist, and the only difference is the capital A in Archive.

Right. Page titles are case sensitive except for the first letter on the English Wikipedia, as established. This isn't the case for every MediaWiki installation (Wiktionaries, for example, are case sensitive in the first letter as well for most namespaces).

> The expected behavior is that the first character after a slash in a subpage
> name should not be case sensitive.

I'm not sure this is the expected behavior. I can see arguments for it being this way, but current (widespread) practice on the English Wikipedia uses conventions such as "/doc" for template documentation subpages and "/sandbox" for many user sandboxen.

> Before changing this, have a look at how many pages exist where there ARE two
> different pages differing only by the case of the first letter after a slash
> (or multiple case-changes-after-slashes, for nested subpages). Usually one
> will be a redirect to the other one. Please be sure to keep access to the one
> that is not a redirect.

Sure, some redirect. This is another case where page title suggestions (or even very cautious auto-redirection) need better implementation. But this is still outside the scope of this bug.

> If you think my intuitive expectation about the behavior is not at all grounded
> in reasonableness, we should at least update the "page not found" documentation
> to note that ALL characters, even the first, in subpage names are case
> sensitive. I'd rather just have the intuitive behavior, though.

Updating the documentation to say that _all_ characters are case sensitive isn't exactly right. The context for this message is largely people clicking red links. When a user does so, they want to be able to figure out why there isn't any content there. In this context, the full page title is not entirely case sensitive; it is case sensitive except for the first letter. For example, [[iPhone]] is always going to lead to [[IPhone]], just as [[IPhone]] would. The case of the first letter in links is irrelevant (except for display purposes), which is what the message at the English Wikipedia tries to convey.
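To make the first-letter rule concrete, here is a minimal Python sketch of the behavior described for the English Wikipedia. `normalize_first_letter` is a hypothetical name and this is not MediaWiki's implementation, which also deals with namespaces and language-specific casing.

```python
def normalize_first_letter(title: str) -> str:
    """Uppercase only the first character; every other character is kept as typed."""
    return title[:1].upper() + title[1:]


# Both spellings of the link end up at the same page title.
lower_link = normalize_first_letter("iPhone")
upper_link = normalize_first_letter("IPhone")

# The letter after the slash is not the first character, so it is left alone.
subpage_link = normalize_first_letter("washington, D.C./archive 1")
```

Under this model "iPhone" and "IPhone" both become "IPhone", while "/archive 1" stays lowercase, which is the behavior the report is about.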

Given all of this, I'm inclined to mark the bug as invalid, but I'll hold off for now.

bzimport added a comment. Via Conduit, Mar 10 2011, 5:02 AM

firstpeterfourten+mwbugs wrote:

(In reply to comment #4)

> Titles _are_ case sensitive except for the first letter. The fact that the
> content is at "/A" but not at "/a" fairly clearly demonstrates this.

When my base is the [[Talk:Washington, D.C.]] page and I go for a page labeled "archive 1," to me "archive 1" is the name of the page and all the stuff before that are qualifiers that help locate it: "Washington, D.C." as much as "Talk" as much as "en" as much as "Wikipedia." To me, a "/doc" page is about doc (which it is), and all the qualifiers before that are just context - necessary, but not part of what I intuitively think of as the title for what I'm reading.

As an analogy, when browsing through a file directory structure, it's the string after the last slash (the name of the directory currently being viewed) that serves as the title of the current directory. Whether my filesystem is case sensitive, case insensitive, or all-but-first-letter, I expect it to be consistent no matter how deep I go in the directory structure, and I always think of the current folder title (without the parent hierarchy) as the title of what I'm looking at.

> If you would like to change the English Wikipedia's text, you can do
> so at http://en.wikipedia.org/wiki/MediaWiki:Noarticletext.

Thanks for the link. Changing the documentation is a second preference to changing the actual software behavior, but it's good to have that on hand.

> Right. Page titles are case sensitive except for the first letter on the
> English Wikipedia, as established.

This behavior should apply consistently to subpages just as well as non-subpages. Subpages are pages too.

> The case of the first letter in links is irrelevant (except for display
> purposes), which is what the message at the English Wikipedia tries to convey.

The first letter in subpage links IS relevant, though. That's what the bug is about.

MZMcBride added a comment. Via Conduit, Mar 10 2011, 5:17 AM

(In reply to comment #5)

> (In reply to comment #4)
> > Right. Page titles are case sensitive except for the first letter on the
> > English Wikipedia, as established.
>
> This behavior should apply consistently to subpages just as well as
> non-subpages. Subpages are pages too.

I think some of this confusion is understandable; however, an understanding of the MediaWiki internals makes some of this less confusing. (Granted, most users won't have this understanding either.)

Page namespaces are stored as integers and page titles are stored as strings in the database. The integers are converted to strings for localization when displayed to the user. The page title is always simply stored and displayed. A page title such as "Template:Foo/doc" has page_namespace = 10 and page_title = "Foo/doc". From the database's point of view, all page titles are equal. There is no distinction between a subpage or a non-subpage from the database's point of view.
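As an illustration of that storage model, a small Python sketch follows. `NAMESPACE_IDS` and `split_title` are hypothetical names, the mapping lists only three namespaces, and the sketch ignores details such as the conversion of spaces to underscores.

```python
# Tiny, illustrative subset of the namespace table (main = 0, Talk = 1, Template = 10).
NAMESPACE_IDS = {"Talk": 1, "Template": 10}


def split_title(full_title: str):
    """Split 'Namespace:Title' into (namespace id, title string).

    The title part is returned untouched: any slash in it is just
    another character, so a subpage looks like any other page.
    """
    prefix, sep, rest = full_title.partition(":")
    if sep and prefix in NAMESPACE_IDS:
        return NAMESPACE_IDS[prefix], rest
    # No recognized prefix: the whole string is a main-namespace title.
    return 0, full_title


template_doc = split_title("Template:Foo/doc")
talk_archive = split_title("Talk:Washington, D.C./Archive 1")
plain_page = split_title("IPhone")
```

Here "Template:Foo/doc" comes back as namespace 10 with the title "Foo/doc", and nothing in the result marks it as a subpage.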

MediaWiki defines which namespaces have subpages. This changes the behavior of links such as "/this" and it adds a breadcrumb trail to the top of the page, among other things. The subpages status configuration is what MaxSem was referring to in comment #1 above.
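A simplified Python sketch of how the subpage setting changes link handling and breadcrumbs. Both functions are hypothetical models of the behavior described above, not MediaWiki code, and they skip details such as checking whether parent pages exist.

```python
def resolve_link(current_page: str, link: str, has_subpages: bool) -> str:
    """Resolve a link target written on current_page.

    With subpages enabled, a link starting with '/' is relative to the
    current page; otherwise it is treated as an ordinary title.
    """
    if has_subpages and link.startswith("/"):
        return current_page + link
    return link


def breadcrumbs(title: str, has_subpages: bool):
    """List the parent titles a breadcrumb trail could show for a page."""
    if not has_subpages:
        return []
    parts = title.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


relative_target = resolve_link("Foo", "/this", has_subpages=True)
literal_target = resolve_link("Foo", "/this", has_subpages=False)
trail = breadcrumbs("Foo/Bar/Baz", has_subpages=True)
```

The same "/this" link points at "Foo/this" in one configuration and at a page literally titled "/this" in the other, which is why changing the subpage setting for a namespace affects existing links.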

I think you make a valid point, but I'm not sure the benefits of this change outweigh the costs here. I also think that implementing this change might add more confusion going forward (e.g., "Talk:/dev/null/Archive 1" would be forced to be "Talk:/dev/Null/Archive 1"), unless some sort of indicator is added to the database about which pages are actually subpages (which has been discussed previously and may be a valid standalone request).
