Page MenuHomePhabricator

Deprecate Title::getUserCaseDBKey()
Open, LowPublic

Description

Title::getUserCaseDBKey() is used to get the db key, with the initial letter casing as specified by the user. In core it's only used in FileRepo.php and MediaWikiTitleCodec.php, and has a minimal number of uses in extensions (CentralAuth, PushToWatch etc).

The main problem with this is that depending on how you construct the Title object, this function may or may not work. It's also one of the differences between Title and TitleParser, and I don't see any good use cases of keeping it around.

Acceptance Criteria

  • Soft Deprecation
  • Removal from production
  • Hard Deprecation
  • Removal of code from code base entirely

Event Timeline

Legoktm created this task.Aug 16 2018, 8:21 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 16 2018, 8:21 PM

There's a commonswiki user group, Image-reviewer with uppercase I. Is this affected by this Task. Shall we attempt to rename that group as well?

D3r1ck01 updated the task description. (Show Details)Oct 5 2018, 6:30 PM

There's a commonswiki user group, Image-reviewer with uppercase I. Is this affected by this Task. Shall we attempt to rename that group as well?

No, that doesn't affect this task. It's just CentralAuth global groups that are affected.

D3r1ck01 updated the task description. (Show Details)Nov 4 2018, 1:13 PM

Change 471545 had a related patch set uploaded (by D3r1ck01; owner: Alangi Derick):
[mediawiki/core@master] Soft deprecate Title::getUserCaseDBKey()

https://gerrit.wikimedia.org/r/471545

D3r1ck01 claimed this task.Nov 4 2018, 1:50 PM
D3r1ck01 added a project: User-D3r1ck01.
D3r1ck01 moved this task from Backlog to Doing [WIP] on the User-D3r1ck01 board.
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Soft deprecation done. Next is removal from use in Wikimedia production, then hard deprecation, then removal (in MW 1.35 if we get the rest done by MW 1.33).

Change 471545 merged by jenkins-bot:
[mediawiki/core@master] Soft deprecate Title::getUserCaseDBKey()

https://gerrit.wikimedia.org/r/471545

D3r1ck01 awarded a token.

Soft deprecation done. Next is removal from use in Wikimedia production, then hard deprecation, then removal (in MW 1.35 if we get the rest done by MW 1.33).

Thanks for reviewing @Jdforrester-WMF, for the production part, I'm not sure I can handle that aspect, so I'll place this up for grab and maybe later reclaim it during hard deprecation and removal.

D3r1ck01 removed D3r1ck01 as the assignee of this task.Dec 18 2018, 8:41 AM
D3r1ck01 triaged this task as Low priority.
D3r1ck01 updated the task description. (Show Details)
D3r1ck01 added a subscriber: D3r1ck01.

@daniel asked me about this, following a complaint in CR from @Simetrical that the function was deprecated despite its usage not having significantly changed since it was introduced, without apparent awareness of what it is for. The idea is that if you have a remote file repository hosted on a wiki with $wgCapitalLinks=false, or a remote file repository that is not a MediaWiki instance, then [[File:foo.jpg]] and [[File:Foo.jpg]] may refer to two different files. So to know which one was intended, you have to get the link text before it was mangled by ucfirst().

I think the options are:

  • Simultaneously deprecate the initialCapital parameter to FileRepo. Make it impossible for MediaWiki to use a case-sensitive file repository.
  • Have a new class which tracks original link text, analogous to Title but used only by Parser and FileRepo, which can be used to pass the original link text from Parser::replaceInternalLinks2() to Parser::makeImage() and thence to Parser::fetchFileAndTitle(), Parser::fetchFileNoRegister() and RepoGroup::findFileFromKey().
  • Add a parameter or option to those functions to do the same.

The original link text is only needed to construct the File object, so it shouldn't need to be passed to to Linker, since Linker receives a File object with the correct name.

CentralAuth doesn't really need this function since it already has the original link text as a local variable.

PushToWatch is apparently using it under the mistaken impression that it relates to the User namespace.

@daniel asked me about this, following a complaint in CR from @Simetrical that the function was deprecated despite its usage not having significantly changed since it was introduced, without apparent awareness of what it is for. The idea is that if you have a remote file repository hosted on a wiki with $wgCapitalLinks=false, or a remote file repository that is not a MediaWiki instance, then [[File:foo.jpg]] and [[File:Foo.jpg]] may refer to two different files. So to know which one was intended, you have to get the link text before it was mangled by ucfirst().

Isn't there a problem with that if the Title object was cached in LinkCache or pre-loaded using LinkBatch? I have not checked the logic in detail, but it seems to me like these classes are not aware of the idea of a user-case DB key. Or even Title::instanceCache, which uses the normalized DB key as a key.

  • Simultaneously deprecate the initialCapital parameter to FileRepo. Make it impossible for MediaWiki to use a case-sensitive file repository.

If backwards compatibility was not a problem, I'd prefer file names to be entirely case insensitive (yet preserving). But I suppose it's to late for that. Could of course be optional, but if it's not the default and we don't use it, it's probably not worth the effort.

  • Have a new class which tracks original link text, analogous to Title but used only by Parser and FileRepo, which can be used to pass the original link text from Parser::replaceInternalLinks2() to Parser::makeImage() and thence to Parser::fetchFileAndTitle(), Parser::fetchFileNoRegister() and RepoGroup::findFileFromKey().

Linke a FileLinkTarget, which is a subclass of LinkTarget, but has the original case? I kind of like the idea, but I'm afraid it would be quite a bit of work to loop this through the code everywhere. Though maybe not, if Title implements it...

  • Add a parameter or option to those functions to do the same.

Couldn't we just require the capitalization rule for the File namespace to be the same as the one for any remote repo? Mixing multiple repos with different capitalization rules seems confusing anyway, if it ever worked correctly.

This is the option that makes most sense to me from a user perspective, and seems to be the simplest in code. Is there any reason that it wouldn't work?

The original link text is only needed to construct the File object, so it shouldn't need to be passed to to Linker, since Linker receives a File object with the correct name.
CentralAuth doesn't really need this function since it already has the original link text as a local variable.
PushToWatch is apparently using it under the mistaken impression that it relates to the User namespace.

Thank you for clarifying!

Isn't there a problem with that if the Title object was cached in LinkCache or pre-loaded using LinkBatch? I have not checked the logic in detail, but it seems to me like these classes are not aware of the idea of a user-case DB key. Or even Title::instanceCache, which uses the normalized DB key as a key.

LinkCache and LinkBatch are for local pages. The difference only exists for links to remote files.

Couldn't we just require the capitalization rule for the File namespace to be the same as the one for any remote repo? Mixing multiple repos with different capitalization rules seems confusing anyway, if it ever worked correctly.

There are a few local files on the Wiktionaries that would be broken by such a change. For links, it doesn't seem confusing to me at all. You can link to a case-sensitive file just as you can link to a case-sensitive URL or interwiki page. A special case in MediaWikiTitleCodec declines to capitalise interwiki links.

The shadow namespace feature is a bit more of a wrinkle. It works on English Wiktionary, e.g. https://en.wiktionary.org/wiki/File:godorf_Station_at_Dusk,_May_2018.jpg . Note that the link to Commons on that page was retrieved from the FileRepo and thus has the correct case. It probably doesn't work in the other direction, i.e. a case-insensitive wiki transcluding a case-sensitive file description page. The problem here is our conception of shadow namespaces as being like local pages, and the URL scheme that results from that. If those fake pages were at [[Special:RemoteFileDescription/foo.jpg]] there wouldn't be a problem.

daniel added a comment.EditedMay 27 2019, 7:22 AM

LinkCache and LinkBatch are for local pages. The difference only exists for links to remote files.

You mean, interwiki file links, like [[foo:File:xyz.jpg]]? I wasn't aware that was a thing, I thought remote repos always used the shadow namespace mechanism... or am I confused?

My thought was that if the same page links to both [[File:Foo.jpg]] and [[File:foo.jpg]], both links would end up being represented by the same Title object due to caching, and the information about the original case would be lost.

There are a few local files on the Wiktionaries that would be broken by such a change. For links, it doesn't seem confusing to me at all. You can link to a case-sensitive file just as you can link to a case-sensitive URL or interwiki page. A special case in MediaWikiTitleCodec declines to capitalise interwiki links.

Oh, I wasn't aware there were files starting with lower-case letters on Wiktionary. I assume it would be pretty easy to fix.

The shadow namespace feature is a bit more of a wrinkle. It works on English Wiktionary, e.g. https://en.wiktionary.org/wiki/File:godorf_Station_at_Dusk,_May_2018.jpg . Note that the link to Commons on that page was retrieved from the FileRepo and thus has the correct case. It probably doesn't work in the other direction, i.e. a case-insensitive wiki transcluding a case-sensitive file description page. The problem here is our conception of shadow namespaces as being like local pages, and the URL scheme that results from that. If those fake pages were at [[Special:RemoteFileDescription/foo.jpg]] there wouldn't be a problem.

That's why I was thinking that case-sensitive file repos shouldn't be a thing at all...

LinkCache and LinkBatch are for local pages. The difference only exists for links to remote files.

You mean, interwiki file links, like [[foo:File:xyz.jpg]]? I wasn't aware that was a thing, I thought remote repos always used the shadow namespace mechanism... or am I confused?

No, I mean links to remote files, like [[File:foo.jpg]] where foo.jpg does not exist on the local wiki. Shadow namespaces are only a view layer illusion. There's no model layer concept of shadow namespaces. We have a page view override and a file link override, but if you query the status of the illusory page any other way, you find that it doesn't exist. We have a model layer concept of a FileRepo and RepoGroup, which is more relevant.

My thought was that if the same page links to both [[File:Foo.jpg]] and [[File:foo.jpg]], both links would end up being represented by the same Title object due to caching, and the information about the original case would be lost.

This problem does indeed exist in RepoGroup::findFile(), which is using Title::getDBkey() to get a cache key instead of Title::getUserCaseDBKey(). Also, RepoGroup::findFiles() calls File::normalizeTitle(), a static function which is not aware of FileRepo case conventions.

There are a few local files on the Wiktionaries that would be broken by such a change. For links, it doesn't seem confusing to me at all. You can link to a case-sensitive file just as you can link to a case-sensitive URL or interwiki page. A special case in MediaWikiTitleCodec declines to capitalise interwiki links.

Oh, I wasn't aware there were files starting with lower-case letters on Wiktionary. I assume it would be pretty easy to fix.

There's 29 on English Wiktionary, 6 on French Wiktionary, probably a few others.

The shadow namespace feature is a bit more of a wrinkle. It works on English Wiktionary, e.g. https://en.wiktionary.org/wiki/File:godorf_Station_at_Dusk,_May_2018.jpg . Note that the link to Commons on that page was retrieved from the FileRepo and thus has the correct case. It probably doesn't work in the other direction, i.e. a case-insensitive wiki transcluding a case-sensitive file description page. The problem here is our conception of shadow namespaces as being like local pages, and the URL scheme that results from that. If those fake pages were at [[Special:RemoteFileDescription/foo.jpg]] there wouldn't be a problem.

That's why I was thinking that case-sensitive file repos shouldn't be a thing at all...

Note that the "other direction" (from a case-insensitive wiki to a case-sensitive FileRepo) is unused by WMF and probably unused everywhere. That feature could be removed without anyone noticing. But removing the ability to upload case-sensitive files to a case-sensitive wiki is a b/c break and is not justified by considerations like cache collision. It's also not justified by this task, since getUserCaseDBKey() returns the same thing as getDBkey(). You'd have to capitalise all local files regardless of $wgCapitalLinks, otherwise you would repeat this migration problem every time a third party wiki with $wgCapitalLinks=false wants to add a remote repo like instant commons.

In other words, removing the ability for a $wgCapitalLinks=true wiki to show files from a file repo with initialCapital=false would allow this task to be closed, and would avoid a number of bugs which would be evident if anyone tried to use that feature. Whereas removing the ability for a $wgCapitalLinks=false to use files from a file repo with initialCapital=true, or requiring capitalisation of the File namespace regardless of $wgCapitalLinks, would be a b/c break and is not necessary to close this task or fix any known bugs.

In other words, removing the ability for a $wgCapitalLinks=true wiki to show files from a file repo with initialCapital=false would allow this task to be closed, and would avoid a number of bugs which would be evident if anyone tried to use that feature. Whereas removing the ability for a $wgCapitalLinks=false to use files from a file repo with initialCapital=true, or requiring capitalisation of the File namespace regardless of $wgCapitalLinks, would be a b/c break and is not necessary to close this task or fix any known bugs.

So, the way forward is:

  • make it illegal to have initialCapital=false but $wgCapitalLinks=true (or the File namespace using capitalization). We don't know of any users of this setup, and looking at the code, it doesn't work right. It's also the only thing that needs getUserCaseDBKey(). This should be filed as a subtask blocking the hard deprecation (and removal) of getUserCaseDBKey().
  • still allow initialCapital=true with $wgCapitalLinks=false (or the File namespace not using capitalization). Wiktionary uses this. This doesn't need getUserCaseDBKey(), since with $wgCapitalLinks=false, getUserCaseDBKey() returns the same value as getDBKey().

Did I get this right?