CentralAuth API list=globalallusers should capitalize the first letter
Open, Low, Public, Feature

Assigned To

None

Authored By

	dbarratt
	Nov 17 2017, 11:28 PM

Description

These two requests do not return the same results:
https://meta.wikimedia.org/w/api.php?action=query&list=globalallusers&agufrom=d
and
https://meta.wikimedia.org/w/api.php?action=query&list=globalallusers&agufrom=D

The former is incorrect since usernames that begin with a lowercase letter are bugs. To resolve this, the API should internally uppercase the first letter (by using User::getCanonicalName()).

The only alternative is to have every client uppercase the first letter before sending over the request.

This should also be applied to any other endpoint where you can specify the username.
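
For illustration, the client-side alternative could be sketched roughly as below. The helper name `ucfirst_username` is hypothetical, and Python's `str.upper()` only approximates what `User::getCanonicalName()` would do on the server (it is not content-language aware and skips every other normalization step), so treat this as a sketch under those assumptions rather than an equivalent.

```python
def ucfirst_username(name: str) -> str:
    """Uppercase only the first character of a username.

    Hypothetical client-side helper; an approximation of server-side
    canonicalization, not a replacement for it.
    """
    # Slicing keeps this safe for the empty string.
    return name[:1].upper() + name[1:]


# A client would apply this before sending the value as agufrom.
start = ucfirst_username("dbarratt")
```

Every client would have to carry a helper like this, which is the duplication the task is trying to avoid.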

Workaround
T180084: Interaction Timeline V1: The first character of usernames should not be case sensitive

Details

	Subject	Repo	Branch	Lines +/-
	Turn lowercase initial letter in usernames to uppercase before querying	mediawiki/extensions/CentralAuth	master	+6 -1

Related Objects

Mentioned In: T180084: Interaction Timeline V1: The first character of usernames should not be case sensitive
Mentioned Here: T180084: Interaction Timeline V1: The first character of usernames should not be case sensitive
T35602: list=allusers with auprop=group throws a MWException at user:%95

Event Timeline

dbarratt created this task. Nov 17 2017, 11:28 PM

Restricted Application added a subscriber: Aklapper. Nov 17 2017, 11:28 PM

dbarratt added a project: Anti-Harassment. Nov 17 2017, 11:29 PM

dbarratt added a parent task: T180084: Interaction Timeline V1: The first character of usernames should not be case sensitive.

• TBolliger subscribed. Nov 17 2017, 11:42 PM

Change 392176 had a related patch set uploaded (by Niharika29; owner: Niharika29):
[mediawiki/extensions/CentralAuth@master] Turn lowercase initial letter in usernames to uppercase before querying

https://gerrit.wikimedia.org/r/392176

gerritbot added a project: Patch-For-Review. Nov 18 2017, 12:24 AM

If this is done, it should be done everywhere relevant and not just in globalallusers. The only place in core that I know of is ApiQueryAllUsers.

As for whether this should be done, I see arguments either way. None seem particularly compelling to me; it comes down to DWIM versus breaking weirdly in some rare edge cases.

For:

It's more likely to be DWIM.
We do do this sort of normalization for title parts in modules like ApiQueryAllPages, despite the possibility of the "Against" issues there. OTOH, it could as well be argued that those are buggy too.
We did it for user names in ApiQueryAllUsers from 2007 until 2012, when it was lost in the fix for T35602. Again, though, it could be argued that that was a bug fix.

Against:

There's the possibility of strange behavior in the cases where these "invalid" names with lowercase first letters are in the database. Besides bugs, that can happen when the Unicode version is updated, as for example the case of user ɱ.
The values being considered here are positions in the space of all possible names, not necessarily valid names themselves. Particularly as the end point of a range, a lowercase letter may make more sense than having to figure out the last possible name beginning with the previous character (e.g. "a" instead of "` followed by 63 U+10FFFF followed by U+07FF", assuming we never increase the 255-byte hard limit on usernames).
Uppercasing in MediaWiki varies by language, e.g. uppercase of i is I for most languages, but for Kazakh, Azerbaijani, Karakalpak, and Turkish it's İ instead. That could be unexpected, or completely expected, depending. Especially on multilingual sites where it's going to use the content language rather than the user language.
- Note the sorting is always in Unicode order, not by any localized collation. That's a restriction from the database layer.
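
A few of the "Against" points can be checked with a small Python sketch. Python compares strings code point by code point, which is assumed here to mirror the Unicode-order sorting described above. Apart from ɱ, the names are made up for illustration, and Python's `str.upper()` stands in for a non-language-aware uppercasing, not for MediaWiki's own.

```python
# Made-up names for illustration; only "ɱ" is taken from the discussion.
names = ["apfeldieb", "Zebra", "ɱ", "Apfeldieb"]

# Python orders str values by code point, so lowercase-initial names
# land after every uppercase ASCII initial.
ordered = sorted(names)

# Every uppercase ASCII letter sorts before every lowercase one.
upper_before_lower = "Z" < "a"

# "a" bounds any name starting with "`" (U+0060), however long it is,
# so it works as a range end point without being a valid username.
backtick_name = "`" + "\U0010FFFF" * 63
a_bounds_backtick = backtick_name < "a"

# Python's str.upper() is not language-aware: "i" never becomes "İ".
python_upper_i = "i".upper()
```

Uppercasing "a" to "A" as a range end point would move the boundary to before every lowercase-initial name, which is the position-versus-name distinction made above.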

Anomie moved this task from Unsorted to Needs details or plan on the MediaWiki-Action-API board. Nov 20 2017, 4:05 PM

As a compromise, we could have a flag to disable the behavior. But I think given T180084, the expected behavior is DWIM and I don't think each client should have to re-implement the "proper" behavior.

In T180858#3775027, @dbarratt wrote:

As a compromise, we could have a flag to disable the behavior. But I think given T180084, the expected behavior is DWIM and I don't think each client should have to re-implement the "proper" behavior.

From my product manager perspective, I agree. But if this ticket is declined, I would greatly appreciate guidance on how to emulate this functionality for our user-facing products.

Does anyone have any suggestions on how to resolve this issue for T180084: Interaction Timeline V1: The first character of usernames should not be case sensitive?

• TBolliger mentioned this in T180084: Interaction Timeline V1: The first character of usernames should not be case sensitive. Nov 27 2017, 9:04 PM

I think what you really want here is a new API parameter for limiting usernames by a prefix. If I type in AAA as a username and get BBB as an autosuggest result, I would probably be surprised. You can discard mismatching results on the client side, but 1) as you said each client shouldn't reimplement the proper behavior, 2) it's nontrivial due to the language issues mentioned in T180858#3774735 (if I type in iaz and get back İazak, that's actually a valid result in a Turkish language context, but the client would probably mistakenly throw it away). It's better to let the API figure out what the correct results are.

Also, that way aufrom/gaufrom would stay as a continuation parameter that accepts any value and thus avoids the invalid username corner cases Anomie mentioned.

Who would be the most appropriate person to build such an API, and what is the best way to get it on their radar?

ping @dmaza and @kaldari

In T180858#3798527, @Tgr wrote:

I think what you really want here is a new API parameter for limiting usernames by a prefix.

aguprefix already exists.
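
For reference, a prefix query could be built along these lines. This is a minimal sketch: the function name `prefix_query_url` is hypothetical, and whether the server normalizes the first letter of the prefix is exactly the open question in this task, so nothing here assumes that it does.

```python
from urllib.parse import urlencode

API = "https://meta.wikimedia.org/w/api.php"


def prefix_query_url(prefix: str) -> str:
    """Build a globalallusers request limited to names starting with prefix."""
    params = {
        "action": "query",
        "list": "globalallusers",
        "aguprefix": prefix,  # sent as typed; no client-side case folding
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


url = prefix_query_url("Apfel")
```

Unlike agufrom, a prefix restricts the result set, so an autosuggest client would not get BBB back after asking for AAA.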

In T180858#3800832, @Anomie wrote:

aguprefix already exists.

So maybe only canonize that, declare agufrom/aguto to be continuation parameters, and leave them as is? Uppercasing a continuation parameter can be really bad (in the worst case it can send clients into an infinite loop, given that uppercase characters can have lower Unicode code points than their lowercase versions), and even if a new continuation parameter were introduced, as you recommend on the patch, existing clients use agufrom/aguto for paging already. OTOH the prefix not working right for invalid usernames does not seem like a big deal.

Anything using continuation properly is going to use whatever the module returns, so it would only be clients doing manual paging that would be affected.

I don't think that's the case. Try something like https://meta.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=globalallusers&agufrom=%60%EF%B9%8F%E7%89%99%E6%80%A1 , the response will have

"continue": {
    "agufrom": "apfeldieb",
    "continue": "-||"
}

If the client feeds that back faithfully, and the API canonizes it, it will loop back to Apfeldieb which is probably a few million users earlier.
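
The loop can be reproduced with a toy model. Everything below is a simulation under stated assumptions: `USERS` is a made-up list (only the starting name and apfeldieb/Apfeldieb come from the example above), `page` stands in for the API module, and the `canonize` flag models the proposed uppercasing of agufrom.

```python
from bisect import bisect_left

# Toy user table kept in code point order, like the real sort order.
USERS = sorted(["Apfeldieb", "Bob", "Zoe", "`﹏牙怡", "apfeldieb", "ɱ"])


def page(start, limit, canonize):
    """Return one batch plus the next agufrom value (None when done)."""
    if canonize:
        start = start[:1].upper() + start[1:]  # models the proposed change
    i = bisect_left(USERS, start)
    batch = USERS[i:i + limit]
    nxt = USERS[i + limit] if i + limit < len(USERS) else None
    return batch, nxt


def walk(start, canonize, max_requests=20):
    """Feed the continuation value back faithfully, capped for safety."""
    seen = []
    while start is not None and len(seen) < max_requests:
        batch, start = page(start, 1, canonize)
        seen.extend(batch)
    return seen


without_canonizing = walk("`﹏牙怡", canonize=False)
with_canonizing = walk("`﹏牙怡", canonize=True)
```

Without canonizing, the walk finishes after three names. With canonizing, "apfeldieb" is rewritten to "Apfeldieb", the walk jumps back to the start of the list, and it only stops because of the request cap.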

Or did you mean that clients properly handling continuation will use the new parameter? That's true, but I wouldn't be surprised if there were still pre-new-continuation-style clients around.

I mean that if the code is changed to use a new continuation parameter instead of agufrom, then the response will have

"continue": {
    "agucontinue": "apfeldieb",
    "continue": "-||"
}

Any client that isn't totally broken with respect to continuation would feed that back faithfully and things would just work. Any client that somehow manages to use agufrom instead is already broken.

And it works the same way for the old-style continuation: they'd start seeing

"query-continue": {
    "globalallusers": {
        "agucontinue": "apfeldieb"
    }
},

and would therefore feed agucontinue back just as new-style continuation users would.
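
A client that handles new-style continuation this way could be sketched as follows. The `fetch` callable is a hypothetical stand-in for whatever performs the HTTP request and returns the decoded JSON; the point is only that the client merges the whole `continue` object back into the request without looking at which keys it contains.

```python
from typing import Callable, Iterator


def query_all(fetch: Callable[[dict], dict], params: dict) -> Iterator[dict]:
    """Yield every response of a continued query.

    fetch is a hypothetical callable that sends the given parameters to
    the API and returns the decoded JSON response as a dict.
    """
    cont: dict = {}
    while True:
        # Merge the server-provided continuation values verbatim, so it
        # does not matter whether the key is agufrom or agucontinue.
        response = fetch({**params, **cont})
        yield response
        if "continue" not in response:
            break
        cont = response["continue"]
```

A client written like this would keep working if the module switched from agufrom to agucontinue, since it never names either parameter itself.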

In T180858#3798585, @TBolliger wrote:

Who would be the most appropriate person to build such an API, and what is the best way to get it on their radar?

What's being suggested is adding this functionality (adding a new param) to the existing API. I have a patch but I submitted that when this was supposedly a one-liner task. Would @dbarratt be interested in taking over the patch and making the changes being suggested?

Huji added a project: InteractionTimeline. Dec 19 2017, 2:36 PM

• TBolliger removed a parent task: T180084: Interaction Timeline V1: The first character of usernames should not be case sensitive. Feb 1 2018, 9:29 PM

dbarratt updated the task description. Feb 14 2018, 8:43 PM

• TBolliger moved this task from Backlog to Defects on the InteractionTimeline board. Mar 1 2018, 5:02 PM

• TBolliger moved this task from Untriaged to Snackbox on the Anti-Harassment board. Mar 9 2018, 1:50 PM

• TBolliger moved this task from Snackbox to Product/Tech backlog on the Anti-Harassment board. Apr 13 2018, 6:09 PM

• TBolliger moved this task from Product/Tech backlog to Tracking work by others on the Anti-Harassment board. Apr 13 2018, 6:14 PM

• TBolliger removed a project: Anti-Harassment. Jan 30 2019, 10:59 PM

So are all these from Wiktionary, back when usernames with a lowercase first letter were allowed? If not, where are these coming from? (Just curious.)

Perhaps the real solution should be to rename all these users so that they actually start with an uppercase letter like they should.

Aklapper removed subscribers: Anomie, • TBolliger. Oct 16 2020, 5:01 PM

Restricted Application added a project: Platform Engineering. Oct 16 2020, 5:01 PM

• AMooney removed a project: Platform Engineering. Oct 20 2020, 7:26 PM

Pppery removed a project: Patch-For-Review. Mar 31 2023, 12:30 AM

Aklapper triaged this task as Low priority. Oct 9 2023, 12:47 AM

Aklapper changed the subtype of this task from "Task" to "Feature Request".
