URL encode user name in Similarusers service request URL
Closed, Resolved | Public | 1 Estimated Story Points

Description

SimilarEditorsClient should URL encode user names when building the URL for the request to the Similarusers service.

Relevant line: https://gerrit.wikimedia.org/g/mediawiki/extensions/SimilarEditors/+/c70e4dfbf2c4c99e54606ee732df65fc69201123/src/SimilarEditorsClient.php#58

Related error: T307023#7957492 - no. 4 has spaces in the user name
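
For illustration only, here is a minimal Python sketch of the kind of encoding being requested. The extension itself is PHP, and this is not its code; the base URL and the `usertext` parameter name are hypothetical placeholders, since the task does not state the real request format.

```python
from urllib.parse import quote

# Hypothetical endpoint; the real service URL is not given in this task.
BASE_URL = "https://similarusers.example/similarusers"


def build_request_url(username: str) -> str:
    """Build a request URL with the user name percent-encoded.

    safe="" makes quote() encode every reserved character,
    including "/" and "+", so the name survives as a single value.
    The "usertext" parameter name is an assumption for this sketch.
    """
    return f"{BASE_URL}?usertext={quote(username, safe='')}"


if __name__ == "__main__":
    # A user name containing a space, as in the related error.
    print(build_request_url("Vincent Lextrait"))
```

Without the encoding step, a user name containing a space would be placed into the URL verbatim, which is the failure the related error describes.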

Event Timeline

Change 802207 had a related patch set uploaded (by AGueyte; author: AGueyte):

[mediawiki/extensions/SimilarEditors@master] URLEncode username in SimilarEditors

https://gerrit.wikimedia.org/r/802207

Change 802207 merged by jenkins-bot:

[mediawiki/extensions/SimilarEditors@master] URLEncode username in SimilarEditors

https://gerrit.wikimedia.org/r/802207

dom_walden subscribed.

I tested for two risks:

  • SimilarEditors extension sends invalid URLs.
  • URL encoding the username means the similarusers service looks up a different username than the one intended.

I checked out two versions of the SimilarEditors extension, one before and one after this change. Using a script, I submitted the SimilarEditors form, with both versions, for every user and IP that the similarusers service knows about in its test data. I then compared the number of results returned before and after.

There were 3 scenarios I was interested in:

One

Usernames which before returned 0 results but now return > 0 results.

This suggests that now the request is being sent successfully where before it was failing because the URL was invalid. I confirmed this for a few usernames by running the request myself and monitoring the logs. I didn't do this for all usernames as there were too many.

This is no guarantee that the similarusers service is looking up the intended username or that the results for the username are correct. I don't have an oracle for this (e.g. previous results for that username for comparison).

I also could not find any invalid URL errors in the MediaWiki debug logs.

Two

Usernames which before returned 0 results and now also return 0.

This suggests that the request is now being sent successfully but the similarusers service returns an error. I confirmed this for a few usernames (but not all of them) by reading the logs.

It may also be the case that the request was being sent successfully before but returned an error. This might be OK if the errors we get before and after are the same. I confirmed this for a few usernames. However, it should be noted that it is common for two different usernames to return the same error, so there is no guarantee that the intended username is being looked up.

It should be noted for the above two scenarios that the similarusers service can intermittently time out (e.g. due to it being under too much load) and so can occasionally return 0 results where normally it might return > 0.

Three

All other usernames. I checked that these return the same number of results (> 0) before and after.

This suggests that URL encoding the username has not changed the username that the similarusers service is looking up.
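
The three scenarios above can be sketched as a classification over per-username result counts. This is not the script that was actually used; the function names and the shape of the input (two dicts mapping username to result count) are assumptions made for illustration.

```python
from collections import Counter


def classify(before: int, after: int) -> str:
    """Map a before/after result-count pair to one of the scenarios."""
    if before == 0 and after > 0:
        return "one"    # request failed before, succeeds now
    if before == 0 and after == 0:
        return "two"    # no results either way (error or timeout)
    if before == after:
        return "three"  # same non-zero count before and after
    return "unexpected"  # counts differ; needs manual investigation


def summarize(before_counts: dict, after_counts: dict) -> Counter:
    """Count how many usernames fall into each scenario.

    Usernames missing from after_counts are treated as 0 results.
    """
    return Counter(
        classify(count, after_counts.get(name, 0))
        for name, count in before_counts.items()
    )


if __name__ == "__main__":
    # Made-up counts, only to show the shape of the data.
    before = {"A B": 0, "C": 0, "D": 12}
    after = {"A B": 5, "C": 0, "D": 12}
    print(summarize(before, after))
```

Anything that lands in the "unexpected" bucket falls outside the three scenarios and would need a closer look, bearing in mind that intermittent timeouts can turn a non-zero count into 0.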

Although unlikely, chance might have it that the same number of results are returned before and after but the results are different. To guard against this risk, I sent URL encoded requests directly to the similarusers service to see how it decoded the usernames.

Results:

Request                                                | Decoded
JumP%21erre                                            | JumP!erre
Foo%2Bbar                                              | Foo+bar
Vincent_Lextrait, Vincent%20Lextrait, Vincent+Lextrait | Vincent_Lextrait

which seems correct to me.
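
The decoded values above are consistent with form-style percent-decoding ("+" read as a space) followed by replacing spaces with underscores. That is only an inference from the observed results, not a statement about how the similarusers service is implemented; the Python sketch below reproduces the results under that assumption.

```python
from urllib.parse import unquote_plus


def decode_username(raw: str) -> str:
    """Decode a request value the way the results above suggest.

    Assumptions (inferred, not confirmed): "+" is read as a space,
    percent-escapes are decoded, and spaces become underscores.
    """
    return unquote_plus(raw).replace(" ", "_")


# Request values and decoded names taken from the results above.
OBSERVED = {
    "JumP%21erre": "JumP!erre",
    "Foo%2Bbar": "Foo+bar",
    "Vincent_Lextrait": "Vincent_Lextrait",
    "Vincent%20Lextrait": "Vincent_Lextrait",
    "Vincent+Lextrait": "Vincent_Lextrait",
}

if __name__ == "__main__":
    for raw, expected in OBSERVED.items():
        status = "ok" if decode_username(raw) == expected else "MISMATCH"
        print(f"{raw} -> {decode_username(raw)} ({status})")
```

Note that an encoded plus sign (`%2B`) stays a literal "+", while a bare "+" is read as a space, which is why `Foo%2Bbar` and `Vincent+Lextrait` decode differently.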

Test environment: local docker Similar Editors 0.0.0 (c9fc014) 21:27, 7 June 2022.

Niharika subscribed.

Thanks for the thorough testing, Dom.