Page MenuHomePhabricator

Remove auto-fill/suggest of usernames from password reset forms
Closed, ResolvedPublic

Description

Per the security@ email from 11/20/2018, usernames are auto-filled/suggested within the Username form field on various Mediawiki installs, e.g. https://en.m.wikipedia.org/wiki/Special:PasswordReset. For public-facing wikis, username by itself is considered to be public data and searchable in any number of ways (api, specialpages, quarry, etc.) However, we should discourage any potential abuse (and confusion from regular users) by removing disabling this feature on Wikimedia project wikis from the password reset form, an obvious target of nefarious users. There is also at least a moderate performance issue here, querying the db on each keystroke as noted by @jcrespo.

Event Timeline

sbassett updated the task description. (Show Details)

This feels like addition of obscurity to me.

Is that a bad thing? Increasing obscurity and potentially deterring certain behaviors while reassuring legitimate users seems like a good thing.

Is that a bad thing?

Well you are proposing removal of functionality that's only displaying public data. This does need to be balanced against the value of that functionality.

Increasing obscurity [...] seems like a good thing.

In of itself no.

potentially deterring certain behaviors [...] seems like a good thing.

What behaviors would you actually be deterring here? Sending password reset emails to the wrong user to annoy them?

reassuring legitimate users

Would this actually reassure anyone or just provide a false sense of security?

So far I'm more convinced about removing all username autocompletion fields from MediaWiki altogether, for @jcrespo's comment (but I'm not really convinced about that either - this should be an indexed lookup).

It appears this feature was recently added in rMW0120f5257653: Special:PasswordReset: Make the user field a user lookahead field, not just text. CCing people who were involved in that patch. I don't see a linked task.

As for performance, I note we have the same behavior on several other pages, including Special:ListFiles, Special:DeletedContributions, Special:Block, Special:Emailuser, Special:ActiveUsers, Special:BlockList, and Special:NewPages. The widget works by hitting the API with a request like https://en.wikipedia.org/w/api.php?action=query&format=json&list=allusers&auprefix=A&aulimit=10, and seems to have logic for delays and such.

Well you are proposing removal of functionality that's only displaying public data. This does need to be balanced against the value of that functionality.

Some clarification: this was discussed at the Security-Team group meeting today and we collectively decided to file a low-priority bug after receiving and discussing the aforementioned security@ email. (As opposed to me going rogue and filing non-issues.)

Increasing obscurity [...] seems like a good thing.

In of itself no.

We may have to agree to disagree here.

What behaviors would you actually be deterring here? Sending password reset emails to the wrong user to annoy them?

Sure. Another would be easy information disclosure at an endpoint where we've seen abuse, and recently. IMO this feels a bit like a solution for a broken-window theory problem, hence the low priority and public visibility. While I can only speak for myself, I'd assume the rest of the Security-Team would be open to considering all counterpoints on this matter, as they relate to the status of this bug.

Well you are proposing removal of functionality that's only displaying public data. This does need to be balanced against the value of that functionality.

Some clarification: this was discussed at the Security-Team group meeting today and we collectively decided to file a low-priority bug after receiving and discussing the aforementioned security@ email. (As opposed to me going rogue and filing non-issues.)

I didn't think that's what this was, fwiw. :) I don't mind whether this is coming from you or from the foundation security team as a group or anyone else. My thing is just that the task as it stands is unconvincing.

I just tested on a private wiki with MediaWiki 1.33-alpha – with $wgGroupPermissions['*']['read'] = false; – the text field is blinking but nothing is showing because the API returns an error readapidenied. Hence it is slightly disappointing to type and see the text field blinking, but nothing else.

Possibly it would be better to check whether the wiki is public or private when the form is created.

About possible abuse, perhaps it can be monitored on the long term if there are more 'forgotten passwords' requests than before this feature was introduced and/or if some users (e.g. User:A on English Wikipedia if I type "A" and Enter) complain about receiving too much (false positive) such requests.

I can think that this would be a problem if Special:ListUsers was itself restricted (using Extension:Lockdown), perhaps because usernames are considered Personal Identifiable Information.

However, this feature is simply using a public endpoint: api.php?action=query&format=json&list=allusers&auprefix=A&aulimit=10 so it's that query what in such case should be disabled.

As mentioned by @Seb35, the UI should gracefully degrade if the api is not reachable, but that's not a security issue.

I disagree with the premise of the description that it should be "disabled from Wikimedia project wikis from the password reset form, an obvious target of nefarious users". I'm not sure if the target of nefarious users are the Wikimedia wikis or the password reset form, but I don't think it should be disabled there:

  • We proudly list the wiki users, so it's not an information leak.
  • It is actually a helpful addition for user, so when John comes to recover the password for the account he registered a few years ago, he will be shown that there are several users with similar names: John, John 03, John 04412… and when seeing it, he will actually be more likely to remember that he used a slightly different username, rather than spamming [[User:John]] with a spurious request.

Note that there's a Community Tech wishlist proposal right now to require entering both an account's name and e-mail address to trigger a reset, to reduce the scope for drive-by "hack" attempts that trigger concerns.

@Jdforrester-WMF The Security-Team discussed that item today as well, and perhaps filing it as a separate task related to this issue. Given some of the feedback above, it might be wiser to pursue that approach as opposed to this one.

I mentioned that this is a performance and usability issue, not a security one. If you want to reset a password, it should not autocomplete with all users- even good intended ones could type other account by accident. My main concern is performance-

and seems to have logic for delays and such.

A brief scan of the autocompletion seemed the opposite- a suggestion was possible by typing just 1 key or it was possible to get a suggetion with the empty string. At least 3 characters should be typed before autocompletion kicks in. I don't think autocompletion "complete as you write" should ever run against the main metadata MySQL servers (for users and for other fields) - it is ok to have dedicated, lower redundancy ones for that, as this caused issues on Wikidata in the past when queries stop being "fast" for any reason, affecting other core functionality.

I mentioned that this is a performance and usability issue, not a security one. If you want to reset a password, it should not autocomplete with all users- even good intended ones could type other account by accident. My main concern is performance-

and seems to have logic for delays and such.

A brief scan of the autocompletion seemed the opposite- a suggestion was possible by typing just 1 key or it was possible to get a suggetion with the empty string. At least 3 characters should be typed before autocompletion kicks in. I don't think autocompletion "complete as you write" should ever run against the main metadata MySQL servers (for users and for other fields) - it is ok to have dedicated, lower redundancy ones for that, as this caused issues on Wikidata in the past when queries stop being "fast" for any reason, affecting other core functionality.

Would varnish caching the api response alliviate this concern? (Particularly for when users have not written very many letters, the cache hit ratio would be high)

Note that there's a Community Tech wishlist proposal right now to require entering both an account's name and e-mail address to trigger a reset, to reduce the scope for drive-by "hack" attempts that trigger concerns.

@Jdforrester-WMF The Security-Team discussed that item today as well, and perhaps filing it as a separate task related to this issue. Given some of the feedback above, it might be wiser to pursue that approach as opposed to this one.

@sbassett: In the interest of avoiding potential confusion, note that wishlist proposal is for the creation of a user preference to apply that restriction on a per-account basis, not for such a restriction to apply globally. The intent is that people bothered by too many spurious reset requests could enable the preference.

I don't think autocompletion "complete as you write" should ever run against the main metadata MySQL servers (for users and for other fields) - it is ok to have dedicated, lower redundancy ones for that, as this caused issues on Wikidata in the past when queries stop being "fast" for any reason, affecting other core functionality.

Note that most action API read query modules,[1] including that used by this username completion widget, should be using the 'api' group rather than the "main metadata" group.

Note that, besides this username completion widget, we also have widgets for completing page titles more generally and we offer an API endpoint for browser search bars to query using the OpenSearch Suggestions protocol. It seems far too late to try to forbid autocompletion functionality.

[1]: ApiBase::getDB() defaults to this. Exceptions include ApiQueryUserContribs (uses the 'contributions' group) and ApiQueryWatchlist (uses the 'watchlist' group). It's also possible that API modules that use generic backend code might wind up using the backend's group.

Would varnish caching the api response alliviate this concern?

Of course varnish responding to this request would be a non-issue, there are restrictions there to prevent abuse and it would only affect the outer layers, and not the application or database. I don't think that is a viable method due to how mediawiki is architectured at the moment, and the specifics of the query which would be something like 'api.php?suggestme='random string', which I don't think that is cachable effectively (meaning that even if it was cachable, the hit ratio would be very low). But those are pure suppositions I do (honestly, based on my understanding of how wiki search works) and, assuming they were wrong, of course I wouldn't have any issue if it was a very rare query.

Note that, besides this username completion widget, we also have widgets for completing page titles more generally

And those also use directly medatada databases, with no caching in-between? https://knowyourmeme.com/photos/199693-mother-of-god

It seems far too late to try to forbid autocompletion functionality.

I am NOT suggesting to remove the functionality (well, I do for the password reset form for usability reasons, but I do think it should be there for the "block user" and others). More autocompletion (where reasonable) == better. What I am asking is to have a proper "non application-deep" backend to ensure high availability and proper resourcing, given the amount of users we support. For example, I was convinced that title auto-completion was done with Elasticsearch, not the database (an example of offloading specialized behavior to separate service to not compromise core resources). I am not a microservices fan, but you are forcing me to convert into one :-)

A few divergent points have spawned here :)

In regards to the sort-of main point from @sbassett of the undesirability of the auto-complete feature for usernames on the password reset form for which we require no other fields, I would agree intuitively this is an invitation for abuse or user error.

For context, this originally became a point of discussion I believe because of some larger security issues at play where folks are more consistently forwarding password reset emails for which they did not make the request. There are quite a few, and of those I have looked at none seem outright suspicious beyond the maybe obvious line of reasoning "Is the current mechanism lending itself easily to users submitting for other than themselves unwittingly?".

The answer would seem to be yes from our recent experiences, and since this is a relatively recent phenomenon (July 9th) it seems we should be asking ourselves if there is a reason to implement along with asking ourselves if there is a reason to keep. As there is no task associated with the commit to track back on, we are now debating the value of the change itself in retrospect.

Edit: I realized one of the value propositions discussed is not mentioned here, what is the usefulness of user name auto population in a field where you should only ever be inputting your own username?

Sorry for derailing the original report, my 0.02€, if technically possible, I would like to propose to disable user autocompletion on the reset password form only for the reasons Chase mention.

@Jdforrester-WMF can you help us understand the reasons for the change itself?

@Jdforrester-WMF can you help us understand the reasons for the change itself?

The change happened because James was helping me with a password reset and we were both surprised at the lack of username autocomplete since it does exist on other similar pages, which Anomie pointed out above.

@Jdforrester-WMF can you help us understand the reasons for the change itself?

The change happened because James was helping me with a password reset and we were both surprised at the lack of username autocomplete since it does exist on other similar pages, which Anomie pointed out above.

Thanks @Niharika that helps a lot

At the moment with the discussion on task here and the adhoc nature of the change I would like to revert and create a task to discuss deploying this feature as opposed to the reverse. It is my understanding that username autocompletion may not be an anti-pattern for password reset forms, but in our case it seems it may be especially problematic as we do not require also knowing the email associated with each username to issue password reset requests. Which as far as I can tell is unusual.

This is not changed by the public nature of usernames in other places to my knowledge as the case being made here is not that usernames are a sensitive field, but rather that knowledge of a username alone being enough to issue password resets makes autofill undesirable from a practical standpoint. I personally believe both username and email of record should be required, but that's for the next task I expect.

Additionally I would like to walk this thought out in a task from above:

Edit: I realized one of the value propositions discussed is not mentioned here, what is the usefulness of user name auto population in a field where you should only ever be inputting your own username?

@Niharika does that seem reasonable to you?

@chasemp Sounds good to me.

One clarifying point on -

Edit: I realized one of the value propositions discussed is not mentioned here, what is the usefulness of user name auto population in a field where you should only ever be inputting your own username?

The reason James was helping me with the reset is because apparently one cannot request a password reset if the IP address is blocked (is that a bug?). Since the WMF office IP address is blocked (to prevent staff from anonymously editing) and I forgot my password, the only way to do the reset is to ask someone logged in to trigger a password reset for my user account. So while theoretically one should only ever be inputting their own username in that field, that might not always be the case in practice.

The reason James was helping me with the reset is because apparently one cannot request a password reset if the IP address is blocked (is that a bug?).

A task exists at T109909: Anon-only range block doesn't allow password resets. See T109909#2934298 in particular for why this behavior exists.

Change 475798 had a related patch set uploaded (by SBassett; owner: SBassett):
[mediawiki/core@master] Revert Special:PasswordReset: Make the user field a user lookahead field

https://gerrit.wikimedia.org/r/475798

Most of the justifications here don't sound halfway convincing.

  • Autosuggest uses the public API to list users. Anything that can be done with the autosuggest field can be done without it. So the info leak angle does not make sense.
  • The loading animation showing up for a fraction of a second when you type is IMO a non-issue; it does not look particularly confusing or annoying, it is a barely visible background animation. The autosuggest logic could be improved to not retry but it's not really worth the effort. Anyway that has nothing to do with the removal of autosuggest on public wikis.
  • Clearly people who want to harass other people won't rely on autosuggest to do that, so in terms of people getting annoyed by unasked password resets, the question is, does the autofill make it easier to accidentally type in a wrong name? It's not convincing to me that it would.
  • The performance impact has got to be minimal. There are about a hundred password resets an hour, so maybe a thousand type events. There are about 4000 action API requests per second, many of which use the DB (some have a memcached layer in between but I doubt that happens for a huge chunk of the requests) and the username list API is as light as you can get (it is fully covered by a single unique index).

So let's focus on the UX issue. There are three ways (not necessarily exclusive) to try to support the user:

  • Use autosuggest (status quo). Probably helpful for long usernames; probably not that helpful for average usernames on large wikis because there will be lots of hits with similar names when you have tens of thousands of users. Probably helpful for small wikis with just a few thousand users.
  • Use the username cookie (like on the login page). Probably the best option for single-user computers.
  • Let the browser do its own autosuggest. In theory this would be pretty nice if the input field would be similar enough to the login field for the browser to share form history, so you the autocomplete would use the list of users who have logged in on this computer, but that doesn't happen currently. An autosuggest based on previous password resets does not seem useful (how often do you do a reset?).

So IMO the API-based autosuggest is the least good option but still better than nothing and there is no cause for revert. Ideally we should change the field name so that it shows the list of previous logins. (Is it bad that you see who logged in from the machine before? Maybe, but the login page does it too, so that's not an issue with password reset.) If that doesn't work out, we could just use the username cookie.

I don't think [varnish caching] is a viable method due to how mediawiki is architectured at the moment, and the specifics of the query which would be something like 'api.php?suggestme='random string', which I don't think that is cachable effectively (meaning that even if it was cachable, the hit ratio would be very low).

I don't think that has anything to do with MediaWiki architecture, form field autosuggest is just not something you can reasonably cache due to the exponential blowup of possible URLs. On tiny wikis you could query the full username list and do the search on the client side, but that does not work with any reasonable client size.

What I am asking is to have a proper "non application-deep" backend to ensure high availability and proper resourcing, given the amount of users we support.

Sounds like a good candidate for an RfC :) Mentioning it in tangentially related tasks is unlikely to achieve much.

For example, I was convinced that title auto-completion was done with Elasticsearch, not the database

It usually is. (Maybe always? It is when you use the prefixsearch API, and I'm not aware of any title autosuggest not using that.) OTOH I think the fancy-looking version then does a DB lookup anyway to get page images and Wikidata description for the results (although that might be cached).

It's mainly done because the results are identified by fuzzy string match and sorted by complex weighting involving document similarity and number of links to the page and whatnot, so that you get pages that are more likely to be relevant for you. There is no obvious way to sort users so it does not seem worth the effort of maintaining a separate ES index for them (IMO, but I'm not involved in search). If we had a proper discussion system with @-mentions it could be worth adding ES-based autocomplete for that, with weights based on how active the user is, for example. Maybe five years from now :)

I am not a microservices fan, but you are forcing me to convert into one :-)

I am not sure what a microservice could achieve here; it would have to use the same database and that's probably the bottleneck, not PHP. If there was a special slave DB or cache layer or whatever to use instead the normal DB, using it from MediaWiki would be just as good and a lot easier than creating a dedicated service for it.

So IMO the API-based autosuggest is the least good option but still better than nothing and there is no cause for revert. Ideally we should change the field name so that it shows the list of previous logins. (Is it bad that you see who logged in from the machine before? Maybe, but the login page does it too, so that's not an issue with password reset.) If that doesn't work out, we could just use the username cookie.

Another thing we could do is use the username from the failed login attempt (usually there are some failed login attempts before password reset). I'm not sure about the privacy aspects though; we would have to preserve the last username in the login page HTML (form history does that too, but that's obvious and easy to disable on either server or client side).

Change 478394 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Rely on browser autocomplete for Special:PasswordReset

https://gerrit.wikimedia.org/r/478394

Change 478395 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Use username from last successful login in Special:PasswordReset

https://gerrit.wikimedia.org/r/478395

If we're looking to abandon https://gerrit.wikimedia.org/r/475798/, my vote would be to instead go with https://gerrit.wikimedia.org/r/478395/, as it does a good job of balancing both security and usability IMO.

If we're looking to abandon https://gerrit.wikimedia.org/r/475798/, my vote would be to instead go with https://gerrit.wikimedia.org/r/478395/, as it does a good job of balancing both security and usability IMO.

Anecdotally, We have seen a lot fewer unsolicted forwarded password reset emails in the past...month? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/478395 seems like a sensible compromise to me here, and we can revisit if it's warranted.

Change 475798 abandoned by SBassett:
Revert Special:PasswordReset: Make the user field a user lookahead field

Reason:
Preferred alternatives: I4ae492230 and my revert at Ic3cf69181

https://gerrit.wikimedia.org/r/475798

Seems like consensus is that https://gerrit.wikimedia.org/r/c/mediawiki/core/+/478395 is the sanest outcome here. @Tgr anything blocking?

No, someone just has to merge it.

Change 478395 merged by jenkins-bot:
[mediawiki/core@master] Use username from last successful login in Special:PasswordReset

https://gerrit.wikimedia.org/r/478395

Change 478394 abandoned by Gergő Tisza:
Rely on browser autocomplete for Special:PasswordReset

Reason:
Done in a different way in I248092b14.

https://gerrit.wikimedia.org/r/478394

Tgr claimed this task.

Replaced with username of last logged-in user (which we store in a cookie for the login page). Opened T211948: Login page form history might leak sensitive information about previous users for more general discussion of login page username privacy.