
Improve security for Special:Userlogin (tracking)
Closed, Invalid; Public

Description

Author: cannon.danielc

Description:
In light of the recent compromised accounts on the English Wikipedia, I'd like
to propose a few improvements in the way of securing log-ins to MediaWiki.

It is my firm belief that both of these compromised accounts were the result of
simplistic password-cracking: In the one case it appears that the user's
username was the same as his password, in the other it appears that the user's
password was "password". As such my first recommendation is that users be
required to select a password of at least 6-8 characters, containing
at least one digit and both capital and lowercase alphabetic characters.
Basically, this is just to force users to select stronger passwords.
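
The composition rule proposed here can be sketched in a few lines of Python. This is only an illustration, not MediaWiki code: the function name `meets_proposed_policy` and the choice of 8 as the minimum (the proposal says "6-8") are assumptions.

```python
import re

# Assumed value: the proposal only says "at least 6-8 characters".
MIN_LENGTH = 8


def meets_proposed_policy(password: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if the password has the minimum length and contains
    at least one ASCII digit, one lowercase and one uppercase letter."""
    return (
        len(password) >= min_length
        and re.search(r"[0-9]", password) is not None
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
    )
```

Under this rule "password" is rejected (no digit, no capital), while a string such as "Passw0rdX" is accepted.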

Secondly, I would like to suggest a log-in captcha at Special:Userlogin. After
one failed attempt, the user must also complete the captcha to log in. This will
prevent automated password-crackers from being used to get users' passwords and
will make it much more difficult and time-consuming for others to manually guess
passwords.

I would also like to propose that the highly insecure log-in method provided by
Api.php be removed. This uses a simple GET with the user's username and password
in the URL, and absolutely no throttling whatsoever. Clearly, this is a high
security risk.

If the captcha idea is rejected, or even if it is accepted, I would like to
suggest that a throttle on log-in attempts be implemented, such that after
X-number of tries to authenticate from a host, regardless of the username, that
host must wait 30 seconds before being allowed to try again. This will
additionally curb the problem of both automated and manual password crackers.
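
A minimal sketch of such a per-host throttle, in Python. It counts failed attempts rather than all attempts, which is one reading of the proposal; the class name `HostThrottle` and the value of 5 for "X" are assumptions, and a real deployment would need shared storage rather than an in-process dictionary.

```python
import time
from collections import defaultdict
from typing import Callable, Dict

MAX_ATTEMPTS = 5      # the "X" in the proposal; value assumed
LOCKOUT_SECONDS = 30  # waiting period taken from the proposal


class HostThrottle:
    """Lock a host out for a fixed period after too many failed log-ins,
    regardless of which usernames were tried."""

    def __init__(self, max_attempts: int = MAX_ATTEMPTS,
                 lockout: float = LOCKOUT_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_attempts = max_attempts
        self.lockout = lockout
        self.clock = clock
        self.failures: Dict[str, int] = defaultdict(int)
        self.locked_until: Dict[str, float] = {}

    def allowed(self, host: str) -> bool:
        """Return True if the host may attempt a log-in right now."""
        until = self.locked_until.get(host)
        if until is None:
            return True
        if self.clock() >= until:
            del self.locked_until[host]  # lockout has expired
            return True
        return False

    def record_failure(self, host: str) -> None:
        """Count a failed attempt and start a lockout at the limit."""
        self.failures[host] += 1
        if self.failures[host] >= self.max_attempts:
            self.failures[host] = 0
            self.locked_until[host] = self.clock() + self.lockout

    def record_success(self, host: str) -> None:
        """Forget earlier failures once the host logs in successfully."""
        self.failures.pop(host, None)
```

The log-in handler would call `allowed(host)` before checking credentials and `record_failure(host)` or `record_success(host)` afterwards.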

With the millions of users of MediaWiki, it's time that we started to get
serious about security issues, especially on Wikimedia. Most other prominent
sites have realized this; it's time we do too. At present, any idiot who
knows any programming at all can set up a script to use the monkey-on-a-keyboard
approach to guess any password; this is simply unacceptable. Even if my ideas
are rejected, I do hope that _something_ will be done to improve security.


Version: 1.11.x
Severity: normal
URL: http://en.wikipedia.org/Special:Userlogin

Details

Reference
bz9816


Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:41 PM
bzimport set Reference to bz9816.
bzimport added a subscriber: Unknown Object (MLST).

intelligent.nerd wrote:

Captchas might not be such a good idea, it makes it very hard on those that
cannot see the captcha. But the other ideas of throttling are very good. Please fix.

ultrablue wrote:

While we're at it, can we get better captchas than simple sums that even a bot
could read? We could have a captcha competition, with people submitting the best
captcha images. Okay maybe not a competition. Just not the crappy ones most sites
have that are actually completely unreadable.

We have better captchas, that use python libraries, but they generate more
overhead and are turned off.

cannon.danielc wrote:

(In reply to comment #1)

Captchas might not be such a good idea, it makes it very hard on those that
cannot see the captcha. But the other ideas of throttling are very good.

Please fix.

Well, a lot of sites provide the captchas in both audio and visual form so that
the blind can use them as well, and we certainly don't have to use the illegible
ones that most sites use. I find the ones that, say, Google uses to be quite
legible though. Captchas also do not have to create overhead for the devs, as
there are many captcha libraries available for free that can be easily
incorporated into MediaWiki.

titoxd.wikimedia wrote:

Additionally, encrypt the passwords, and use HTTPS. I know, the secure server
needs more resources, but at least some sort of encryption can be done to stop
passwords from being intercepted.

cannon.danielc wrote:

(In reply to comment #5)

Additionally, encrypt the passwords, and use HTTPS. I know, the secure server
needs more resources, but at least some sort of encryption can be done to stop
passwords from being intercepted.

And, from what a little birdy told me, the secure server uses a null cipher for
performance reasons, so you're really not getting any added security if this is
the case, other than the illusion of security provided by the little padlock in
your status bar :). I don't see why we couldn't at least use some kind of MD5 or
PGP encryption on transmissions though. Granted the MD5 encryption has now been
successfully reversed, but it would help quite a bit. I think encryption of
posts is probably lower priority than the above mentioned issues though, and it
would require quite a bit more resource.

cannon.danielc wrote:

(In reply to comment #6)

And, from what a little birdy told me, the secure server uses a null cipher for
performance reasons, so you're really not getting any added security if this is
the case, other than the illusion of security provided by the little padlock in
your status bar :).

Update: It appears this was a lie. I heard this at
http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_%28technical%29&direction=next&oldid=96637542#Status_of_secure.wikimedia.org.3F
but it turns out to be false. Having just checked, it appears the secure server
does use an actual MD5 hash, so logging in via it is likely to be fairly secure.
It would, therefore, definitely improve security to handle all log-in requests
through the secure server, though I understand that this may be quite
resource-intensive.

anonemouse0 wrote:

Throttle login attempts, please. I mistype my username and password fairly
often, but 5 attempts in, say, 5 minutes, should be enough for anyone. Don't do
captchas, they're a nuisance.

(In reply to comment #8)

Throttle login attempts, please. I mistype my username and password fairly
often, but 5 attempts in, say, 5 minutes, should be enough for anyone. Don't do
captchas, they're a nuisance.

Captchas would be useful along with throttling, only appearing when you have
entered the wrong password several times.

ultrablue wrote:

Without captchas, people could get around IP throttling by just using a lot of
tor nodes/open proxies simultaneously. The throttling would have to be per-
account as well as per-IP.

lucasbfr.bugzillaMediawiki wrote:

Another idea would be to require a Lost password retrieval procedure after a
given number of unsuccessful tries in a short time. The bad side is that if you
don't have any e-mail address filled in you can't get your access back, but on
the other hand that would solve the problem without requiring captchas.

Security is nice, but it shouldn't make logging in impossible for legitimate
users even in cases of flood.

robchur wrote:

(In reply to comment #11)

Another idea would be to require a Lost password retrieval procedure after a
given number of unsuccessful tries in a short time. The bad side is that if you
don't have any e-mail address filled in you can't get your access back, but on
the other hand that would solve the problem without requiring captchas.

This would introduce a straightforward vector for abuse.

phi1ipp wrote:

I oppose the captcha

I support a delay of between 3 and 15 minutes after three bad attempts from a
given IP - this should apply foremost to the IP that attempted login; Max
Semenik has pointed out that applying it to the account for which login was
attempted is a bad thing because it can prevent genuine logins. However, this
could be additionally enacted when there is good evidence of IP-hopping (at
least x IPs attempted logins to a given account, and failed at least y times).
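
The IP-hopping condition can be sketched as follows in Python. The class name `HoppingDetector` is hypothetical, "failed at least y times" is read here as the total number of failures across all IPs, and the sketch keeps no time window, which a real implementation would need.

```python
from collections import defaultdict
from typing import Dict


class HoppingDetector:
    """Flag an account when at least `min_ips` distinct IPs have failed to
    log in to it, with at least `min_failures` failures in total."""

    def __init__(self, min_ips: int, min_failures: int) -> None:
        self.min_ips = min_ips            # the "x" in the comment
        self.min_failures = min_failures  # the "y" in the comment
        # account name -> {ip -> number of failed attempts}
        self._failures: Dict[str, Dict[str, int]] = defaultdict(
            lambda: defaultdict(int))

    def record_failure(self, account: str, ip: str) -> None:
        self._failures[account][ip] += 1

    def looks_like_ip_hopping(self, account: str) -> bool:
        per_ip = self._failures.get(account, {})
        return (len(per_ip) >= self.min_ips
                and sum(per_ip.values()) >= self.min_failures)
```

Only when `looks_like_ip_hopping(account)` is true would the delay be applied to the account itself, so a single hostile IP cannot lock out the genuine owner.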

The password strength test that has been proposed should include a dictionary
search for each part-string as well as common letter-number substitutions,
specifically O-0, I-1, A-4, T-7, S-5, Z-2, E-3 and also Z-7; this list may be
incomplete.
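
A rough Python sketch of that check: undo the listed substitutions, then look up every part-string in a word list. The function names and the minimum word length of 4 are assumptions; note that "7" can stand for either T or Z, so every combination is tried, and the number of variants grows exponentially with the number of digits.

```python
from itertools import product
from typing import Set

# Digit -> letters it may stand for, from the list in the comment above.
SUBSTITUTIONS = {
    "0": "o", "1": "i", "4": "a", "5": "s", "2": "z", "3": "e", "7": "tz",
}


def deleet_variants(text: str) -> Set[str]:
    """Return every lowercase reading of `text` in which each listed digit is
    either kept as-is or replaced by one of the letters it can stand for."""
    options = [ch + SUBSTITUTIONS.get(ch, "") for ch in text.lower()]
    return {"".join(combo) for combo in product(*options)}


def contains_dictionary_word(password: str, words: Set[str],
                             min_len: int = 4) -> bool:
    """Return True if any part-string of any variant is in `words`."""
    for variant in deleet_variants(password):
        for start in range(len(variant)):
            for end in range(start + min_len, len(variant) + 1):
                if variant[start:end] in words:
                    return True
    return False
```

With a word list containing "password", the string "xxP4ssw0rd99" is flagged because one of its variants contains that word.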

ben_aveling wrote:

Of course, we don't want throttling at a level that will allow a DOS.

cannon.danielc wrote:

(In reply to comment #15)

Of course, we don't want throttling at a level that will allow a DOS.

Clearly. For this reason, I suggest that the throttling be targeted toward hosts
and not usernames, as that's just asking for trouble. Additionally, I would
suggest the throttling not be done in a manner that requires WM to maintain a
connection with the host; rather, it should simply remember the time of the last
failed log-in attempt from a given host and reject any log-in attempts within,
say, 10-15 seconds of that last log-in attempt. If the purpose of the throttling
is to prevent brute-forcing passwords, that should be sufficient.

To phi1ipp@yahoo.com: Would you mind elaborating on why you oppose the use of a
captcha? To me, this seems like the most logical and least resource-intensive
way of curbing brute-force cracking, and it is the method that most sites
have adopted. It is not as though the captcha would kick in every time you log in, but
rather only if you fail to log in from a given host, say, 3 times.
Again, the captcha would be targeted to the host, not to the username.

About using HTTPS: it would be great if we could login via the secure server,
but still use the faster non-HTTPS one for everything else (the secure server
could return a single-use token in a redirect to the normal servers; the normal
servers would then check the token, invalidate it, and copy the session data
from the secure server). This way a sniffer would only be able to hijack the
session, but not the password.
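
The token hand-off described here could look roughly like this in Python. It is a sketch under assumptions: `HandoffTokens` is a hypothetical name, the 30-second lifetime is invented, and the in-memory dictionary stands in for whatever store the secure and normal servers would actually share.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class HandoffTokens:
    """Single-use tokens issued by the secure server after a successful
    log-in and redeemed once by the normal servers."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._pending: Dict[str, Tuple[float, dict]] = {}

    def issue(self, session_data: dict) -> str:
        """Secure server side: create a token to put in the redirect URL."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = (self.clock() + self.ttl, session_data)
        return token

    def redeem(self, token: str) -> Optional[dict]:
        """Normal server side: return the session data, or None if the token
        is unknown, already used, or expired."""
        entry = self._pending.pop(token, None)  # pop invalidates the token
        if entry is None:
            return None
        expires_at, session_data = entry
        if self.clock() > expires_at:
            return None
        return session_data
```

Because `redeem` removes the token before returning, a sniffer who replays the redirect URL afterwards gets nothing; the password itself never travels over the non-HTTPS connection.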

ayg wrote:

Repurposing to tracking bug, there's too much going on here. Please open new
bugs to discuss specific requests not already covered elsewhere.

Might be good to have a limit on failed logins per hour.

phi1ipp wrote:

To phi1ipp@yahoo.com: Would you mind elaborating on why you oppose the use of a
captcha?

It's redundant with the delay mechanism. Additionally, my experience of captchas
from a variety of other websites is that they rarely work. They also exclude
some visually impaired users. If an exception is made for those users, the whole
purpose is defeated.

ayg wrote:

(In reply to comment #20)

To phi1ipp@yahoo.com: Would you mind elaborating on why you oppose the use of a
captcha?

It's redundant with the delay mechanism. Additionally, my experience of captchas
from a variety of other websites is that they rarely work. They also exclude
some visually impaired users. If an exception is made for those users, the whole
purpose is defeated.

No, because a) the group of targets is vastly smaller and b) they could be
subjected to stricter password requirements. No delay mechanism is planned for
addition because of DoS concerns (lock out all admins/users by spamming random
passwords). Per-IP delay is still a concern, albeit smaller, due to dynamic
IPs. Anyway, please keep discussion on specific features to specific bugs
(e.g., bug 9836), not the tracker bug.

armedblowfish wrote:

Thanks for the security clarification, Daniel Cannon. I also heard that
secure.wikimedia.org used null encryption.

TLS (successor to SSL) for login-only would be great, but there are some people,
hopefully a minority, who will want to use the secure server for everything.
For example, people who use a hardblocked proxy through http but not https, or
people who are just paranoid. It would probably be better to implement a
separate TLS login for the regular site. However, the developers know more
about what kind of load the servers can handle than I do.

Once TLS login is required, it might not be a bad idea to require everyone to
change their passwords.

Too much security could introduce the possibility of denial of service attacks,
where a cracker makes it impossible for the owner of the account to access the
account. Requiring a user to request a new password after a certain number of
failed tries would make it easy to DOS attack users without email addresses.
Same thing for captchas, as some users may be either blind or using a text-only
browser such as Lynx. Perhaps, after a certain number of failed attempts, the
software could refuse to let the user try again until a given amount of time
expires OR the user enters a captcha?

phi1ipp wrote:

Anyway, please keep discussion on specific features to specific bugs
(e.g., bug 9836), not the tracker bug.

Replies were to questions. Regards.

I also don't like captchas. But please don't force users to use at least one digit or something like that, because instead of increasing the security it will actually reduce the search space.

Instead I propose this:

(1) convert the password at least to Unicode NFC (and apply any other suitable normalization, such as collapsing runs of whitespace), and possibly even to NFKC (to avoid compatibility characters). If the password, after normalization, differs from what the user typed, inform the user and ask them to confirm the normalized form.

(2) compute the basic size S of the alphabet:

  • if a lowercase ASCII letter is used anywhere in the password, add 26 to the alphabet size
  • if an uppercase ASCII letter is used anywhere in the password, add 26 to the alphabet size
  • if a decimal ASCII digit is used anywhere in the password, add 10 to the alphabet size
  • if an ASCII punctuation character is used anywhere in the password, add the size of the ASCII punctuation subset to the alphabet size
  • on localized wikis, consider other subsets consisting of the non-ASCII letters used in their alphabet (take the CLDR data appropriate for that language, remove the characters already covered by the previous subsets, and add the remaining characters to the basic size S)
  • if other Unicode characters are included, accept them individually by adding 1 to S for each distinct character (but inform users that they may have difficulty logging in from some environments with such a password)

(3) take the base-2 logarithm of the alphabet size and multiply it by the password length (N). This gives the raw "bit-length" strength of a password. In other words: raw bit-length strength = log2(S)*N

(4) if a space is accepted in the password, it should occur only in the middle, not at the beginning or end, and not in sequences of more than one space. Because of that, a password of length N cannot contain more than (N - 1) DIV 2 spaces, which adds ((N-1) DIV 2)*log(S+1)/log(S) to the raw bit-length strength.
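
Steps (1) to (3) can be sketched in Python as follows. This is an illustration under assumptions: the function names are invented, the ASCII punctuation subset is taken to be Python's `string.punctuation` (32 characters), the CLDR-based subsets for localized wikis are left out, a space is simply counted as one "other" character, and the space adjustment of step (4) is not applied.

```python
import math
import string
import unicodedata


def normalize_password(raw: str) -> str:
    """Step 1: convert to NFC, trim, and collapse runs of whitespace."""
    nfc = unicodedata.normalize("NFC", raw)
    return " ".join(nfc.split())


def alphabet_size(password: str) -> int:
    """Step 2: estimate the alphabet size S from the character classes used."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)
    # Any other character adds 1 per distinct character.
    known = set(string.ascii_letters + string.digits + string.punctuation)
    size += len({c for c in password if c not in known})
    return size


def raw_strength_bits(raw: str) -> float:
    """Step 3: raw bit-length strength = log2(S) * N."""
    password = normalize_password(raw)
    s = alphabet_size(password)
    if s == 0:  # empty password
        return 0.0
    return math.log2(s) * len(password)
```

For example, an eight-letter all-lowercase password scores 8 * log2(26), about 37.6 bits, and adding a capital and a digit to the same length raises that to 8 * log2(62), about 47.6 bits.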

Of course you can check that basic default passwords are not used (like "0000" or "1234" or "password" or "admin" or the username itself, or any word contained in the user's own public identity, like his public first name or last name, or any word contained in his user page or in the first 1KB of his talk page).

But using a large dictionary to forbid passwords may actually reduce the bit-length strength against brute-force attacks rather than increase it (even if it protects against dictionary-based attacks), by allowing attackers to skip the words contained in that known dictionary. It may also miss many well-known common words (including first names) from other languages (my opinion is that the dictionary used should just be built from the terminology used in the MediaWiki messages stored in the "MediaWiki:" space, in all its supported languages, and for each extension that is activated on the wiki where the account is created).

However, even if a password is not strong enough, the user should not be denied access completely: he should be denied use of the secure server and informed that his password is not strong enough to be used there, but he should have the option to go to the non-secure servers.

I also suggest that the user's Preferences panel include this password bit-length strength (computed as above) and a colored bar showing him the basic security of his account, and whether the bit-length strength is suitable for identification on the secure server.

(In reply to comment #25)

I also don't like captchas. But please don't force users to use at least one
digit or something like that, because instead of increasing the security it
will actually reduce the search space.

Instead I propose this:

[snip huge proposal]

Per comment #18, please open new bugs for new proposals. I'll add as a quick side note that I do not believe in restricting users' passwords to force them to be stronger: users choose their own passwords, as wisely or as unwisely as they want. This is their own responsibility. We can and should help keep their passwords from being compromised by using SSL on every login; I believe there's already an open bug for that.

Note that for dictionary lookups (when evaluating the bit-length strength), there are already good dictionaries available: you may just check for the existence of the word as an article in the main space of each existing Wiktionary that has more than about 10000 entries. This requires just a single HTTP request per tested wiki.

Then the actual computed bit-length strength can be reduced to the base-2 logarithm of the tested Wiktionary sizes (measured as the number of articles in the main space of the tested wikis, summed together, or to their logarithmic average).

The computed numeric value should also be made visible (and should be recomputed each time the user visits their Preferences page, if the algorithm is later updated), in addition to the color-coded visual evaluation of that value (such as black: insufficient and not acceptable, red: strong warning, yellow: acceptable, green: good, blue: strong).

"I do not believe in restricting user's passwords to force them
to be stronger: users choose their own passwords, as wisely or as unwisely as
they want".

Did I suggest that? No. What I propose is precisely to help users choose their passwords wisely, and in fact more freely than the other suggestions below do, because I don't want to force users to use a mix of capital/lowercase letters, or digits.

The algorithm will be relaxed enough to allow users to choose whatever characters they want, whatever password length they want, or even passphrases (when spaces are accepted), WITHOUT reducing the search space (in fact it does NOT restrict the search space, but increases it by allowing MORE freedom for users and MORE difficulty for password crackers). I also caution that dictionary lookups are bad if the dictionary is well known and is enforced in a restrictive way (because it will actually help the password crackers if it is enforced).

A lot has changed for Wikimedia wikis since this task was filed in 2007:

I found Brion's comment at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_AF#Status_of_secure.wikimedia.org.3F from December 2006 to be an interesting window into how things were about a decade ago. Things have changed. :-)

Whether it's a good idea to use CAPTCHAs, implement stricter password requirements, or even mandate the use of HTTPS for most users remains highly debatable and complex, in my opinion.

I realize that this is a tracking task, but I'm not sure there's still value in keeping this task open, given developments since it was filed. What work is remaining here? Can this task be marked resolved?

Adding a measurement of password strength is becoming standard on many sites that require security. Such a measurement is a good indicator for users who otherwise think their password is "good enough" but rely on false assumptions or are not aware of current attacks (notably dictionary-based attacks, using either static dictionaries or words collected from users' public profiles, from comments on their wiki user page, or from other social networks that they have linked publicly to their wiki account).

It's not complex at all. We are not required to display the actual number when a simple colored gauge bar provides the necessary information: we don't need to explain to users how many letters or digits they need to use or how we measure strength. A low-level evaluation immediately informs them whether their current password is strong enough; users can type more characters or use a larger repertoire of characters as they wish. If the strength is very low, we can give hints about what they can do.

Also, my comments were not meant to require specific subsets of characters, but to allow much freedom (even if users use only lowercase letters, with no capitals, digits or punctuation, or only digits, a password/passphrase can still be very strong while permitting users to use the characters they can remember or type easily on any device with limited keyboard support). We keep the freedom of choice with almost all Unicode-encoded characters permitted, and this is good protection against brute-force attacks, as attackers will have to use a larger repertoire. (I don't like at all those bad sites that want ONLY characters in a very small subset of ASCII, such as only letters or digits, sometimes only lowercase or using case folding, or sites that only want a fixed number of digits, and that incorrectly say they are "secure" when they are extremely easy to crack.)

Basically, such a measurement just requires a small piece of JavaScript to evaluate the input and refresh the evaluation gauge (adjusting the width and/or color or label). The strength could also be evaluated on the server, given that we are using HTTPS (so the password can be sent as clear text over encrypted HTTPS, provided the HTTPS channel is effectively encrypted using at least a strong and unique session key). The server will then just store the strongly hashed password (not easily decryptable) and the bit strength as a single number (which will allow us to enforce some security level for enabling some admin privileges), and alert users when the minimum strength is about to be reached in the next year (i.e. when the password's current strength is one bit below the minimum). This only requires two small fields on the user account stored on the server: the strength measurement number, and the date of the last alert sent to the user (to avoid spamming that user constantly when their privileges are about to be suspended).

Note: any strength evaluator must still keep the input form usable with external password managers (which compute a strong password and enter it directly into the input form). Users should also still be able to paste the password into the input form: do not block CTRL+V/Paste, even if you block CTRL+C/Copy. And while typing it, there should be a way to make that password visible locally (in the input form itself, locally in the browser, the password remains in clear text even if it is finally submitted to the server in encrypted or hashed form). If it is hashed, the hashing function used by the password input form must remain cryptographically strong: don't use MD5 or SHA1, even if it is "salted"; prefer SHA2.
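
As an illustration of the "prefer SHA2" advice, here is a minimal Python sketch of server-side password storage. It goes one step beyond the comment by using PBKDF2 with HMAC-SHA-256 (a salted, deliberately slow construction built on SHA-2) instead of a single SHA-2 pass; that substitution, the function names, and the iteration count are assumptions, not what MediaWiki does.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor; tune to the available hardware


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage; a fresh random salt is generated
    when none is supplied."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The salt and digest would be stored alongside the strength number and the last-alert date mentioned above; the clear-text password is never stored.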

I realize that this is a tracking task, but I'm not sure there's still value in keeping this task open, given developments since it was filed.

I agree.