Page MenuHomePhabricator

Consider changing recovery codes to use six digits
Open, Needs TriagePublic

Description

TOTP codes consist of six digits. Recovery codes are long alphanumeric strings. This complicates interface design (either one has two create two interfaces or a single interface which is generic - ie. no numeric keypad on mobile, no size limit) and thus degrades user experience, while the security benefits are minimal.

Should we just change the recovery tokens to use the same six-digit format? Any attack that can be pulled of against the scratch tokens can probably be pulled of against the TOTP codes as well (the expected number of guesses against a six-digit scratch token is 500,000; the expected number of guesses against a TOTP code is at most 1,000,000 but might be less if it's sufficiently non-periodic that not repeating guesses improves the chance of success). Migrating existing tokens would be a pain but so far the extension is not widely used and we could just use a similar approach to password policy changes (check on login, or even just site visit, and ask the user to disable and reenable if they use old-style scratch tokens).

Event Timeline

Tgr renamed this task from Change OATH scratch tokens to use six digits to Consider changing OATH scratch tokens to use six digits.Feb 15 2017, 4:05 AM

FWIW, Google/Gmail use 8 numbers

Github use 8 alpha numeric. Steam use 7

Note that the scratch tokens operate under a different attack scenario than TOTP codes, and thus they cannot be the same format.

The only reason TOTP codes can be six-digit numeric is because they are only valid for around 90 seconds, and are validated using a rate-limited online challenge. By the time an attacker might be able to guess a code on the login form, they will either be locked out or the code will expire.

Scratch codes are valid until they are reset, and thus can be infinitely retried (pending online ratelimits, of course). Thus they must be proper cryptographically random tokens that are resistant to online brute force attacks, unlike the TOTP codes.

The scratch codes can be made solely numeric, but note that they would need to be at least 10 digits long in order to meet RFC 4086's password entropy recommendations for online threat scenarios.

Note that the scratch tokens operate under a different attack scenario than TOTP codes, and thus they cannot be the same format.

The task description already explains why that difference is negligible (a factor or of two at most).

Note that the scratch tokens operate under a different attack scenario than TOTP codes, and thus they cannot be the same format.

The task description already explains why that difference is negligible (a factor or of two at most).

No, it actually doesn't. Please read the rest of my comment.

Whether the number to hit changes every once in a while or not makes no difference whatsoever when you are guessing randomly. So the only difference is that with the TOTP code you need to be guessing randomly (ie. each guess has an 1E-6 chance of success, and with the scratch code you can "tick off" guesses you have already made (ie. your first guess has a chance of 1/1,000,000, the second 1/999,999 etc). For a small number of guesses that's a negligible difference. Since you quoted RFC 4086 (which posits 0.1% as an acceptable chance for the attacker to succeed): when trying to guess the scratch code (assuming it is six digits), you need to make 1000 guesses (1/1000 part of the search space) to have a 0.1% chance of success. When trying to guess the TOTP number, each guess has an 1E-6 chance so N guesses have an 1 - 1/(1 - 1E-6)^N chance so for that to be larger than 1E-3 you need log ( 1- 1E-3) / log (1 - 1E-6) ~= 1000.5 guesses - no difference at all.

So TOTP fails RFC 4086 whatever you do. Which doesn't really matter since that RFC is not an actual recommendation (more like an explanation of tradeoffs) and is about an online password brute-forcing scenario, not a second token. The assumptions it makes (e.g. the ability that an attacker can fail a hundred thousand times undetected) might make sense for password but is easy to defend aginst for a second factor, where failures are rare and even a small number of them is very suspicious and should prompt a warning for the user to change their password. (Not that we do that, but that's another matter.) Sixteen-digit scratch tokens are just voodoo security.

Whether the number to hit changes every once in a while or not makes no difference whatsoever when you are guessing randomly. For a small number of guesses that's a negligible difference.

Why are you assuming that an attacker would be guessing randomly? Or that they'd be performing only a small number of guesses? Scratch codes are indefinitely valid. An attacker could guess for an entire year if they so desired, and it would be possible to guess a six-digit code over a long period time, whereas it would be impossible to do so with a sufficiently high-entropy random token.

Since you quoted RFC 4086 (which posits 0.1% as an acceptable chance for the attacker to succeed): when trying to guess the scratch code (assuming it is six digits), you need to make 1000 guesses (1/1000 part of the search space) to have a 0.1% chance of success. When trying to guess the TOTP number, each guess has an 1E-6 chance so N guesses have an 1 - 1/(1 - 1E-6)^N chance so for that to be larger than 1E-3 you need log ( 1- 1E-3) / log (1 - 1E-6) ~= 1000.5 guesses - no difference at all.

As I said, you're entirely ignoring the time period of the attack. The fact that the TOTP token changes every ninety seconds is significant. To search 0.1% of the space, you'd have to perform those 1000 guesses before the token changes. Otherwise your equation for the probability resets, and you're back where you started. It is because of this time limitation that TOTP is even used.

The assumptions it makes (e.g. the ability that an attacker can fail a hundred thousand times undetected) might make sense for password but is easy to defend aginst for a second factor, where failures are rare and even a small number of them is very suspicious and should prompt a warning for the user to change their password. (Not that we do that, but that's another matter.)

It's not "another matter". We can't just roll out six-digit scratch tokens under the assumption that, if we someday implemented failure detection for 2FA, it would become more secure. The fact is that those safeguards are currently not in place, and because of that a long-term online attack is surmountable.

An attacker can already launch a year-long attack on the normal (non-scratch) tokens. That they change periodically does not protect against that at all. Changing tokens is a way to defend against eavesdropping / replay attacks, not to make guessing more difficult. I'm not sure how to get that point across after I even provided the exact probability calculations and you ignored them :(

All that changing the scratch tokens to six digits would do is to lower their security to the same level TOTP tokens are at. There is no point in keeping the windows boarded if you leave the door open.

An attacker can already launch a year-long attack on the normal (non-scratch) tokens. That they change periodically does not protect against that at all.

That is absolutely not true at all! Let me try and explain it again.

To be clear, there are two different attack scenarios to consider here, and I'll explain them successively:

  1. An online random guessing attack, where an attacker tries completely random keys, e.g., chosen by a PRNG, and may or may not repeat guesses.
  2. An online brute-force attack, i.e., where an attacker contacts the server and makes successive attempts at guessing the secret, marking down what guesses he makes as he goes on, and not repeating guesses.

In addition, the parameters of both of these scenarios are:

  • The attacker has unlimited guesses (since we have no lockout mechanism).
  • The attacker is rate-limited as to how many guesses he can perform in a given period of time.

You seem to be focusing on scenario (1). You are right that in scenario (1), the security between a six-digit TOTP token and a backup code is identical. But it does not make either insecure as you claimed. The probability of individual random guesses without replacement are independent. Given a population space of 10^N, where in this case N = 6, an attacker will always have a probability of 10^-N of choosing the correct key. For our N, this gives an attacker a 1E-6 = 0.0001% probability, which is well within acceptable ranges. The critical point here is that the attackers' probability does not increase over time, in contrast to what your equation says. The attacker will always have a slim chance, even if the key remains valid indefinitely.

I am most concerned with scenario (2), which is a much more likely scenario. Here, the attacker is making guesses without replacement. Thus, as you have described, each successive guess results in a higher probability of success. To copy your equation: 1 - 1/(1 - 1E-N)^G is the probability in N space after G guesses, assuming the key is generated by some pseudo-random function like it is in TOTP. And, as you have noted, after around 1,000 guesses in 6-digit space, the probability of a correct guess exceeds reasonable limits. The critical difference, though, between TOTP and scratch tokens, is that for TOTP, the probability resets every 90 seconds, i.e., the old key becomes invalid, a new key is generated, and it is now possible that the new key could match a guess the attacker has already made. In order for the attacker to reach 0.1% probability, they would need to make the requisite 1,000 guesses before the time window expires. For scratch tokens, there is no such requirement, and an attacker could easily reach 1,000 guesses without spending too much time.

The crux here, as I've stated, is the difference between probability of random choice with replacement vs. without replacement.


Changing tokens is a way to defend against eavesdropping / replay attacks, not to make guessing more difficult.

I'd also like to point out that it is also not true. If interacting with a system that complies with RFC 6238's recommendations, which says:

The verifier MUST NOT accept the second attempt of the OTP after the successful validation has been issued for the first OTP, which ensures one-time only use of an OTP.

Then the system is not vulnerable to replay attacks. Once a token is played, it cannot be replayed. And E:OATHAuth actually does this by storing used tokens in cache for comparison. You could maybe argue that the time validity of tokens is made to combat MITM attacks, but I don't think that a successful MITM attacker would bother waiting 90 seconds after sniffing credentials before attempting to log into the server.

Of course, TOTP is vulnerable to any number of other attacks, including (as mentioned) MITM if the transport stream is compromised, and stealing of the original shared secret (if it's not stored on a tamper-evident device). But none of those are really affected by the time window of each token's validity. In fact, I'd go so far as to argue that the only purpose of having a time-rotated token is to protect against the brute force attack described above.

To be clear, there are two different attack scenarios to consider here, and I'll explain them successively:

  1. An online random guessing attack, where an attacker tries completely random keys, e.g., chosen by a PRNG, and may or may not repeat guesses.
  2. An online brute-force attack, i.e., where an attacker contacts the server and makes successive attempts at guessing the secret, marking down what guesses he makes as he goes on, and not repeating guesses.

That is correct. Where you go wrong is that you assume that they differ significantly in effectiveness.

In general, if two events (e.g. that typing 000000 gets you through the login screen and that typing 000001 does that) have p probability, then the probability of either one of them happening is 2p if they are exclusive (you are attacking a scratch token; it cannot be both 000000 and 000001) and it's 2p - p^2 if they are independent (you are attacking TOTP and the target has been changed between the two attempts). When p is small, p^2 is negligible.

To give a concrete example, let's say the attacker can get 10,000 tries before being noticed. In scenario 1, each try has 1E-6 chance and they are independent of each other (assuming the attacker can only get one try per time slot). The attacker fails if every single one of the 10,000 tries fail. The chance of a single failure is 1 - 1E6; since these are independent events, we can just multiply the probabilites, so that chance of all attacks failing is (1 - 1E-6) ^ 1E4 ~= 0.99. So the chance of succeeding is 1 - 0.99, ie. 1%.

In scenario 2, the probabilities are exclusive and you can just add them together so the attacker's chance of success is 1E-6 * 1E4 = 0.01, exactly the same. (Actually very slightly larger but the difference starts at the fifth digit after the decimal point.)

The difference between the two scenarios will become noticable when the number of attempts becomes the same magnitude as the number of possible codes, but even then it won't be huge. E.g. if you are allowed to make 1,000,000 attacks then the chance of success with random guessing is 1 - (1 - 1E-6) ^ 1E6 which is something like 60%, and the chance of success with brute force will of course be 100%. Still within a factor of two.

Once a token is played, it cannot be replayed. And E:OATHAuth actually does this by storing used tokens in cache for comparison.

Well, yes, sure, but you can only do that because the password is different at different times. If someone implemented a "second factor" that's a simple six digit PIN code that's always the same, blocked passcodes used in the past would not make sense. Still, the system would be exactly as strong against guessing / brute force attacks as TOTP. That's not the reason why TOTP is useful (instead of just forcing a stronger password in the first place).

In scenario 2, the probabilities are exclusive and you can just add them together so the attacker's chance of success is 1E-6 * 1E4 = 0.01, exactly the same. (Actually very slightly larger but the difference starts at the fifth digit after the decimal point.)

As I've been repeatedly saying, the probability of failure decreases with each guess you make. The actual probability for scenario 2 is sum_{1}^{G}(1/(1e6-n)), not just 1e-6*G.

The difference between the two scenarios will become noticable when the number of attempts becomes the same magnitude as the number of possible codes, but even then it won't be huge. E.g. if you are allowed to make 1,000,000 attacks then the chance of success with random guessing is 1 - (1 - 1E-6) ^ 1E6 which is something like 60%, and the chance of success with brute force will of course be 100%. Still within a factor of two.

With the corrected equation above, here is what the probabilities for example scenarios are:

GuessesScenario 1Scenario 2
100,0009.5%10.5%
200,00018.1%22.3%
300,00025.9%35.6%
400,00032.9%51.1%
500,00039.3%69.3%

As you can see, the probability for scenario 2 is exponential (well not really, but effectively) whereas scenario 1 is only linear. Once you've guessed a significant portion of the available key space, chances of success are much greater.

Now it effectively becomes a trade-off: what are the chances of an attacker being dedicated enough to perform a brute-force attack? And is it worth allowing that attack, just so these backup codes that are used by users in exceptional scenarios can be slightly shorter?

Well, yes, sure, but you can only do that because the password is different at different times. If someone implemented a "second factor" that's a simple six digit PIN code that's always the same, blocked passcodes used in the past would not make sense. Still, the system would be exactly as strong against guessing / brute force attacks as TOTP. That's not the reason why TOTP is useful (instead of just forcing a stronger password in the first place).

Huh? You were the one who said TOTP is vulnerable to replay attacks. Are you now agreeing with me? You do realize that a unchanging six-digit PIN is vulnerable to replay attacks, but TOTP is not.

As I've been repeatedly saying, the probability of failure decreases with each guess you make. The actual probability for scenario 2 is sum_{1}^{G}(1/(1e6-n)), not just 1e-6*G.

The probability of failure does decrease, but you can't just sum non-independent probabilities. It's easy to see that the 1e-6*G formula is correct (although deriving it the way you are trying is probably hard): assume you pick G numbers (out of the N possible ones), then I choose the real value of the token at random. Since I have N values to choose from, I'll choose each one with probability 1/N, and you have picked G of the so the chance of me choosing one that you picked is G/N. Obviously that number won't change even if I pick the
If you want to be scientific about it, this is a negative hypergeometric distribution with N=1,000,000, M=1, m=1.

With the corrected equation above, here is what the probabilities for example scenarios are:

GuessesScenario 1Scenario 2
100,0009.5%10.5%
200,00018.1%22.3%
300,00025.9%35.6%
400,00032.9%51.1%
500,00039.3%69.3%

As you can see, the probability for scenario 2 is exponential (well not really, but effectively) whereas scenario 1 is only linear. Once you've guessed a significant portion of the available key space, chances of success are much greater.

The correct values for Scenario 2 would be 10%, 20% etc but it does no make much difference either way - even with these numbers the difference between the two scenarios is negligible. It does not make any sense to consider a system where a certain attack has 70% chance of succeeding unacceptable but consider another system where the same attack has a 40% chance safe.

Huh? You were the one who said TOTP is vulnerable to replay attacks. Are you now agreeing with me? You do realize that a unchanging six-digit PIN is vulnerable to replay attacks, but TOTP is not.

I'll drop this thread as I feel we are talking next to each other and it's unimportant anyway. I think my original point stands: even according to the slightly incorrect numbers you came up with, the difference within the strength of the TOTP token and the strength of the scratch token is within a factor of two (e.g. you need ~700K guesses for an 50% chance in Scenario 1 and ~400K guesses in Scenario 2) so if we accept one of them safe there is no reason not to accept the another.

so if we accept one of them safe there is no reason not to accept the another.

I'll walk back from that a bit. Static vs. changing is a negligable difference, but guessing the TOTP token is a three-in-a-million chance (since we accept the next/prev time slot as well) and hitting the scratch tokens is a ten-in-whatever chance (since we assign ten of them). So if we switch to six-digit scratch tokens an attackers job would become about four times easier. Still not a huge difference IMO.

Reedy renamed this task from Consider changing OATH scratch tokens to use six digits to Consider changing recovery codes to use six digits.Jan 1 2024, 8:50 PM
Reedy updated the task description. (Show Details)