Password length check should count unicode characters
Open, Needs Triage, Public

Assigned To

None

Authored By

	Tgr
	Dec 10 2018, 8:10 AM

Description

...not bytes.

Raised at mw:Topic:Upvztrigdixy2mga. One good point made there is that counting Unicode code points is unfair towards ideogram/ideograph-based languages, where a single character carries much more entropy (and an 8-character sentence is presumably harder to remember).

Related Objects

Mentioned In: T151425: Enlarge Popular Password File to 100,000 entries and enforce the new minimum in the config
T218449: Determine new password requirements for MediaWiki core

Event Timeline

Tgr created this task. Dec 10 2018, 8:10 AM

Restricted Application added a subscriber: Aklapper. Dec 10 2018, 8:10 AM

Tgr updated the task description. Dec 10 2018, 8:34 AM

Using mb_strlen instead of strlen would probably get us 60% of the way to something reasonable.
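For comparison, here is a minimal Python sketch of the two counts. The mapping is approximate. `len()` of the UTF-8 encoded bytes stands in for PHP's strlen, and `len()` of the str stands in for `mb_strlen($s, 'UTF-8')`. The sample passwords are arbitrary.

```python
def byte_length(password: str) -> int:
    # Byte count of the UTF-8 encoding, roughly what strlen() reports.
    return len(password.encode("utf-8"))


def code_point_length(password: str) -> int:
    # Number of Unicode code points, roughly what mb_strlen($s, 'UTF-8') reports.
    return len(password)


for pw in ["hunter22", "密码安全", "Passw\u00f6rt"]:
    print(pw, byte_length(pw), code_point_length(pw))
```

An ASCII password gets the same count either way. Each of the four ideographs takes three bytes in UTF-8, so a byte-based 8-character minimum accepts a 4-character password. The precomposed ö adds one extra byte. Both discrepancies are on the permissive side.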

An alternative view would be that using non-ASCII characters, which are less likely to appear in a cracking dictionary, maybe increases entropy enough to counteract the bad character counting (no idea whether that's true or not).

The internet also claims grapheme_strlen is a thing (https://secure.php.net/manual/en/function.grapheme-strlen.php), but it didn't seem to work when I tested it locally.
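Code points and user-perceived characters (grapheme clusters, which is what grapheme_strlen counts) do not always match. Python's standard library has no grapheme segmentation of its own, so this sketch only shows the code-point side of the gap, using two sequences a user would see as a single character:

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
decomposed = "e\u0301"
# NFC composes the pair into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
# Regional indicator symbols D + E: displayed as one flag.
flag = "\U0001F1E9\U0001F1EA"

print(len(decomposed), len(composed), len(flag))
print(unicodedata.normalize("NFC", flag) == flag)  # NFC leaves the flag unchanged
```

NFC folds some combining sequences into one code point, but not all. The flag has no precomposed form, so it stays at two code points even though it renders as one symbol.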

The SecLists top million file only contains two passwords with non-ASCII characters so for dictionary attacks they seem pretty strong.
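Here is a rough way to check this kind of claim against a local wordlist, as a sketch. The file name in the comment is a placeholder, and the sample entries are invented for illustration. They are not taken from SecLists.

```python
from typing import Iterable, List


def non_ascii_entries(lines: Iterable[str]) -> List[str]:
    # Keep entries (trailing newline stripped) that contain at least one
    # non-ASCII code point.
    found = []
    for line in lines:
        entry = line.rstrip("\r\n")
        if entry and not entry.isascii():
            found.append(entry)
    return found


# Hypothetical usage; "wordlist.txt" is a placeholder path. With errors="replace",
# undecodable bytes become U+FFFD and are therefore also counted as non-ASCII.
# with open("wordlist.txt", encoding="utf-8", errors="replace") as f:
#     print(len(non_ascii_entries(f)))

sample = ["123456", "password", "contraseña", "qwerty", "пароль"]
print(non_ascii_entries(sample))
```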

The other issue is that users understand characters better than bytes. But I guess as long as any discrepancies with the stated rule are on the permissive side, that's not really a problem.

Jdforrester-WMF subscribed. Dec 10 2018, 6:45 PM

In the context of Unicode discussions, “characters” is an ill-advised term, because some code points (such as U+FFFE) are explicitly defined as “noncharacters”.
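For reference, the set of noncharacters is small and fixed. It consists of U+FDD0 to U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFE and U+10FFFF), for 66 in total. They are still code points, so a code-point count such as mb_strlen includes them. Here is a minimal check in Python, where the helper name is made up for illustration:

```python
def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF form one contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The other 34 are U+nFFFE and U+nFFFF in each plane n = 0..16,
    # i.e. code points whose low 16 bits are 0xFFFE or 0xFFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


print(is_noncharacter(0xFFFE), is_noncharacter(0xFFFD))
print(sum(is_noncharacter(cp) for cp in range(0x110000)))
print(len("ab\uFFFE"))  # a noncharacter still counts as one code point
```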

Also, what about passwords which are not valid UTF-8 strings? Will something containing \300 or \301 be rejected on these grounds?

In T211550#4825302, @Incnis_Mrsi wrote:

Also, what about passwords which are not valid UTF-8 strings?

They are great ways of getting locked out of the site as soon as some low-level detail of input parsing changes.

In T211550#4825302, @Incnis_Mrsi wrote:

In the context of Unicode discussions, “characters” is an ill-advised term, because some code points (such as U+FFFE) are explicitly defined as “noncharacters”.

Also, what about passwords which are not valid UTF-8 strings? Will something containing \300 or \301 be rejected on these grounds?

I haven't tested it, but I imagine normal input normalization would apply (conversion to NFC; anything that is not valid UTF-8 gets changed to the replacement character).
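Here is a minimal Python model of the input handling described above. It assumes that handling is exactly "decode as UTF-8 with replacement, then NFC". MediaWiki's actual code path is not shown here, and the function name is hypothetical. \300 and \301 are octal for the bytes 0xC0 and 0xC1, which can never appear in valid UTF-8. Under this model they are not rejected. Instead, each one becomes a single U+FFFD.

```python
import unicodedata


def normalize_input(raw: bytes) -> str:
    # Hypothetical model: invalid UTF-8 bytes become U+FFFD (REPLACEMENT CHARACTER),
    # then the text is converted to Unicode Normalization Form C.
    text = raw.decode("utf-8", errors="replace")
    return unicodedata.normalize("NFC", text)


bad = normalize_input(b"pa\xc0\xc1ss")    # each invalid byte -> one U+FFFD
accented = normalize_input(b"cafe\xcc\x81")  # 'e' + U+0301 composes to U+00E9
print(len(bad), len(accented))
```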

The NIST guidelines say "For purposes of the above length requirements, each Unicode code point SHALL be counted as a single character."
Absent strong reasons to the contrary, we should probably go with that.
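Here is a sketch of a length check that follows that rule, counting code points after NFC normalization. The minimum of 8 and the function name are illustrative assumptions, not MediaWiki's configuration. Normalizing first keeps a decomposed accent (e + U+0301) from counting as two characters.

```python
import unicodedata

MIN_LENGTH = 8  # illustrative minimum, not a real MediaWiki setting


def meets_min_length(password: str, min_length: int = MIN_LENGTH) -> bool:
    # Normalize to NFC first, then count each Unicode code point as one character.
    normalized = unicodedata.normalize("NFC", password)
    return len(normalized) >= min_length


decomposed = "e\u0301" * 4 + "abc"  # 11 code points before NFC, 7 after
print(meets_min_length("hunter22"), meets_min_length("密码安全"), meets_min_length(decomposed))
```

Without the normalization step, the decomposed input would count as 11 characters and pass.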

Tgr mentioned this in T218449: Determine new password requirements for MediaWiki core. Mar 15 2019, 10:09 PM

Tgr mentioned this in T151425: Enlarge Popular Password File to 100,000 entries and enforce the new minimum in the config.
