Page MenuHomePhabricator

Improve hashing strategy for EditorJourney
Closed, ResolvedPublic

Description

@dr0ptp4kt provided some helpful feedback on how to strengthen the security of hashed URLs that are included in EventLogging data for EditorJourney.

Currently we use the user's token as the salt for hashing URLs. Instead of this, we can:

  1. Generate a hash of the user's token as a lookup key for getting/setting the HMAC salt in Redis
  2. Attempt to get the salt from redis with the key, if not found, generate random value and store in Redis with a TTL of 24 hours
  3. Use the salt stored in Redis to hash the URLs

The end result is that the salt for hashing is only around for 24 hours, as opposed to potentially months/years with the session token in the MW database.

Details

Related Gerrit Patches:
operations/mediawiki-config : masterEnable error logging for WikimediaEvents
mediawiki/extensions/WikimediaEvents : masterReplace token parameter value with redacted string
mediawiki/extensions/WikimediaEvents : masterEditorJourney: Store hash salt in Redis with 24 hour TTL

Event Timeline

kostajh created this task.Oct 29 2018, 3:23 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 29 2018, 3:23 PM

Change 470451 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@master] EditorJourney: Store hash salt in Redis with 24 hour TTL

https://gerrit.wikimedia.org/r/470451

kostajh updated the task description. (Show Details)Oct 29 2018, 7:30 PM

@Bawolff I added a question in the patchset about getToken(). Basically, although the cost of computing a rainbow table to reverse engineer the hashed values of getToken() in case of someone spilling Redis keynames is moderately high, I wanted to check whether there's even a risk if an attacker does so. If there's a risk if an attacker does so, I'm thinking we should instead take just a portion of the token (I'm working from the assumption this is tied to something fixed between the client and the server - a cookie issued post login) and the user's numerical ID and concatenate those and then hash that concatenated value for setting the keyname - that would still be basically collision free for keynaming purposes.

In other words:

  • Just hash the token to define the keyname: the keyname is random looking so it's hard to figure out with any certainty who the user is (and this only matters if the user spills both Redis and gets at the EL data). But there's the open question of whether in constructing the rainbow table an attacker could login.
  • Hash the concatenation of part of the token and something about the user: here too the keyname is random looking, although reconstructing the thing about the user is a bit easier as part of the pre-hashed value would presumably be in a fixed location within the string. But it's important to remember here that the attacker would still need to spill both Redis and get at the EL data for this to matter, too. This solution would necessarily not contain enough information in the keyname for an attacker to spoof an identity for authentication.

Something to keep in mind:

  • This is isolated to a couple wikis, and applies only to newly registered users.
  • The salt for hashing URLs only survives 24 hours (from what I understand logging only has to happen for 24 hours, too, although need @MMiller_WMF to confirm my understanding here).

The salt for hashing URLs only survives 24 hours (from what I understand logging only has to happen for 24 hours, too, although need @MMiller_WMF to confirm my understanding here).

Correct, we are only logging the first 24 hours of a new user’s activities to the EditorJourney schema.

Follow up here: Kosta and I spoke, and we don't need the token, as logging should take place on a per-user basis, not just on a per-session basis. So the key will be constructed by hashing two non-sensitive items. This is an okay approach in my view given the requirements.

See https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaEvents/+/470451/2/includes/PageViews.php for the latest treatment as of this comment.

Change 470451 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] EditorJourney: Store hash salt in Redis with 24 hour TTL

https://gerrit.wikimedia.org/r/470451

Change 471116 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[operations/mediawiki-config@master] Enable error logging for WikimediaEvents

https://gerrit.wikimedia.org/r/471116

Change 471316 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@master] Replace token parameter value with redacted string

https://gerrit.wikimedia.org/r/471316

Change 471316 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Replace token parameter value with redacted string

https://gerrit.wikimedia.org/r/471316

Etonkovidova closed this task as Resolved.Nov 2 2018, 11:18 PM
Etonkovidova added a subscriber: Etonkovidova.

Checked on betalabs deployment-eventlog5 - table EditorJourney_ has hushed URL (and other table entries that need it).

Change 471116 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable error logging for WikimediaEvents

https://gerrit.wikimedia.org/r/471116