Product Safety and Integrity (PSI) would like to propose adding a new column to the globaluser table, gu_email_normalized, which would contain the user's e-mail address in a normalized form. This field wouldn't be used for sending e-mails to the user; instead, it would serve as a basis for future anti-abuse work.
Context
Some popular e-mail providers are known to apply clean-up steps when resolving the recipient's address. An example is Gmail, where john.doe@gmail.com and johndoe+spam@gmail.com resolve to the same mailbox.
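To make the Gmail example concrete, here is a minimal sketch of what such a clean-up could look like. It is only an illustration, not the proposed algorithm: the function name normalize_email and the rule set (drop dots and any "+suffix" for gmail.com only) are assumptions made for this example.

```python
# Hypothetical sketch: provider-specific e-mail normalization.
# The rule set below is an assumption for illustration; it covers only
# the Gmail behaviour described above.

# Domains where dots and "+suffix" in the local part are ignored (assumed list).
DOT_AND_PLUS_INSENSITIVE_DOMAINS = {"gmail.com"}


def normalize_email(email: str) -> str:
    """Return a canonical form of an e-mail address for matching purposes."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid e-mail address: {email!r}")

    # Domain names are case-insensitive, so lowercasing is always safe.
    domain = domain.lower()

    if domain in DOT_AND_PLUS_INSENSITIVE_DOMAINS:
        # Drop everything from the first "+", remove dots, lowercase.
        local = local.split("+", 1)[0].replace(".", "").lower()

    # For other providers the local part is left untouched in this sketch.
    return f"{local}@{domain}"
```

With these assumed rules, normalize_email("john.doe@gmail.com") and normalize_email("johndoe+spam@gmail.com") both return "johndoe@gmail.com", so the two accounts could be matched by a simple equality query on the normalized value.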
As of now, using the account's e-mail address as part of the anti-abuse signals is error-prone, as we cannot reliably match "any known form" of the user's e-mail (nor query for all users with the same "canonical" e-mail).
Proposed change
Add gu_email_normalized to globaluser. It's going to store the normalized value of the gu_email field.
We haven't yet defined what the "normalized" form of the e-mail is going to look like. The two options that I can think of are: either human-readable text, stripped of the unneeded characters, or the same but hashed (so that it can have a constant length, smaller than the limit for gu_email). We're open to suggestions from DBAs on how this can be done in a sustainable way.
The field should be large enough to make collisions unlikely, but at the same time it doesn't have to be human-readable.
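As a rough illustration of the hashed option, the sketch below turns an already-normalized address into a fixed-length value. The choice of SHA-256 and the function name hash_normalized_email are assumptions made for this example; the actual hash function and column size are still open for discussion.

```python
# Hypothetical sketch: fixed-length representation of a normalized e-mail.
# SHA-256 is an assumed choice, picked only to show the constant-length property.
import hashlib


def hash_normalized_email(normalized_email: str) -> str:
    """Return a hex digest with the same length for any input."""
    digest = hashlib.sha256(normalized_email.encode("utf-8"))
    # A SHA-256 hex digest is always 64 characters long.
    return digest.hexdigest()
```

Under this assumption the column would hold 64 hexadecimal characters (or 32 bytes if stored as binary) regardless of the address length, and equal normalized addresses always produce equal values, so lookups remain a plain equality match.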
The exact implementation of the normalization algorithm is not part of this request (PSI can figure it out once the DB details are agreed on).