Some background for this task:
Historically, we agreed to hash all permanent identifiers of EventLogging data in the EL sanitization white-list.
When hashing, we add a salt to the token, and we rotate the salt every 3 months.
This way, hashed tokens can still link events generated by the same token-holder,
but only for events that belong to the same 3-month period (salt rotation breaks the link).
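To make the mechanism concrete, here is a minimal sketch of salted hashing with a per-quarter salt. It is only an illustration: the use of HMAC-SHA256, the `SALTS` table, and the function names are assumptions, not the actual sanitization code.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical per-quarter salts; real salts would be random secrets
# that are discarded when the quarter ends.
SALTS = {
    (2019, 1): b"salt-2019-q1",
    (2019, 2): b"salt-2019-q2",
}

def quarter_of(dt):
    """Return (year, quarter) for a datetime, with quarter in 1..4."""
    return (dt.year, (dt.month - 1) // 3 + 1)

def hash_token(token, dt):
    """Hash a token with the salt of the quarter the event belongs to."""
    salt = SALTS[quarter_of(dt)]
    return hmac.new(salt, token.encode("utf-8"), hashlib.sha256).hexdigest()

# Same token, same quarter: the hashes match, so events can still be linked.
a = hash_token("install-123", datetime(2019, 2, 10))
b = hash_token("install-123", datetime(2019, 3, 31))

# Same token, next quarter: the hash changes, so the link is broken.
c = hash_token("install-123", datetime(2019, 4, 1))
```

Here `a == b` while `a != c`, which is the property the rotation is meant to provide.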
There is, though, one potential weakness in this approach around the salt rotation time (end/beginning of quarter):
if another non-hashed identifier, like a short-lived token that we decided not to hash, spans a little before
and after the salt rotation time, then it can be used to link events before and after the salt rotation
and associate the hashed tokens on both sides, thus creating a chain that defeats the effect of hashing+salting.
I believe the risk from this weakness is not high, because the proportion of users who generate events
close enough to the salt rotation time for the non-hashed temporary tokens to form the chain is really small.
However, the longer-lived the temporary token, the higher the risk: non-hashed tokens that span half an hour or less are OK,
but non-hashed tokens that span 1 week or more are not OK.
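The chain described above can be sketched as follows. This is a toy illustration under assumed names: the event layout, the `hashed_id` and `session_token` keys, and the salts are all hypothetical.

```python
import hashlib
import hmac

def hash_token(token, salt):
    """Salted hash of a token (HMAC-SHA256 is an assumption here)."""
    return hmac.new(salt, token.encode("utf-8"), hashlib.sha256).hexdigest()

def link_across_rotation(events_before, events_after):
    """Pair hashed ids that share a non-hashed session token across the rotation."""
    before = {e["session_token"]: e["hashed_id"] for e in events_before}
    links = set()
    for e in events_after:
        if e["session_token"] in before:
            links.add((before[e["session_token"]], e["hashed_id"]))
    return links

old_salt, new_salt = b"salt-q1", b"salt-q2"

# Events just before the rotation, hashed with the old salt.
events_before = [
    {"hashed_id": hash_token("install-A", old_salt), "session_token": "s1"},
    {"hashed_id": hash_token("install-B", old_salt), "session_token": "s2"},
]
# Events just after the rotation, hashed with the new salt.
# Session "s1" was still alive, so it appears on both sides.
events_after = [
    {"hashed_id": hash_token("install-A", new_salt), "session_token": "s1"},
    {"hashed_id": hash_token("install-C", new_salt), "session_token": "s3"},
]

links = link_across_rotation(events_before, events_after)
```

`links` ends up with exactly one pair: the old and new hashes of `install-A`, joined through the non-hashed session `s1`. Sessions that do not straddle the rotation (`s2`, `s3`) produce no link.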
The actual task:
This task is about finding such non-hashed tokens and marking them to be hashed in the EL white-list.
In principle, the only disadvantage of doing so is that they will be "cut" at the end/start of a quarter. For example, if a non-hashed
"session" token spans about half an hour and falls around the salt rotation time, then hashing will split it into two sessions,
one before the salt rotation and another after. All other sessions, not around the end/start of a quarter, will remain intact.
The shorter-lived a token, the less negative effect the hashing will have on it.
Example: in the white-list entry below, app_install_id is hashed but sessionToken is not, thus potentially invalidating the effect of hashing+salting app_install_id.
MobileWikiAppLangSelect:
    event:
        action: keep
        appInstallID: hash
        app_install_id: hash
        newLang: keep
        oldLang: keep
        sessionToken: keep
        source: keep
        timeSpent: keep
    client_dt: keep
    useragent:
        os_family: keep
        wmf_app_version: keep
    webhost: keep
    wiki: keep
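As a starting point for the search, a small script could walk the white-list and flag fields that are kept but have identifier-like names. The sketch below assumes the white-list has already been loaded into nested dicts; the `TOKEN_LIKE` pattern is a hypothetical heuristic, and every hit would still need manual review of how long the token actually lives.

```python
import re

# Hypothetical heuristic: field names that look like tokens or identifiers.
TOKEN_LIKE = re.compile(r"token|session|id$", re.IGNORECASE)

def kept_token_fields(node, path=()):
    """Yield dotted paths of fields marked 'keep' whose names look like tokens."""
    for name, value in node.items():
        here = path + (name,)
        if isinstance(value, dict):
            # Recurse into nested sections such as "event" or "useragent".
            yield from kept_token_fields(value, here)
        elif value == "keep" and TOKEN_LIKE.search(name):
            yield ".".join(here)

# Abbreviated version of the entry above, as nested dicts.
whitelist = {
    "MobileWikiAppLangSelect": {
        "event": {
            "action": "keep",
            "appInstallID": "hash",
            "app_install_id": "hash",
            "sessionToken": "keep",
            "timeSpent": "keep",
        },
        "wiki": "keep",
    }
}

candidates = list(kept_token_fields(whitelist))
# candidates == ["MobileWikiAppLangSelect.event.sessionToken"]
```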