Reveal email recipient's username in checkuser query results
Open, Medium, Public

Description

Hello,

When we check any IP or user, the data that appears includes information about email messages sent by the user to other users, but the names of those users do not appear (each appears as an encrypted hash).

So can we make the user account name appear instead of the encrypted hash?

It will have a benefit.

Event Timeline

Reedy renamed this task from When a user sent an email message to a user (Checkuser) to Include username of user that was sent an email in checkuser query results. Mar 17 2018, 6:44 PM
Reedy renamed this task from Include username of user that was sent an email in checkuser query results to Include target of emails in checkuser query results. Mar 17 2018, 6:57 PM

I think this is intentional for privacy reasons.

@MarcoAurelio I think so also.

But in several situations that I have faced (like a compromised account), I noticed that knowing to whom the email was sent would help a lot.

I think we need at least input from the Security-Team, hence adding them here.

The hash is built from the mail address + user id + secret, not the user name. It has to be hashed so that it is still possible to see whether an email was sent to the same address or a different one.

If it is okay to see the target, CheckUser would have to store the username instead.

@Umherirrender is correct. To clarify, the hash is created by concatenating the email address of the receiving user, the user ID of the receiving user, and the global secret string of the wiki, and then passing this long string through the MD5 algorithm (here is the code). As a result, if User A and User B both emailed User X, and the email address that User X had registered with the wiki was the same on both occasions, the hash generated would also be the same on both occasions.
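For readers who want to see the scheme concretely, here is a minimal Python sketch of the construction described above. It is not the extension's actual code (which is PHP); the function name, the sample address, the user ID, and the secret value are all made up for illustration, and the sketch assumes the second component is the recipient's user ID, as stated in the earlier comment.

```python
import hashlib


def recipient_hash(email: str, user_id: int, secret_key: str) -> str:
    """MD5 over email + user ID + wiki secret, as described in this thread.

    Hypothetical helper: the name and signature are assumptions.
    """
    material = f"{email}{user_id}{secret_key}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


SECRET = "example-secret"  # hypothetical stand-in for $wgSecretKey

# The sender is not part of the hashed material, so emails from User A and
# User B to the same recipient yield the same hash.
hash_from_a = recipient_hash("x@example.org", 42, SECRET)
hash_from_b = recipient_hash("x@example.org", 42, SECRET)

# If the recipient changes their registered address, the hash changes too.
hash_after_change = recipient_hash("x-new@example.org", 42, SECRET)
```

The last line illustrates why including the email address makes the hash unstable across address changes for the same recipient.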

Now, since this data is recorded in the CU database when the EmailUser hook is called, I am not sure why it would be beneficial for the recipient's email address to be part of the hash. In fact, it would have been better, IMHO, if the hash was just based on the recipient's user name + $wgSecretKey because that way, all emails going to the same user would show the same hash even if that recipient changed their registered email address.

Also, I cannot think of how it would be a violation of privacy of the sender or the recipient if we showed a log like ... sent an email to user User:X instead of what we have now which is ... sent an email to user 52ff4d1b84979e5b8ac0c8350bba7738. After all, on-wiki email exchanges are logged actions, and the Privacy Policy allows for logged actions to be investigated by CheckUsers.

Huji renamed this task from Include target of emails in checkuser query results to Reveal email recipient's username in checkuser query results. Mar 17 2018, 9:20 PM
Bawolff added subscribers: APalmer_WMF, Bawolff.

> I think we need at least input from the Security-Team, hence adding them here.

This is more a political/privacy question than a security one.

I would say it'd be OK if there's some discussion on, say, Meta where the community is OK with giving this info to CheckUsers, and if Legal is OK with the proposed changes.

As an aside, if we did want to keep the hash, the current hashing scheme is kind of poor for its purposes. If we're having a hash, we should add the month (or the "quarter") to the hash so that hashes become useless after 90 days.
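A rough Python sketch of that time-bucketing idea follows, assuming the quarter label is simply appended to the hashed material. The helper names, the label format, and the choice to keep MD5 (only to mirror the existing scheme) are assumptions for illustration, not a proposal for the actual patch.

```python
import hashlib
from datetime import date


def quarter_label(day: date) -> str:
    """Return a label such as '2018-Q1' for the calendar quarter of `day`."""
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}-Q{quarter}"


def bucketed_recipient_hash(email: str, user_id: int, secret_key: str, day: date) -> str:
    """Hash email + user ID + secret + quarter label (hypothetical scheme)."""
    material = f"{email}{user_id}{secret_key}{quarter_label(day)}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


SECRET = "example-secret"  # hypothetical stand-in for $wgSecretKey

# Same recipient, same quarter: hashes still match, so patterns stay visible.
march = bucketed_recipient_hash("x@example.org", 42, SECRET, date(2018, 3, 17))
january = bucketed_recipient_hash("x@example.org", 42, SECRET, date(2018, 1, 5))

# Same recipient, next quarter: the hash no longer matches older entries.
april = bucketed_recipient_hash("x@example.org", 42, SECRET, date(2018, 4, 2))
```

One consequence of this design is that two emails sent a day apart but on either side of a quarter boundary would no longer appear to share a recipient.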

FTR (said in a call with the Stewards Tuesday but for the record) I'm going to be talking with Legal about this and will loop back once we're set there and/or have other questions.

Assuming this passes all security and privacy reviews, the Anti-Harassment tools team would be happy to look into this (at a minimum investigate and estimate the work required, but also build and release if not too much of a time investment.)

@TBolliger the work required is very little; chances are once approved, I would submit a patch the next day :)

Hey hey even better! We will help with Code Review, in that case.

The next step would be to obtain approval from the Security Team. I am not part of that team.

Not sure why T117801 was merged. It was about exposing the registered email address for an account, not user to user emails. (useful for cases where the same email address is used to register multiple accounts)

> The next step would be to obtain approval from the Security Team. I am not part of that team.

@Bawolff ?

Legal needs to approve as a privacy thing.

Yes, this will need a check from WMF Legal. I am happy to be part of this process, as I've facilitated it before for a few other projects.

> FTR (said in a call with the Stewards Tuesday but for the record) I'm going to be talking with Legal about this and will loop back once we're set there and/or have other questions.

@Jalexander Any status update to share on this?

chasemp triaged this task as Medium priority. Sep 4 2018, 3:34 PM

Since displaying the username all the time may be bad for privacy, maybe it is possible to keep the hash but provide a tool that compares it with a username given by the CU.

Indeed, a CU really needs the recipient's username only when the recipient asks for confirmation, in order to confirm abuse.

Technically, it would be a special page (or CU page) with two text areas: one for the sender and one for the recipient. When submitted, it would show the matching CU results and log the check in the private CU log.
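To make the suggestion concrete, here is a small Python sketch of the comparison step such a page might perform. Everything in it is hypothetical: the row structure, the function names, the in-memory "private log", and the assumption that the tool can look up the recipient's email address and user ID in order to recompute the hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class EmailLogRow:
    """One 'sent an email' entry as a CU might see it (hypothetical shape)."""
    sender: str
    recipient_hash: str


def recipient_hash(email: str, user_id: int, secret_key: str) -> str:
    """MD5 over email + user ID + secret, following the scheme in this thread."""
    return hashlib.md5(f"{email}{user_id}{secret_key}".encode("utf-8")).hexdigest()


def check_email_target(rows, sender, candidate_hash, private_log, checker):
    """Return rows where `sender` emailed the candidate, and log the check."""
    matches = [
        row for row in rows
        if row.sender == sender and row.recipient_hash == candidate_hash
    ]
    # Record who ran the comparison and how many rows matched.
    private_log.append({"checker": checker, "sender": sender, "matches": len(matches)})
    return matches


SECRET = "example-secret"  # hypothetical stand-in for $wgSecretKey
rows = [
    EmailLogRow("UserA", recipient_hash("x@example.org", 42, SECRET)),
    EmailLogRow("UserA", recipient_hash("y@example.org", 43, SECRET)),
    EmailLogRow("UserB", recipient_hash("x@example.org", 42, SECRET)),
]
private_log = []

# The CU names a sender and a suspected recipient; the tool recomputes the
# recipient's hash and reports only the rows that match both.
candidate = recipient_hash("x@example.org", 42, SECRET)
matches = check_email_target(rows, "UserA", candidate, private_log, checker="ExampleCU")
```

With this approach the username never appears in ordinary check results; it is only confirmed when a CU already has a specific recipient in mind.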

I'm having a hard time conceiving of a situation where a checkuser would need to know the name of the *recipient* of an email; the investigation focuses on the sender, not the recipient. The only way I can see this helping an investigation would be if we asked the recipient about the content of the email, which I believe would freak out the recipient. We would not have any way of knowing whether or not the recipient had even read the email.

There may be some sort of benefit for T&S to have access to some kind of tool when investigating a harassment complaint that involves an email sent via the interface, but they would probably already be in a position to request the info through other channels.

I would suggest, however, that it would be useful for checkusers to "sort" results from a check by type of activity (e.g., edit, trigger edit filter, log in, send email, etc), which would help in identifying patterns within the results.

> I would suggest, however, that it would be useful for checkusers to "sort" results from a check by type of activity (e.g., edit, trigger edit filter, log in, send email, etc), which would help in identifying patterns within the results.

You are essentially talking about T145265: Store check user data action text in structured format

T&S can already extract the user ID and email address of the recipient, as this is encrypted and then stored in the DB. I can see some use cases of a CU wanting to see the recipient of an email, especially if an LTA is constantly emailing someone. However, there seems to be disagreement as to whether this is needed, so I am moving this to the "discussion needed" section.

It's worth noting that since the hash is constant for a given recipient, if you suspect you know who the recipient is, you can verify that by sending that recipient an email and then running CU on yourself to see what the hash is. I don't know if that's a good thing or a bad thing :-)