RfC: Retained account data self-discovery
Closed, DeclinedPublic

Description

The CheckUser extension stores information about each change to the wiki for a fixed period of time (by default, three months). This information includes the following database fields for each action on the wiki:

cu_changes.cuc_user – account ID of the user performing an action; this would be used for self-lookups; it's indexed [(cuc_user,cuc_ip,cuc_timestamp)]
cu_changes.cuc_ip – IP address [IPv4 and IPv6]
cu_changes.cuc_xff – XFF data
cu_changes.cuc_agent – User-Agent data
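
A self-lookup over these fields could be sketched roughly as follows. This is only an illustration, assuming an in-memory SQLite table that mirrors the four columns listed above plus cuc_timestamp (taken from the index definition); the function name `self_lookup`, the index name, and all sample rows are hypothetical and not part of CheckUser.

```python
import sqlite3


def make_db():
    """Create a toy cu_changes table (assumed shape, not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cu_changes ("
        " cuc_user INTEGER, cuc_ip TEXT, cuc_xff TEXT,"
        " cuc_agent TEXT, cuc_timestamp TEXT)"
    )
    # Mirrors the (cuc_user, cuc_ip, cuc_timestamp) index named in the description.
    conn.execute(
        "CREATE INDEX cuc_user_ip_time"
        " ON cu_changes (cuc_user, cuc_ip, cuc_timestamp)"
    )
    return conn


def self_lookup(conn, user_id):
    """Return the stored rows for one account ID, newest first."""
    cur = conn.execute(
        "SELECT cuc_timestamp, cuc_ip, cuc_xff, cuc_agent"
        " FROM cu_changes WHERE cuc_user = ?"
        " ORDER BY cuc_timestamp DESC",
        (user_id,),
    )
    return cur.fetchall()


conn = make_db()
# Made-up sample data using documentation-reserved addresses.
conn.executemany(
    "INSERT INTO cu_changes VALUES (?, ?, ?, ?, ?)",
    [
        (1, "198.51.100.7", "", "ExampleAgent/1.0", "20140912014700"),
        (1, "2001:db8::1", "", "ExampleAgent/1.0", "20140913090000"),
        (2, "203.0.113.9", "", "OtherAgent/2.0", "20140913100000"),
    ],
)
rows = self_lookup(conn, 1)
```

The query filters on cuc_user only, so a user would see their own rows and nothing about other accounts.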

By default, MediaWiki core also stores private information in the recentchanges table:

recentchanges.rc_ip – IP address

In the interest of freedom of information and enhancing account security, it should be possible for users to see the private data stored about themselves at any time.

The implementation of this idea will be done through an extension (Extension:AccountInfo).

For more details see Retained account data self-discovery RfC.

Related: T29242: Allow users to see their own stored private information ("self-CheckUser")

Details

Reference
fl580

Event Timeline

flimport raised the priority of this task from to Medium. Sep 12 2014, 1:47 AM
flimport added a project: Architecture.
flimport set Reference to fl580.
daniel set Security to None.
daniel moved this task from P1: Define to Under discussion on the TechCom-RFC board.
daniel moved this task from Under discussion to Old on the TechCom-RFC board.
daniel subscribed.

This has a legal and political dimension that needs some more thought. I'll collect some concrete questions/next steps over the next days and weeks.

@Qgil, can you explain in more detail what you're thinking when you say "In the interest of... enhancing account security"? I'm not sure I'm seeing the security benefit from this. If anything, I see a strong weakening of security: a user whose password is guessed suddenly gives the attacker access to all their checkuser data.

If a user sees that "they" have been making edits that they didn't make, I'm not sure knowing the IP address where it came from would affect their response.

Hi,

I took a look at the RfC and while I personally appreciate the motivation for the proposal and am not fundamentally opposed to it, I think there are some important aspects that haven't been discussed yet. I have left more detailed comments on the RfC talk page about:

Again, I think none of this is insurmountable, but it seems to me that this is a pretty big step and should not be rushed.

PS: It's not clear to me how much the proposal was also motivated by a desire to provide better scrutiny on the work of checkusers, but I do think it might impact their work in some (not necessarily bad) ways. So they should be considered to be stakeholders here; I have notified the Checkuser-l mailing list.

  • What problem are we solving?
  • What information are we providing that cannot be sourced by the user currently on the internet?
  • What are the benefits of the change to the broader community?
  • Do the risks introduced outweigh the perceived benefits?
  • Is the additional set of preferences consistent with the desire WMF has expressed to simplify Special:Preferences?
  • What problem are we solving?

Exactly my question. This sounds to me like solving a nonexistent problem, which could only cause other problems imo.

I also wonder what problem we are solving.

In theory it should help very paranoid users who are afraid of their account being used elsewhere (although edits or actions by someone else can easily be detected publicly, and the user will not get anything else from this tool anyway). Thus I wonder if anyone really needs this option.

On the other hand, I see two serious issues:

  • problem for users whose accounts were hacked. It doesn't make sense to hack an account now, but if it contains IPs and UAs, hacking it will make much more sense (and as we don't use two-step verification, a keylogger will be enough to get this information). Or it will be enough to leave the session open and let a paranoid family member get the information.
  • problem for checkusers fighting long-term abuse. Obviously users who know how to hide their data will benefit from this option to check what data will be visible to checkusers, thus bringing the value of checks on them to zero.

Given such drawbacks, I am not sure this feature should be implemented as is, or at least it should be implemented with two-step verification and limited information (e.g. IP and last access date) to avoid abuse.

I join the users above in asking which problem this proposed change is meant to solve. @Billinghurst raised interesting points.

The way I see it, if we provide users with the CU data stored about them, this will be a direct torpedo to the boat's keel, that is, to the CheckUser tool and its function as a tool to prevent abuse (see @Tbayer above). Malicious users will be given the data we use to track them down, thus hindering our functions. We already had a couple of precedents of this, and the level of abuse was big, leading to a global WMF ban on one of these people.

I'm sure that we can find a way to reconcile the need of some users to know what data is stored about them with our function of preventing abuse on the projects. I plead not to rush on this one and to carefully consider all options before making a move.

What would people think about something much more specifically oriented towards copying what a couple other major sites have done for security reasons? I have significant concerns about just dumping every checkuser table row we have.

As an example of something I could see a good use for:


Your most recent actions come from:

IP Location
198.73.209.1 San Francisco, CA
5.63.151.148 London, England

If you notice anything unusual we recommend you immediately change your password [linked] and then log out [linked] and log back in to ensure someone other than you is unable to use your account.
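
A minimal sketch of how such a summary might be derived from stored rows, assuming each row is a (timestamp, IP) pair with sortable timestamps. The names `recent_ips`, `format_notice`, and `locate` are hypothetical; geolocation is left as a placeholder callable because no lookup source is specified in this discussion.

```python
def recent_ips(rows, limit=2):
    """Return up to `limit` distinct IPs from (timestamp, ip) rows, newest first."""
    seen = []
    for _, ip in sorted(rows, key=lambda r: r[0], reverse=True):
        if ip not in seen:
            seen.append(ip)
    return seen[:limit]


def format_notice(ips, locate=lambda ip: "unknown"):
    """Render the notice text; `locate` is a stand-in for a real geo lookup."""
    lines = ["Your most recent actions come from:", "", "IP Location"]
    lines += [f"{ip} {locate(ip)}" for ip in ips]
    return "\n".join(lines)


# Made-up timestamps; the IPs are the ones from the example above.
sample = [
    ("20140912014700", "198.73.209.1"),
    ("20140911120000", "5.63.151.148"),
    ("20140910080000", "198.73.209.1"),
]
notice = format_notice(recent_ips(sample))
```

Only distinct addresses are shown, which keeps the output far smaller than a dump of every stored row.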


This is still enough to be harmful to a user, and this should not come without two-step verification and/or stronger requirements on passwords. Like in Google, if you have anything to hide, you can activate two-step verification to protect your data.

IPs and their locations are a kind of data you may want to hide for a number of reasons (e.g. you told your employer or your partner that you were in Berlin but you went to London instead). At the moment, getting access to someone else's account (either by hacking it, by getting their password using a keylogger, or just because they did not log out) is almost valueless. If you can get access to their IPs over the last three months, you get quite valuable data, so much better protection than now is needed.

In addition, as passwords can be just one character long now, you can try to brute force them, and for at least some users you will get their IPs connected with their logins, which is definitely not acceptable.

This is far too much risk for far too little benefit. It places the user at risk, by making accessible sensitive data (as has been raised above, in some scenarios, extremely sensitive), and raises the possibility that a user could be forced to provide such data or that it could be stolen in the event of account compromise.

On the other side, it places the projects at risk, too. Right now, malicious users don't know exactly what's stored about them. It only takes one slipup, or use of the wrong fake address and useragent on one account rather than another, to expose them. And they don't know for certain if an older IP is still in the database or not. This will let them know all of that, making sockpuppeting easier for determined and skilled abusers. Those are the exact people we need not to have access to such data.

The benefit is...what, exactly? The likelihood that a user will detect an account compromise using this tool is extremely remote. The only reason to compromise an account would be that you intend to use it somehow, and the minute it takes an action that the legitimate user knows they didn't do, there's a sign of trouble far clearer than an odd IP in the logs. This would actually create a reason to compromise an account quietly and sit on it, to build up a picture of the user's activity over a period of time. Right now, compromising an account and sitting on it isn't worth anything at all. You have access to the account, but if you never take any action with it, that doesn't even matter, and you run the risk they'll change the password, so you'd use it immediately. This proposal is "solving a problem" that it actually creates.

A difference between Wikipedia and say, Google, is that the checkuser extension doesn't record all online activity. It doesn't do anything if someone logs in, browses the encyclopedia, fiddles around with their preferences (including viewing/changing their email address), etc.

As it is, it's easy to look at one's contributions to see if there are any unrecognized edits, no matter where they were made from, so I'm still left thinking that this is a solution to a nonexistent problem. I agree with other commenters that if we really want to enhance security, we would insist on strong passwords and implement two-factor authentication.

I posted this to https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Retained_account_data_self-discovery as well, but copying here.

The views expressed by Wikimedia CheckUsers in this task and on the RFC talk page are fairly representative of the established view, I think. However, my sense is that the tide is turning. To me, it seems like people are more and more:

(a) interested in what specifically sites are privately storing about them;

(b) interested in why their IP addresses seemingly must be exposed in the MediaWiki interface if they edit while logged out (cf. https://www.mediawiki.org/wiki/Requests_for_comment/Exposure_of_user_IP_addresses); and

(c) interested in account session management (seeing which sessions are currently active for their account and disabling sessions as necessary).

I think that the views are those of people who have dealt with vandals and those trying to game the wikis, and it is the established view because it has credibility grounded in an evidence base. It is the other side of the coin to the proposal, and needs to be taken into account with the proposal. Classical risk assessment says to look at the consequences of decisions prior to an implementation.

I understand that CUs can be jaundiced about long-term abusers (LTAs) and wish to give them no break at all. It is a small component of wiki-life, though it is an important one, and one that gives CUs some mechanisms for countering vandal attacks. Some of these LTAs are sophisticated in their attacks, and it is the CU tool that often breaks these cases.

(a) interested in what specifically sites are privately storing about them;

Tell them and it is no different from any other website where you login, and waaaaaay less than twitter, facebook, etc, where people give their life story.

(b) interested in why their IP addresses seemingly must be exposed in the MediaWiki interface if they edit while logged out (cf. https://www.mediawiki.org/wiki/Requests_for_comment/Exposure_of_user_IP_addresses); and

Displaying in Special:preferences is not going to help them if they are logged out. Not sure of the comment and its relevance to this ticket.

(c) interested in account session management (seeing which sessions are currently active for their account and disabling sessions as necessary).

So you mean telling them when they logged in, and the articles that they edited, if any? Then maybe it should be researched and demonstrated that this is the case, rather than relying on supposition. I still think that if you asked users what they wanted developers to work on, they would have higher priorities than this.

I still don't see how the addition of this extra information fits in with your previous commentary about removing aspects of complexity in Special:Preferences.

Point a) seems to be the only relevant issue. The others, if desired, would be better solved other ways that don't have the side effects that have been discussed here.

To a), I think we do a good job telling people what we collect and outlining how we use it. What benefit do they get by being able to see the actual values? Even if this would let some users scratch their curiosity itch, I would argue the potential for abuse of it outweighs the benefit to a large enough extent that it would be borderline irresponsible for us to provide it, unless its access was contingent on compliance with a strong password policy and 2fa.

I think there is some benefit to users in point c). If we want to focus on that (and show country/city of active logins, with the ability to delete them and log those sessions out), I would support that. Although 2fa is higher priority, since I think it will give more security value to our users.

Unassigning myself, since I don't think there is anything I can do here right now.
From the discussion, it seems like the benefit is dubious, and there is quite a bit of risk.
I'll put this back up for triage.

tstarling claimed this task.
tstarling subscribed.

Declining after architecture committee discussion, due to rationale given by csteipp.