
Publish an analysis of the suppression of selected user_properties in 11/2016
Closed, Declined · Public

Description

According to an email to Labs-l https://lists.wikimedia.org/pipermail/labs-l/2016-November/004789.html, the following has been removed from database replicas:

user_properties: language, skin, timecorrection, variant

This is a request for the risk analysis to be published on-wiki, explaining why this suppression was necessary, given that this data has never been shown to represent a risk before. There is a protected task, T150679, which has been quoted as relevant. Presumably past versions of this same data can be mined from the public data dumps.

Event Timeline

Fae created this task. Nov 29 2016, 4:22 PM
Restricted Application added a subscriber: Aklapper. Nov 29 2016, 4:22 PM
Aklapper renamed this task from "Publish an analysis of the suppression of selected user_properties" to "Publish an analysis of the suppression of selected user_properties in 11/2016". Sep 10 2018, 9:45 AM
Aklapper added a project: Cloud-Services.

The bug is public now.

But to summarize there are 2 points:

  • Labs should not reveal info that MW keeps secret. The wiki does not expose these properties, and does not say they are public (unlike the gender preference). Many privacy and security problems are not about the info itself, but are caused by people assuming something is secret when it really isn't, and behaving as if it were. Consistency helps users make their own risk analysis, as every user has different risks.
  • language, variant, and (especially) timecorrection can be used to help determine where a user is located. While someone's language obviously isn't a GPS coordinate, it would still be helpful to someone trying to stalk someone. Skin isn't all that sensitive and falls more under the consistency point above. However, we don't know what users consider sensitive. Maybe someone is applying for a job on the WMF design team and doesn't want to admit s/he is a die-hard Monobook fan, etc.

Does that answer your question?

Fae added a comment. Sep 12 2018, 6:47 AM

No my two years old question has not been answered. There has been no analysis published.

This was not an urgent security driven change. Based on the reply here, it looks more like someone woke up one day and thought it would be a good idea to suppress more stuff, because, "security".

If these user_properties actually were being used for malicious stalking, and there had been real cases where this was the root cause, then fine, let's say that and at least say how many incidents there have been. My guess, it's zero.

If we want to "know what users consider sensitive", then have a survey or run a public consultation, let's not just make it up and guess what users want because consultation feels like it's harder to do.

Aklapper closed this task as Declined. Edited Sep 12 2018, 9:15 AM

> No my two years old question has not been answered. There has been no analysis published.

I see. In that case I'm boldly declining this task. (The information which is known and available is there; if that's not sufficient for some folks, so be it, unfortunately.)

> If these user_properties actually were being used for malicious stalking, and there had been real cases where this was the root cause, then fine, let's say that and at least say how many incidents there have been. My guess, it's zero.

We don't wait until an attack happens to fix things.

To clarify some things:

> No my two years old question has not been answered. There has been no analysis published.

The delay was because this task was filed with the wrong tag. Security-General was a holdover from Bugzilla that nobody was watching (at the time). The fact that this tag existed with nobody watching it is why we worked on archiving it last week, which brought this bug to light. (I know you're unhappy about other similar requests being ignored. Those have more to do with a large amount of bureaucracy related to incidents. This is not an incident, so that doesn't really apply.)

> This was not an urgent security driven change.

Indeed, it is not urgent like certain other changes are. However, we still fix issues like this before publishing them, because the possibility exists that sensitive info is present and we don't want to point everyone in the world at it.

> Presumably past versions of this same data can be mined from the public data dumps.

The user_options table is not dumped at download.wikimedia.org.

> If we want to "know what users consider sensitive", then have a survey or run a public consultation, let's not just make it up and guess what users want because consultation feels like it's harder to do.

I don't think it's a stretch to assume that data easily linked to someone's geographic location is considered sensitive by a subset of the population. However, even if that is not true, the other reason for redacting it still stands: MediaWiki does not communicate to the user that this information will be public (unlike, say, gender preferences), and MediaWiki does not release the information itself. Having different systems with different definitions of what is secret information is a recipe for making mistakes, so we are working on making MediaWiki the controlling definition of what is public and what is private information. If you get someone to change MediaWiki to make this information public, then we will make the same change in the Tool Labs Wiki Replicas, but we are not going to have this info in the Wiki Replicas if you cannot get it from MediaWiki itself.
