Potential ambiguities in the Labs Terms of Use
Open, High, Public

Description

The Labs Terms of Use suggests that usernames are private information:

You should not collect or store private data or personally identifiable information, such as user names, passwords, or IP addresses (“Private Information”) from the individuals using your Labs Project (“End Users”)

This means that I can only store usernames for a maximum of 30 days:

Purge, anonymize, or aggregate any Private Information you store no more than 30 days after storing it;

It also says that if I collect "Private Information" then I have to show a big disclaimer:

If my tools collect Private Information...
If you collect any Private Information from End Users, you must display this disclaimer to the End Users before you collect the Private Information:
By using this project, you agree that any private information you give to this project may be made publicly available and not be treated as confidential.
By using this project, you agree that the volunteer administrators of this project will have access to any data you submit. This can include your IP address, your username/password combination for accounts created in Labs services, and any other information that you send. The volunteer administrators of this project are bound by the Wikimedia Labs Terms of Use, and are not allowed to share this information or use it in any non-approved way.
Since access to this information is fundamental to the operation of Wikimedia Labs, these terms regarding use of your data expressly override the Wikimedia Foundation's Privacy Policy as it relates to the use and access of your personal information.

Different users interpret this differently. For example, on IRC chasemp said:

<tom29739> If I collect a user's username using OAuth in a tool, then do I need to show this disclaimer: https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use#If_my_tools_collect_Private_Information... before sending the user off to authenticate?
<tom29739> The Labs Terms of Use suggests that usernames are private info.
<chasemp> tom29739: is this in Tools or other?
<tom29739> In tools.
<chasemp> tom29739: it's a good question to which we can dig up a legally satisfying answer but my experience is 'no'
<tom29739> chasemp, I thought that, but here: https://meta.wikimedia.org/wiki/Steward_requests/Miscellaneous#Request_to_approve_OAuth_consumer_for_Citation_Hunt_v1.0 the question was asked
<chasemp> tom29739: the path forward in tools is essentially WMF is the proprietor and sets the privacy policy, other projects it's on the admins to do so, but afa the username for an oauth user being exposed on its own I don't think so. Let's make a task tho and ping legal if there is ambiguity?

However, at https://meta.wikimedia.org/wiki/Steward_requests/Miscellaneous#Request_to_approve_OAuth_consumer_for_Citation_Hunt_v1.0 MarcoAurelio suggests that collecting data from users is not compliant with the Labs ToU.

It would be good to get clarification on this, because if collecting usernames is against the ToU then many tools will have to be updated, e.g. Quarry (https://quarry.wmflabs.org/).

Event Timeline

tom29739 created this task. Jul 15 2016, 6:01 PM
Restricted Application added subscribers: Zppix, JEumerus, Aklapper. Jul 15 2016, 6:01 PM
ZhouZ moved this task from Backlog to Assigned on the WMF-Legal board. Jul 18 2016, 10:39 PM
ZhouZ added a subscriber: ZhouZ.

If Quarry is only storing "Published queries", "Starred Queries" and "Draft Queries" for each user, then I believe it is compliant. Those items are clearly associated with my username in my profile. I suppose a help page could say "stuff stored in your profile is stored in this tool db linked to your username". However, in the database schema I see user groups are also possible, and that isn't exposed in the UI profile for me (maybe it is only visible when I am in a group?).

For CitationHunt, I can't see any username-related information being stored. However, I do see a stats module which stores information that is very obviously PII, which tools should not be collecting without a very, very good reason.

https://github.com/eggpi/citationhunt/blob/master/chdb.py#L96
https://github.com/eggpi/citationhunt/blob/0c625fb39f64c799bc69ef7558f50e4b5501d4bd/handlers/stats.py#L32

tom29739 added a comment. Edited Jul 19 2016, 12:23 AM

Quarry would be non-compliant because the ToU classes usernames as private information. The ToU states that you *must* show this disclaimer before collecting the private information (in this case the username):

By using this project, you agree that any private information you give to this project may be made publicly available and not be treated as confidential.
By using this project, you agree that the volunteer administrators of this project will have access to any data you submit. This can include your IP address, your username/password combination for accounts created in Labs services, and any other information that you send. The volunteer administrators of this project are bound by the Wikimedia Labs Terms of Use, and are not allowed to share this information or use it in any non-approved way.
Since access to this information is fundamental to the operation of Wikimedia Labs, these terms regarding use of your data expressly override the Wikimedia Foundation's Privacy Policy as it relates to the use and access of your personal information.

Ah, true. In that case, it shouldn't create the profile until after the user has agreed to the username being stored in Quarry.
I think an exception should be made for using the WMF username to create a local account in the tool. This is essentially storing "This username has used this tool".

However, Quarry actually does much more than that. It associates all queries a user performs with their username, and shows it to the public at https://quarry.wmflabs.org/query/runs/all . That is definitely not expected behaviour. The UI for the query creation page says the query is a draft, and people do not typically mean "published for the world to see" when they say "draft" (of course in wikimedia world, we do, but that should always be explicitly stated).

The UI warning is currently:

By running queries you agree to the Labs ToS and you irrevocably agree to release your SQL under CC0 License.

IMO it should be

By submitting a query to be executed, you agree to the Labs ToS and you irrevocably agree to release your SQL under CC0 License.
After a query is submitted, it will be published on https://quarry.wmflabs.org/query/runs/all , which is publicly accessible.

That clarifies that the 'submit' is the trigger rather than 'run', as 'run' is ambiguous (at best) wrt invalid SQL. (In technical terms, invalid SQL is parsed/prepared, and is not executed).
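
To illustrate the 'submit' versus 'run' distinction, here is a minimal sketch. It is not Quarry's code: the `submit_query` function, the in-memory `log` list and the use of SQLite are all assumptions made for the example (Quarry runs against different databases). It only shows that a submission can be recorded, and so published, even though invalid SQL is never executed.

```python
import sqlite3


def submit_query(conn, log, username, sql):
    """Record a submission, then try to run it (hypothetical helper)."""
    # The submission is recorded before any attempt to execute it,
    # so it would be publicly visible whether or not it runs.
    entry = {"user": username, "sql": sql, "status": "submitted"}
    log.append(entry)
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        # For invalid SQL the error is raised while the statement is
        # being prepared, so nothing was executed.
        entry["status"] = "failed"
        entry["error"] = str(exc)
        return None
    entry["status"] = "complete"
    return rows


conn = sqlite3.connect(":memory:")
log = []
submit_query(conn, log, "ExampleUser", "SELECT 1")
submit_query(conn, log, "ExampleUser", "SELEC 1")  # typo: submitted, never run
```

Both submissions end up in `log`, which is why wording the warning around 'submit' covers the invalid query as well.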

A tool that edits on a user's behalf via OAuth automatically discloses the username to the MediaWiki revision table, thus publicly and permanently storing "This username has used this tool", and there's no way at all for the tool to change that fact.

(And does that mean every OAuth-editing tool should add that banner?)

IMHO, username shouldn't be private information. Rather, the association of a username and other private information (IP address, User agent) should be private.

I agree. In its current state, anything that uses the user's username has to show a big disclaimer, get rid of it after 30 days and get the user's permission to store and use the username.
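
To make that concrete: a literal reading of the 30-day rule would have every tool run something like the sketch below. This is only an illustration under stated assumptions. The `tool_users` table, its columns and the function name are hypothetical and are not taken from Quarry or any other tool, and SQLite is used just to keep the example self-contained.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Retention period taken from the ToU text quoted in the description.
RETENTION = timedelta(days=30)


def anonymize_expired_usernames(conn, now):
    """Blank out usernames stored 30 or more days before `now`."""
    cutoff = (now - RETENTION).isoformat()
    cur = conn.execute(
        "UPDATE tool_users SET username = NULL "
        "WHERE username IS NOT NULL AND stored_at <= ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount  # number of rows anonymized


# Hypothetical schema: one row per user, with the time the name was stored
# as an ISO 8601 string in UTC, so string comparison matches time order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_users "
    "(id INTEGER PRIMARY KEY, username TEXT, stored_at TEXT)"
)
now = datetime(2016, 7, 19, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO tool_users (username, stored_at) VALUES (?, ?)",
    [
        ("OldUser", (now - timedelta(days=45)).isoformat()),
        ("NewUser", (now - timedelta(days=2)).isoformat()),
    ],
)
changed = anonymize_expired_usernames(conn, now)
```

Anything keyed on the username (saved queries, a profile) would stop being attributable after that, which is why treating usernames like IP addresses is awkward for tools that use them as the account identifier.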

chasemp triaged this task as High priority. Jul 25 2016, 2:10 PM

Thanks for creating this task @tom29739.

I will try to clarify this in the upcoming draft of the revised Labs Terms of Use. I think in this instance, username might not be categorized as the same type of private information as IP addresses, passwords, etc...

Additionally, can someone clarify this statement:

A tool that edits on a user's behalf via OAuth automatically discloses the username to the MediaWiki revision table, thus publicly and permanently storing "This username has used this tool", and there's no way at all for the tool to change that fact.

Is the tool used by a particular user, via OAuth, publicly available information? If so, is this desired behavior (e.g. are there circumstances where a user might not want the fact that they are using a tool to be disclosed)?

I don't think that's the case. The user using OAuth via a tool will have their username disclosed to the tool though (subject to the usual private info restrictions at present).

Ok, in that case, going back to the second question: while a username might not be categorized as private information like IP addresses and passwords per se, should we still treat usernames with sensitivity in some cases, since users might want to keep their association with particular tools confidential? That is, even if we don't categorize usernames as information that must be purged or anonymized after 30 days, is this information still sensitive enough to warrant warning the user about the possibility of disclosure, as covered in the current "If my tools collect Private Information..." disclaimer notice?

I think that's appropriate, but maybe a bit overkill for just a username (it is rather a large notice).

Perhaps it would make sense to have a separate notice, given that the username isn't as confidential as, say, an IP address or similar.

Additionally, can someone clarify this statement:

A tool that edits on a user's behalf via OAuth automatically discloses the username to the MediaWiki revision table, thus publicly and permanently storing "This username has used this tool", and there's no way at all for the tool to change that fact.

Is the tool used by a particular user, via OAuth, publicly available information? If so, is this desired behavior (e.g. are there circumstances where a user might not want the fact that they are using a tool to be disclosed)?

That is only if the tool edits for the user.

Quarry is a good example of an OAuth tool that does not edit; instead it uses the username to a) prevent anonymous abuse, and b) log the user's activity in the tool publicly.

It is likely that someone will, without knowing, disclose their own PII using Quarry, possibly even in only a draft query, which they can't delete. The draft and run log data will be mined to find out information about the author of the draft/query, and someone will be outed on a Wikipedia Review-ish type website as a result. Please at least ensure the users are warned adequately.

The OAuth authorisation screen says the tool will have access to the username, and the tool can edit on the user's behalf. The user has the option of refusing that.
What is missing is how that username will be used *if* it is used other than for editing.
The other typical usage of the username is building a user profile, in which case the user must (best practise) be able to access and delete all of that profile information, to be in compliance with EU regulations.

Also, if many tools publish the username (like Quarry does), how do Oversighters go about hiding that username (e.g. when the user is a minor, or the username is potentially libelous)? On wiki, we can rename the user to a random string to mostly remove the username. Do Oversighters then need to go to each tool to request they rename the user in their database? Huge sigh.

Thanks for this feedback.

So what I am hearing is the following: usernames are somewhat less sensitive and private than standard personally identifiable information. However, we still probably want policies to address how Labs developers should provide notice to users about their collection and use of username data.

Also, if many tools publish the username (like Quarry does), how do Oversighters go about hiding that username (e.g. when the user is a minor, or the username is potentially libelous)? On wiki, we can rename the user to a random string to mostly remove the username. Do Oversighters then need to go to each tool to request they rename the user in their database? Huge sigh.

My sense is that we should not have Oversighters deal with this in the first instance (assuming this is even practical). We should instead put the onus on developers to respect our community policies. WMF provides Labs as a platform for open-source/free-knowledge developers to build cool and interesting tools within the bounds of the Labs TOU. I do not think it is a scalable or community-appropriate solution for WMF or administrators to go around Labs projects to actively fix potentially problematic content (which is not limited to bad usernames).

Framawiki added a subscriber: Framawiki.
Cirdan added a subscriber: Cirdan. Sep 7 2018, 5:42 PM
bd808 added a subscriber: bd808. Nov 7 2018, 6:03 PM