Request to review privacy policy and rules
Closed, Declined (Public)

Description

Hello,

I ported WikiSpy to my WikiLabs instance and it is ready to launch as beta/alpha - my plan is to show it to the world and hopefully attract people who would help me to improve the front-end.

Anyway, before I start advertising the website on reddit etc., I would like to ask for a review of my privacy policy and rules for compliance with the WikiLabs rules. I would prefer to keep these documents minimal, but at the same time keep everything legally okay. Could somebody take a look and say whether I need to add or change anything?

Thanks,
d33tah

Event Timeline

d33tah assigned this task to yuvipanda.
d33tah raised the priority of this task from to Medium.
d33tah updated the task description.
d33tah added a project: Cloud-Services.
d33tah added subscribers: jeremyb, Ricordisamoa, Sitic and 7 others.

A quick ping - I posted this five days ago and got no response. If no one speaks up by tomorrow, I will make a mailing list post.

Andrew set Security to None.

@d33tah, can you please attach said policy, or provide a link?

What does "additional terms" mean?

What statistics do you have in mind? Can you run your stats, aggregation, etc. regularly and then toss the raw input when those have been generated? or do you intend to keep the IPs forever?

Unless somebody can suggest how I can implement the current functionality without keeping the IPs forever, I think that I would have to. As for additional terms - this is copied from the en.wikipedia.org footer.

As for the statistics, right now I am recording the IPs in the database in order to mark who watched which article in order to count unique views. Removing the IP would lead to the risk that I would count the same user twice.
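The counting scheme described here can be sketched roughly as follows. This is only an illustration, assuming an SQLite database; the table name `views`, its columns, and the helper names are hypothetical and not taken from WikiSpy itself.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the (hypothetical) views table: one row per (article, ip) pair."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS views ("
        " article TEXT NOT NULL,"
        " ip TEXT NOT NULL,"
        " first_seen TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " PRIMARY KEY (article, ip))"
    )
    return db


def record_view(db, article, ip):
    """Store the visit; a repeat visit from the same IP is ignored."""
    db.execute(
        "INSERT OR IGNORE INTO views (article, ip) VALUES (?, ?)",
        (article, ip),
    )
    db.commit()


def unique_views(db, article):
    """Return the number of distinct IPs that viewed the article."""
    row = db.execute(
        "SELECT COUNT(*) FROM views WHERE article = ?", (article,)
    ).fetchone()
    return row[0]
```

The composite primary key is what prevents the same IP from being counted twice for one article, which is also why dropping the IP column would break deduplication in this design.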

You'll need to be considerably more precise in your privacy policy. You need to enumerate precisely what data you collect, how long you keep it, and how you will be using it.

In addition, as stated in the Labs Terms of Use, you need to get positive acknowledgement from the user about this before any data is collected.

Thanks coren! I will update the privacy policy to explain that (can I keep the IP list indefinitely?) and modify my program to ask whether it is okay to track unique views. Once I do that, I will just let you know in this ticket again, okay?

As for the statistics, right now I am recording the IPs in the database in order to mark who watched which article in order to count unique views. Removing the IP would lead to the risk that I would count the same user twice.

Unless usage statistics are the main part of your app, I'd say: just drop them.

In a way they are. I wanted to avoid authentication etc., but at the same time I plan to show a lot of potentially useless output to the users, so I need some way to tell interesting content apart. I decided to go for counting unique IPs because I expect this to be difficult to abuse - what do you think about it?

I still don't understand exactly why/how you're using IPs or making logs unique. So I also don't see why people would abuse it, or even what abuse would mean, or how that would benefit them or harm you.

How about using short lived (12-48 hrs?) cookies with random IDs or UUIDs?
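A minimal sketch of the cookie suggestion, using only the Python standard library. The cookie name `visitor_id`, the 24-hour lifetime, and the helper name are assumptions chosen for illustration; nothing here reflects how WikiSpy actually handles requests.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"      # hypothetical cookie name
MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours, inside the suggested 12-48 hour range


def get_or_create_visitor_id(cookie_header):
    """Return (visitor_id, set_cookie_value_or_None) for a raw Cookie header."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if COOKIE_NAME in cookies:
        # Returning visitor: reuse the ID, no new cookie needed.
        return cookies[COOKIE_NAME].value, None

    # New visitor: issue a random ID that carries no personal information.
    visitor_id = uuid.uuid4().hex
    fresh = SimpleCookie()
    fresh[COOKIE_NAME] = visitor_id
    fresh[COOKIE_NAME]["max-age"] = MAX_AGE_SECONDS
    fresh[COOKIE_NAME]["httponly"] = True
    fresh[COOKIE_NAME]["path"] = "/"
    return visitor_id, fresh[COOKIE_NAME].OutputString()
```

The second element of the returned tuple is meant to be sent as the value of a `Set-Cookie` response header when it is not `None`.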

Here's my thinking: I want a system that lets me separate interesting content from unimportant content. I assume that some of the things users will find will be inconvenient to third parties that would prefer users not to see them (think of the NYPD editing articles about their brutality). For this reason, I decided to avoid an upvote/downvote system. I also didn't want captchas. Instead, I created a system where I keep track of unique IPs visiting the pages. This way, if someone wants to "hide" a particular edit, they need to add random noise to a lot of other articles. For that, they need to control a number of IPs proportional to the popularity of the entry they want to hide.

Short-lived cookies wouldn't help there, because you could always remove them and get new ones, which would let you add the noise I described above.

ok, so your content is partly crowdsourced.

some providers make it very easy to cycle through IPs so that's not a
silver bullet for abuse.

I've been thinking over the last few comments/tickets (sorry I didn't
mention this earlier) that maybe labs isn't a good fit for you and at least:

You need to be aware that there's no SLA, much less support (i.e. less
people to fix things in an emergency) than the WMF production cluster, no
backups of your data. (you'll need to do your own backups on your own infra)

Also, how will your collected content be licensed? How will it be
preserved/continued if you move on to another project? Will you regularly
publish dumps of your collected data?

(and how will continuity to pass on to future maintainers fit in with
privacy policy?)

I know it's not a silver bullet, but it would definitely make any attacks more difficult.

I make it easy to replicate my project in order to create separate websites. I do not plan to share the collected IP addresses and I have no problems with the lack of backups - most of my data can be reproduced based on Wikipedia dumps. If by SLA you mean Service Level Agreement, I don't mind the fact that there are no guarantees.

As for the dumps, I never thought of this - the only content being created there is the popularity statistics.

About the continuity - would it be enough to mention that the data is collected for tracking unique views? Or is there any other statement that I need to make that would cause problems that I need to address in the privacy policy when transferring the maintenance to somebody else?

what about your historical RDNS DB?

anyway, re SLA, part of that means that we generally don't host sites that
will end up getting slashdotted.

and I get the impression that you are expecting this site to eventually get
regular surges of traffic. maybe not every day, but at least once per
month. is that right?

Yes, the last version of the website got 10k unique views within two days - I assumed that this is okay. The rDNS DB is taken from Solar7 research, at most I could share the rDNS queries for edits I discovered using the IRC bot.

Does that mean I should be looking for new hosting then?

Yes, the last version of the website got 10k unique views within two days - I assumed that this is okay.

so then what if your site was broken for a day? 3 days?

what if your instance's disk is lost and you have to start a new instance
from scratch?

are you ok with all of that?

The rDNS DB is taken from Solar7 research, at most I could share the rDNS queries for edits I discovered using the IRC bot.

could you expand on this?

are you saying you couldn't release some of it because you wouldn't have
the rights to do so?

Yes, the last version of the website got 10k unique views within two days - I assumed that this is okay.

so then what if your site was broken for a day? 3 days?

what if your instance's disk is lost and you have to start a new instance
from scratch?

are you ok with all of that?

I am. I can recover the website within a few hours, and I think of it mostly as a website that gives users access to some research/entertainment - if it breaks or disappears, at worst people would have to do the research a different way. It's not going to be hooked up to anybody's life support with the assumption that it has to respond to every request within five seconds. If I happen to suck at maintenance and the project gains popularity, I have nothing against somebody hosting it on their own somewhere else.

The rDNS DB is taken from Solar7 research, at most I could share the rDNS queries for edits I discovered using the IRC bot.

could you expand on this?

are you saying you couldn't release some of it because you wouldn't have
the rights to do so?

I have two rDNS sources. One is the Solar7 offline database, which is used for pre-seeding the database from a Wikipedia dump. Once pre-seeding is done, I run an IRC bot that sits on the Wikimedia IRC notification channels and looks for anonymous edits. The rDNS for those edits is looked up in real time, and this is something that I could actually share.
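The real-time lookup step could look something like the sketch below. It assumes the bot has already extracted the editor's IP address from the IRC feed (that parsing is not shown), and the function name `reverse_dns` and its `resolver` parameter are hypothetical; only `socket.gethostbyaddr` is a real standard-library call.

```python
import socket


def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if the lookup fails.

    The resolver argument exists only so the lookup can be swapped out
    (for example in tests); by default the system resolver is used.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None
    return hostname
```

Returning `None` instead of raising keeps a missing PTR record from interrupting the bot, since many addresses have no reverse DNS entry at all.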

Hi @d33tah, thank you for your contributions on WikiLabs. The WMF legal team has reviewed your issue and has some thoughts in regards to your questions.

As you may know, user privacy is very important to the Wikimedia community and so the Terms of Use on WikiLabs (https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use) ask that developers collect and retain only the minimum of user information necessary for their projects. As noted above by @jeremyb-phone, it is not readily clear that retaining IP addresses indefinitely for unique visitor counting will be much more effective than less intrusive methods. As we noted in the Terms of Use, private End-user data such as IP addresses should not be collected unless there is a pressing need for the functionality. Nevertheless, if WikiSpy were to proceed with its plan to collect IP addresses of visitors, WikiSpy should follow the steps below, based on the conditions set forth in the Terms of Use:

  1. WikiSpy should have a well-developed privacy policy. Note that each developer is responsible for drafting, posting, and following their own privacy policy, but at the very least this privacy policy should conform to the WikiLabs Terms of Use and should fully disclose, at a minimum, how the private End-user data (in this case IP addresses) is being collected, how long such data is kept, and how this data will be used.
  2. In particular, the WikiLabs Terms of Use require developers to "purge, anonymize, or aggregate" private End-user information like IP addresses 30 days after they are stored. If WikiSpy wants to retain IP addresses indefinitely, WikiSpy should explore how it might best follow this requirement.
  3. As specified in the Terms of Use, WikiSpy will need to add the following disclaimer to its users:

By using this project, you agree that any private information you give to this project may be made publicly available and not be treated as confidential.
By using this project, you agree that the volunteer administrators of this project will have access to any data you submit. This can include your IP address, your username/password combination for accounts created in Labs services, and any other information that you send. The volunteer administrators of this project are bound by the Wikimedia Labs Terms of Use, and are not allowed to share this information or use it in any non-approved way.
Since access to this information is fundamental to the operation of Wikimedia Labs, these terms regarding use of your data expressly override the Wikimedia Foundation's Privacy Policy as it relates to the use and access of your personal information.

  4. WikiSpy should clearly note that it is a project independent of the Wikimedia Foundation. We recommend you add the following disclaimer: "This project is independent of the Wikimedia Foundation."

Let us know if you have any further questions.

Best regards,

Zhou

Yes, the last version of the website got 10k unique views within two days - I assumed that this is okay.

so then what if your site was broken for a day? 3 days?

what if your instance's disk is lost and you have to start a new instance
from scratch?

are you ok with all of that?

Since you haven't replied to my previous answer to the post, I would like to ask you - if I am okay with all of the above, can I host this website on labs and hope that I won't get banned because of Slashdot-like traffic surges?

Thank you Zhou! I will work on making my website comply with those requirements.

Zhou,

Could you review whether the proposed privacy policy is good enough?

http://wikispy.wmflabs.org/privacy

Cheers,
d33tah

I tried contacting @ZhouZ privately but he didn't reply. Is there anybody else I should ask?

Hi @d33tah,

Apologies for the late reply. Your comment on the Phabricator ticket came while I was out of the office without access to email, and, very unfortunately, your email below landed in my spam filter. I just saw it yesterday.

As a note before I continue: while the Wikimedia Foundation very much values the contributions of volunteers like yourself to its projects, I am an attorney for the Wikimedia Foundation and not for you. As such, I can only offer legal advice on behalf of the Foundation and not to you. Furthermore, it is the general policy of the Foundation not to draft privacy policies for its Labs project users. If you feel you need legal representation, please consult and hire legal counsel of your choosing.

Having said that, thank you for responding to our feedback regarding your proposed project. While you have followed some requirements of the Labs' Terms of Use above, the project still does not (at least according to your privacy policy) follow the requirement in the Labs' Terms of Use that asks developers to "purge, anonymize, or aggregate" private End-user information like IP addresses 30 days after they are stored.

Given that this project appears to still be in alpha, we understand you might still be experimenting with your implementation. Nonetheless, we ask you to provide us with a firm timeline indicating that you will modify the project within the next 30 days to fix this issue. If you fail to do so or cannot do so, we will unfortunately have to suspend your project for failure to comply with the Labs' Terms of Use.

Let me know if you have further questions.

Thanks again for your understanding,

Zhou

@ZhouZ: I added a cron job that should handle the purging on the first day of every month, is that okay? Is the rest of the privacy policy good enough?
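What such a purge job might do can be sketched as follows. This is not the actual cron job from the thread, which is not shown; it assumes a hypothetical SQLite table `views(article, ip, first_seen)` with UTC timestamps in `YYYY-MM-DD HH:MM:SS` form, and it folds expired rows into a hypothetical `view_totals` table before deleting them, so that aggregate counts survive while the IPs do not.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # limit quoted from the Labs Terms of Use in this thread


def purge_old_ips(db, now=None):
    """Fold expired rows into per-article totals, then delete them.

    Returns the number of raw (article, ip) rows that were removed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=RETENTION_DAYS)).strftime("%Y-%m-%d %H:%M:%S")
    with db:  # commit on success, roll back if anything fails
        db.execute(
            "CREATE TABLE IF NOT EXISTS view_totals ("
            " article TEXT PRIMARY KEY, total INTEGER NOT NULL)"
        )
        rows = db.execute(
            "SELECT article, COUNT(*) FROM views"
            " WHERE first_seen < ? GROUP BY article",
            (cutoff,),
        ).fetchall()
        for article, count in rows:
            db.execute(
                "INSERT OR IGNORE INTO view_totals (article, total) VALUES (?, 0)",
                (article,),
            )
            db.execute(
                "UPDATE view_totals SET total = total + ? WHERE article = ?",
                (count, article),
            )
        deleted = db.execute(
            "DELETE FROM views WHERE first_seen < ?", (cutoff,)
        ).rowcount
    return deleted
```

Two caveats with this sketch: once an IP has been purged, a returning visitor would be counted again, and if only rows past the cutoff are deleted, a once-a-month schedule could leave some rows in place for well over 30 days, so a daily run may be the safer choice.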

Hi @d33tah,

Yes, if you are purging the private end-user information such as IP addresses every month that should comply with the requirement above. Please also update your privacy policy accordingly.

As for the rest of your privacy policy, I notice you have followed the guidelines I had listed previously so thank you for that. As I have noted above, I cannot provide legal advice for you and your project (including the drafting of privacy policies) as I am not your lawyer but an attorney for the Wikimedia Foundation.

Even though I am not your lawyer, based on what you have told us about your project, we have no further objections to your privacy policy, as it appears to address our previous concerns about your project. Thank you for reaching out to us. Your continued diligence is appreciated, as it is ultimately your responsibility to check that your privacy policy and your project practices are in compliance with the Labs' Terms of Use.

Your project looks really cool and I wish you the best of luck for your future work on it.

Best,

Zhou

@ZhouZ: Thank you! I think I will launch it soon then, I would just like to update my database first.

@d33tah there is a banner that says "Note: this website is in alpha stage so far and might stop working anytime." still on this site. Noting this is from almost a year ago, is this still an active project?

On 29.06.2016 at 22:43, chasemp wrote:

@d33tah there is a banner that says "Note: this website is in alpha stage so far and might stop working anytime." still on this site. Noting this is from almost a year ago, is this still an active project?

It's not really an active project. I was waiting for help with my domain
and so on and got tired of pinging admins, so I gave up.

fnegri subscribed.

The WikiSpy project has been deleted a long time ago (T97846).