Fri, Jun 4
Part of this task is to make data releases of this type part of the cycle of data releases at WMF so I do not think we should pursue the option of treating this project like a one off data release, rather we should think of it running it as any other data flow as a core requirement.
Thanks @Isaac and @Nuria for the in-depth discussion of the relative pros and cons of these two approaches, and for the deep dive on user-side filtering. I wanted to chime in with some more context that I recently learned about putting DP into production, regardless of how we filter/limit pageviews. These considerations may be relevant as we move toward creating this as a service.
Thu, Jun 3
It's working perfectly! Thanks so much for the responsiveness.
Hi all — reopening this task so that I can get access to https://superset.wikimedia.org as a Hive GUI.
Wed, Jun 2
@JBennett tagging you to flag that you need to sign off on this
May 21 2021
May 3 2021
Hi all — just finished updating the demo to get it into a good place. You can see the finished product (UI, user- and pageview-level privacy, etc.) at https://diff-privacy-beam.wmcloud.org. Please let me know what you think, and if there are any next steps that any of you can see toward getting this into a production prototype. Thanks for all the help so far :)
Apr 30 2021
@TedTed, thanks for explaining thresholding and why δ is necessary, even with Laplace noise. Really useful to know what's happening under the hood of Privacy on Beam.
Apr 21 2021
- You mention processing 500,000 rows in the README. Am I correct in assuming this is the process: 1) gather top-50 viewed articles from API for that language, 2) de-aggregate the data and load into database so that e.g., an article with 50,000 pageviews becomes 50,000 separate rows, 3) extract the data and run through the diff-privacy framework (any filtering + addition of noise), 4) return privacy-aware counts.
Apr 20 2021
Just wanted to give you a quick status update — I have a somewhat functional re-implementation of @Isaac's tool using Golang/Beam up and running locally. I'm still working on getting it working/hosted in Toolforge (which doesn't as a service play very nicely with Go quite yet), but I'm hoping that should be done this week.
Apr 16 2021
Hi all — I'm Hal Triedman, the new Privacy Engineering intern. Over the last few days, I've been working on re-implementing the tool that @Isaac made (https://diff-privacy.toolforge.org) using Go's Apache Beam SDK and Google's Privacy on Beam package, rather than Python, Flask, and hand-coded DP functions.