Htriedman (Hal Triedman)
Privacy Engineer

User Details

User Since
Apr 5 2021, 8:13 PM (64 w, 10 h)
Availability
Available
LDAP User
Htriedman
MediaWiki User
HTriedman (WMF)

Recent Activity

Wed, Jun 22

Htriedman added a comment to T310341: Add some columns of `renameuser_queue` to the replica.

Hi all!

Wed, Jun 22, 6:38 PM · Data-Services, Stewards-and-global-tools, cloud-services-team (Kanban)

May 11 2022

Htriedman updated subscribers of T307245: Swift for differential privacy data publication.

@Milimetric Thanks for the pointers on this process! I also just talked to @gmodena and think that we're starting to come to a good set of solutions for how we might put together the disparate pieces of this project. I'll be sure to keep you all in the loop.

May 11 2022, 7:04 PM · Data-Persistence, SRE-swift-storage, Privacy Engineering, Data-Engineering

May 2 2022

Htriedman added a comment to T295072: Install spark3 in analytics clusters.

@Ottomata Thanks for this update! The differential privacy project is currently using a jury-rigged version of Spark 3 to run our software packages, so please let me know (either in this thread on Phabricator or via Slack) when you've been able to install Spark 3 on anaconda-wmf.

May 2 2022, 6:39 PM · Airflow, Data-Engineering

Apr 29 2022

Htriedman created T307245: Swift for differential privacy data publication.
Apr 29 2022, 5:51 PM · Data-Persistence, SRE-swift-storage, Privacy Engineering, Data-Engineering

Apr 20 2022

Htriedman created T306551: Request to delete tumult_temp_<date>_<random ID> tables/schemas from analytics cluster.
Apr 20 2022, 4:18 PM · Data-Engineering

Apr 4 2022

Htriedman added a comment to T303304: Privacy review for dataset publishing (Wikidata topic -> pageview data).

Hi @Addshore! Hope you're well — I'm done with my privacy review and am hoping to share it with you soon (I just need your email)

Apr 4 2022, 9:50 PM · Data-Engineering-Radar, Privacy Engineering, Privacy
Htriedman added a comment to T303304: Privacy review for dataset publishing (Wikidata topic -> pageview data).

Hi @Addshore, I'm working on this now and hope to have it done in the next 24 hours!

Apr 4 2022, 3:57 PM · Data-Engineering-Radar, Privacy Engineering, Privacy

Mar 29 2022

Htriedman added a comment to T299627: Investigate releasing historical top-pageview-per-country data.

Great! I'll be sure to circle back to this in a month or two with some updates.

Mar 29 2022, 5:28 PM · Privacy Engineering, Data-Engineering

Mar 28 2022

Htriedman added a comment to T299627: Investigate releasing historical top-pageview-per-country data.

Thanks so much for getting back to me on this with some more information. We're currently in the middle of establishing protocols and processes around the use of differential privacy (and configuring software!), which should be done by the end of Q4 (June 2022). This data release is definitely possible within certain privacy bounds — if we can wait until then. If not, I can also potentially suggest some other mitigation heuristics.

Mar 28 2022, 3:53 PM · Privacy Engineering, Data-Engineering

Mar 25 2022

Htriedman added a comment to T299627: Investigate releasing historical top-pageview-per-country data.

Hi @JAllemandou — does this pageview data exist in a private table somewhere stripped of the actor_signature field? Or is it preaggregated somehow? This could be a case where differential privacy (which we are currently piloting on similar data) could come in handy.

Mar 25 2022, 3:33 PM · Privacy Engineering, Data-Engineering

Mar 22 2022

Htriedman added a comment to T301391: Update click tracking to take into account screen resolution.

The privacy team has conducted a review of the proposed data collection scheme — without mitigations, it would be deemed medium risk. However, after privacy-protecting mitigations like automated data deletion after 90 days, bucketing, and restricting access to WMF/NDA'd people, collecting this data was deemed low risk.

Mar 22 2022, 7:52 PM · MW-1.39-notes (1.39.0-wmf.7; 2022-04-11), Patch-For-Review, Desktop Improvements, Readers-Web-Backlog (Kanbanana-FY-2021-22), MediaWiki-extensions-WikimediaEvents

Mar 15 2022

Htriedman added a comment to T303304: Privacy review for dataset publishing (Wikidata topic -> pageview data).

Hi @Addshore! I'm Hal, a privacy engineer at WMF, and I'll be taking a look at this (rerunning the notebook, assessing potential harms, writing up a formal privacy review, etc.) in the next few days.

Mar 15 2022, 8:38 PM · Data-Engineering-Radar, Privacy Engineering, Privacy

Feb 28 2022

Htriedman closed T301680: Requesting access to analytics-privatedata for Kiron Lebeck (klebeck-tmlt) as Declined.

Kiron is leaving Tumult Labs soon, so this task doesn't need to be completed.

Feb 28 2022, 7:58 PM · SRE, SRE-Access-Requests

Feb 15 2022

Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

Update: they should all be in the NDA and MOU document now

Feb 15 2022, 6:51 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

@Dzahn checking in with Legal now

Feb 15 2022, 4:48 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301782: Requesting access to analytics-privatedata-users for Michael.hay.

@RhinosF1 Correct assumptions on both counts — @JBennett is the approving manager, and the expiry date is 13 September 2022

Feb 15 2022, 4:48 PM · SRE, SRE-Access-Requests

Feb 14 2022

Htriedman updated subscribers of T301679: Requesting access to analytics-privatedata for Tom Magerlein.

Hi SRE, just want to drop in here that this task should have the same comments as T301581:

  • @JBennett is approving manager
  • The confidentiality and access responsibilities of employees from Tumult Labs are covered by the MSA that WMF and Tumult both signed
  • Contract expiry date is 13 September 2022
Feb 14 2022, 5:28 PM · SRE, SRE-Access-Requests
Htriedman updated subscribers of T301680: Requesting access to analytics-privatedata for Kiron Lebeck (klebeck-tmlt).

Hi SRE, just want to drop in here that this task should have the same comments as T301581:

  • @JBennett is approving manager
  • The confidentiality and access responsibilities of employees from Tumult Labs are covered by the MSA that WMF and Tumult both signed
  • Contract expiry date is 13 September 2022
Feb 14 2022, 5:28 PM · SRE, SRE-Access-Requests
Htriedman updated subscribers of T301659: Requesting access to analytics-privatedata-users for Damiendf.

Hi SRE, just want to drop in here that this task should have the same comments as T301581:

  • @JBennett is approving manager
  • The confidentiality and access responsibilities of employees from Tumult Labs are covered by the MSA that WMF and Tumult both signed
  • Contract expiry date is 13 September 2022
Feb 14 2022, 5:28 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

@RhinosF1 just checked, the expiry date is 13 September 2022

Feb 14 2022, 5:27 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

I believe the expiry should be roughly 6 months from now — let's say (for the moment, at least) 31 August 2022.

Feb 14 2022, 4:54 PM · SRE, SRE-Access-Requests

Feb 12 2022

Htriedman updated subscribers of T301581: Requesting access to analytics-privatedata for Skye Berghel.

Hi SRE team! Just a couple of clarifications here — the approving party is actually @JBennett, rather than myself.

Feb 12 2022, 5:52 PM · SRE, SRE-Access-Requests

Jan 26 2022

Htriedman updated subscribers of T300173: Add Htriedman (Hal Triedman) to #wmf-nda.

Also subbing my direct supervisor @Jcross

Jan 26 2022, 5:57 PM · WMF-NDA-Requests
Htriedman added a comment to T300173: Add Htriedman (Hal Triedman) to #wmf-nda.

LDAP and SUL accounts are both linked to this account, let me know if anything else needs to be done on this front!

Jan 26 2022, 5:49 PM · WMF-NDA-Requests

Jan 19 2022

Htriedman added a comment to T287527: Outlinks model card.

@calbon sounds good — I'm also in the middle of putting the information spread across all the outlinks-model-related pages into my model card content v0.2 doc. That document is on google docs, but should be publicly accessible at this link. Looking forward to talking about it next week!

Jan 19 2022, 6:37 PM · ML-Governance, Lift-Wing, Machine-Learning-Team (Active Tasks)

Dec 14 2021

Htriedman updated Htriedman.
Dec 14 2021, 6:07 PM

Nov 3 2021

Htriedman added a comment to T294970: Requesting access to restricted for htriedman.

Totally understand. Thanks for the tips!

Nov 3 2021, 9:56 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T294970: Requesting access to restricted for htriedman.

Hi @Urbanecm, thanks for the quick response and the helpful pointer. I've been able to get into centralauth by running `analytics-mysql centralauth`, and can query `centralauth.globaluser`. I must have been mistaken in thinking that I needed access to mwmaint. That came up in a discussion with one of my peers who had access to mwmaint, and I didn't realize the same data was accessible with my current user permissions. You can decline this request and close this ticket.

Nov 3 2021, 9:52 PM · SRE, SRE-Access-Requests
Htriedman created T294970: Requesting access to restricted for htriedman.
Nov 3 2021, 7:29 PM · SRE, SRE-Access-Requests
Htriedman removed a project from T245110: Should per user `Echo-whitelist` pages be protected?: Security-Team.
Nov 3 2021, 7:01 PM · Growth-Team-Filtering, Growth-Team, MediaWiki-User-management, Notifications, Security, User-DannyS712

Oct 28 2021

Htriedman added a comment to T294391: Expose mediawiki/revision/tags-change in stream.wikimedia.org.

Got it. In that case I don't see it adding any new privacy risk — I'll just make sure to bump my investigation of the frequency and severity of these incidents up on my todo list.

Oct 28 2021, 9:36 PM · Data-Engineering-Kanban, Event-Platform, Data-Engineering, EventStreams
Htriedman added a comment to T294391: Expose mediawiki/revision/tags-change in stream.wikimedia.org.

I know that this is the same theoretical attack vector as revision create: for example, someone creates a page with a title like "Hal Triedman's SSN is XX-XXX-XXXX" that is quickly removed and suppressed, but the revision create event remains publicly consumable in the event stream for 7 days.

Oct 28 2021, 6:46 PM · Data-Engineering-Kanban, Event-Platform, Data-Engineering, EventStreams
Htriedman updated subscribers of T280385: Apache Beam go prototype code for DP evaluation.

Hi @Milimetric! Thanks for commenting on this task — lots has happened in the last 3-4 weeks and this served as a good reminder to update this thread as to where we currently are on this project.

Oct 28 2021, 3:52 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Oct 22 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

I have some more updates after working on the algo-accountability repo for another week:

Oct 22 2021, 11:12 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Oct 13 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

Hi all! Just wanted to post a quick update on some of these ORES transparency efforts: I have (mostly) compiled a repository of datasets (~10 GB), model binaries (~0.5 GB), model architectures, model training performance, etc. that are used in ORES. You can check it out at the ores-data repo on GitLab. There were some holes where datasets/models didn't compile or were otherwise corrupted, but I did my best to document what didn't work.

Oct 13 2021, 11:14 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team
Htriedman removed a project from T207777: audit password policy check for constant time string comparisons: Security-Team.
Oct 13 2021, 7:34 PM · Security, MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), Google-Code-in-2018, MediaWiki-User-management
Htriedman removed a project from T210790: Allow cross-origin requests by default in the Action API: Security-Team.
Oct 13 2021, 7:33 PM · Platform Engineering, Patch-For-Review, MediaWiki-Action-API
Htriedman removed a project from T214489: Improve LDAP logging: Security-Team.
Oct 13 2021, 7:33 PM · LDAP, SRE
Htriedman removed a project from T221576: Password length requirement error shown twice: Security-Team.
Oct 13 2021, 7:32 PM · MediaWiki-User-login-and-signup
Htriedman removed a project from T239077: Define policy aspects of CSP on wiki: Security-Team.
Oct 13 2021, 7:31 PM · Privacy Engineering, Documentation, Privacy, ContentSecurityPolicy
Htriedman removed a project from T248294: Separate permission for creating a page with a custom content model: Security-Team.
Oct 13 2021, 7:26 PM · Editing-team, MediaWiki-User-management, User-DannyS712
Htriedman removed a project from T255370: Document best practices for user login if user is using 2FA: Security-Team.
Oct 13 2021, 7:24 PM · MediaWiki-extensions-OathAuth, Platform Team Initiatives (API Gateway), MediaWiki-Documentation, Documentation, MediaWiki-Authentication-and-authorization, Security

Sep 30 2021

Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Just checked the datasets that are going to be made available for public release. Everything is in order and you're all set to share them outside of the NDA group. With the mitigations taken, the residual risk level that this data poses to editors is low.

Sep 30 2021, 5:02 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics
Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Hi everyone, I know it's been several months since this ticket was last updated, but work on implementing DP at scale in production has continued in the meantime, and I wanted to post publicly with some updates on our process:

Sep 30 2021, 2:45 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Sep 28 2021

Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Just took a look. Those files are alright to share with people who have signed an NDA with the Foundation. They are not OK to share publicly, since they contain exact counts of editors and edits, rather than aggregated buckets of counts (11-20 editors instead of 14, 100-200 edits instead of 151, etc.).
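
To illustrate the bucketing mentioned above, here is a minimal sketch. It is not the code used in this review: the function name `bucket_count` and the fixed-width scheme are assumptions made for the example. With this scheme, 151 falls in "101-200" rather than the "100-200" quoted above, since adjacent buckets should not overlap.

```python
def bucket_count(count: int, width: int) -> str:
    """Map an exact count to a coarse range label of the given width.

    Hypothetical helper: buckets are 1..width, width+1..2*width, and so on.
    """
    if count < 1 or width < 1:
        raise ValueError("count and width must be positive integers")
    index = (count - 1) // width      # zero-based bucket index
    low = index * width + 1           # smallest count in the bucket
    high = (index + 1) * width        # largest count in the bucket
    return f"{low}-{high}"


# Exact values are replaced by ranges before a public release.
print(bucket_count(14, 10))    # 11-20
print(bucket_count(151, 100))  # 101-200
```

Wider buckets hide more about individual editors but make the published numbers less precise, so the width is a judgment call for each dataset.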

Sep 28 2021, 8:14 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics
Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Hi @GoranSMilovanovic, apologies for the confusion. I understand that you intend to remove all information from countries on the Country Protection List, and I was trying to respond to @Manuel's follow-up question, just for the sake of learning:

Sep 28 2021, 7:50 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics
Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Hi @Manuel, so sorry for the late response; my Phabricator account was misconfigured and I didn't get a notification email. Thanks so much for getting back to me with all this information.

Sep 28 2021, 6:40 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics

Sep 27 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

Hi again @Isaac! Just wanted to re-respond to your post about protected classes with a strong measure of agreement. The only feature I might add to the content analysis is geography among anonymous IP editors. At the same time, I think a lot of the features we want to measure depend in large part upon the size of the test set to prevent bad stats and data leaks (it really doesn't have to be very large; 200-400 evenly-distributed samples would likely do just fine).

Sep 27 2021, 1:14 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Sep 24 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

End of the week update: I have officially been able to run a full model card through the pipeline! (yay) Example here (sorry for the sparsity of data, it's only running on ~50 revisions to keep testing relatively quick).

Sep 24 2021, 10:16 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Sep 21 2021

Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Hi @GoranSMilovanovic and @Manuel! My name is Hal — I'm a privacy engineer on the Privacy Engineering team. There is some precedent for releasing data of this variety, but I still have a couple of questions:

Sep 21 2021, 3:24 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics

Sep 16 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

@calbon you're making a good point and this is definitely a conversation worth having.

Sep 16 2021, 9:20 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Sep 13 2021

Htriedman moved T280385: Apache Beam go prototype code for DP evaluation from Backlog to Completed on the Privacy Engineering board.
Sep 13 2021, 6:54 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Sep 10 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

Quick update — I spent some time visually charting out exactly what the infrastructure/workflow for this system might look like. I've attached an image of my proposed design, and you can make comments on the design on Google Drawings here.

Sep 10 2021, 6:30 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team
Htriedman created T290746: Developing the `algo-accountability` repository.
Sep 10 2021, 4:18 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Jul 13 2021

Htriedman added a comment to T276398: Experiment with on-wiki model documentation.

@calbon good first question: yes, the model card bot should overwrite manual edits. In my mind, the canonical way to edit model cards should be by pushing a change to a config file for the card hosted on Gerrit/GitLab. Then, changes will show up on the next run of the card generator.

Jul 13 2021, 8:38 PM · Machine-Learning-Team (Active Tasks), Documentation, artificial-intelligence
Htriedman added a comment to T276398: Experiment with on-wiki model documentation.

Hi all! I just put together a memo synthesizing what dataset/model/service documentation could look like on WMF resources. For generalizability, I've called the documentation "algorithmic accountability sheets". The memo addresses what questions we should be asking for each component of an algorithm, as well as some thoughts on governance, metrics, deprecation, etc.

Jul 13 2021, 6:17 PM · Machine-Learning-Team (Active Tasks), Documentation, artificial-intelligence

Jun 24 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

@gmodena It might also be worth considering implementing HDFS in the Beam Go SDK ourselves — the template for doing it (which you can see here, for example) is relatively simple.

Jun 24 2021, 7:55 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Jun 4 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Part of this task is to make data releases of this type part of the regular cycle of data releases at WMF, so I do not think we should treat this project like a one-off data release. Rather, we should treat running it like any other data flow as a core requirement.

Jun 4 2021, 9:35 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release
Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Thanks @Isaac and @Nuria for the in-depth discussion of the relative pros and cons of these two approaches, and for the deep dive on user-side filtering. I wanted to chime in with some more context that I recently learned about putting DP into production, regardless of how we filter/limit pageviews. These considerations may be relevant as we move toward creating this as a service.

Jun 4 2021, 5:11 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Jun 3 2021

Htriedman closed T283368: Requesting access to production analytics data and cluster for htriedman as Resolved.

It's working perfectly! Thanks so much for the responsiveness.

Jun 3 2021, 6:08 PM · Analytics, SRE, SRE-Access-Requests
Htriedman reopened T283368: Requesting access to production analytics data and cluster for htriedman as "Open".

Hi all — reopening this task so that I can get access to https://superset.wikimedia.org as a Hive GUI.

Jun 3 2021, 5:59 PM · Analytics, SRE, SRE-Access-Requests

Jun 2 2021

Htriedman added a comment to T283368: Requesting access to production analytics data and cluster for htriedman.

@JBennett tagging you to flag that you need to sign off on this

Jun 2 2021, 3:47 PM · Analytics, SRE, SRE-Access-Requests

May 21 2021

Htriedman created T283368: Requesting access to production analytics data and cluster for htriedman.
May 21 2021, 4:17 PM · Analytics, SRE, SRE-Access-Requests

May 3 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Hi all — just finished updating the demo to get it into a good place. You can see the finished product (UI, user- and pageview-level privacy, etc.) at https://diff-privacy-beam.wmcloud.org. Please let me know what you think, and if there are any next steps that any of you can see toward getting this into a production prototype. Thanks for all the help so far :)

May 3 2021, 9:49 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 30 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

@TedTed, thanks for explaining thresholding and why δ is necessary, even with Laplace noise. Really useful to know what's happening under the hood of Privacy on Beam.
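
As a rough illustration of the point about thresholding and δ, here is a simplified sketch. It is not how Privacy on Beam implements partition selection; the function names are hypothetical, and it assumes each user contributes at most one row to at most one partition (sensitivity 1). Laplace noise alone protects the counts, but the mere appearance of a rarely viewed page in the output can reveal that someone viewed it. Dropping noisy counts below a cutoff limits that leak to a probability of at most δ.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def threshold(epsilon: float, delta: float) -> float:
    """Cutoff at which a partition with one contributor is released
    with probability delta (assumes sensitivity 1 and delta <= 0.5)."""
    return 1.0 + math.log(1.0 / (2.0 * delta)) / epsilon


def noisy_release(counts, epsilon, delta, rng=None):
    """Add Laplace noise to each count and keep only those above the cutoff."""
    rng = rng or random.Random()
    cutoff = threshold(epsilon, delta)
    released = {}
    for key, count in counts.items():
        noisy = count + laplace_noise(1.0 / epsilon, rng)
        if noisy >= cutoff:  # small partitions are almost always dropped
            released[key] = noisy
    return released


# Hypothetical page view counts; the single-view page is very unlikely to appear.
print(noisy_release({"Popular_page": 50000, "Rare_page": 1}, epsilon=1.0, delta=1e-5))
```

This sketch uses ordinary floating-point sampling, which is fine for building intuition but not for a production release.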

Apr 30 2021, 6:37 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 21 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.
  • You mention processing 500,000 rows in the README. Am I correct in assuming this is the process: 1) gather top-50 viewed articles from API for that language, 2) de-aggregate the data and load into database so that e.g., an article with 50,000 pageviews becomes 50,000 separate rows, 3) extract the data and run through the diff-privacy framework (any filtering + addition of noise), 4) return privacy-aware counts?
Apr 21 2021, 5:27 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 20 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Just wanted to give you a quick status update: I have a somewhat functional re-implementation of @Isaac's tool using Golang/Beam up and running locally. I'm still working on getting it hosted in Toolforge (which, as a service, doesn't play very nicely with Go quite yet), but I'm hoping that should be done this week.

Apr 20 2021, 10:42 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 16 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Hi all — I'm Hal Triedman, the new Privacy Engineering intern. Over the last few days, I've been working on re-implementing the tool that @Isaac made (https://diff-privacy.toolforge.org) using Go's Apache Beam SDK and Google's Privacy on Beam package, rather than Python, Flask, and hand-coded DP functions.

Apr 16 2021, 6:03 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 7 2021

Htriedman added a member for Privacy: Htriedman.
Apr 7 2021, 3:10 PM
Htriedman added a member for Security: Htriedman.
Apr 7 2021, 3:01 PM

Apr 6 2021

Htriedman added a member for Security-Team: Htriedman.
Apr 6 2021, 9:11 PM
Htriedman added a member for Privacy Engineering: Htriedman.
Apr 6 2021, 9:07 PM

Apr 5 2021

Htriedman updated Htriedman.
Apr 5 2021, 8:36 PM