
Htriedman (Hal Triedman)
Privacy Engineer

User Details

User Since
Apr 5 2021, 8:13 PM (103 w, 3 h)
Availability
Available
LDAP User
Htriedman
MediaWiki User
HTriedman (WMF) [ Global Accounts ]

Recent Activity

Today

Htriedman added a comment to T331647: Grant Hal deployment rights.

@MoritzMuehlenhoff Sorry if this is a silly question, but I've been trying to run commands as analytics-platform-eng on the stat machines using sudo -u analytics-platform-eng <cmd>, and I'm being prompted for my user password. I don't recall ever having used a password to access the stat machines, and it's not any password I can remember. Do you know where I might be able to go for those credentials?

Mon, Mar 27, 11:24 PM · SRE, SRE-Access-Requests

Tue, Mar 21

Htriedman updated subscribers of T331647: Grant Hal deployment rights.

@Jcross asking for approval from you: I need these rights in order to deploy DP scripts that will run on a schedule on Airflow.

Tue, Mar 21, 3:27 PM · SRE, SRE-Access-Requests

Mon, Mar 20

Htriedman added a comment to T331067: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for AranyaP.

Hi @MatthewVernon! We're currently running into some weird errors with Aranya's permissions, specifically regarding access to Turnilo and Superset. Is there any way of addressing that on this thread? Or should we start a new ticket? Thanks so much.

Mon, Mar 20, 8:15 PM · SRE, SRE-Access-Requests

Thu, Mar 16

Htriedman added a comment to T331647: Grant Hal deployment rights.

just bumping this!

Thu, Mar 16, 7:35 PM · SRE, SRE-Access-Requests

Wed, Mar 8

Htriedman added a comment to T331416: The nsfw model hangs in predict() after moving to Kserve 0.10.

@elukey not exactly sure what's going on here, but I can check into it and get back to you!

Wed, Mar 8, 6:17 PM · Machine-Learning-Team

Thu, Mar 2

Htriedman added a comment to T330234: Differential privacy airflow-dags merge request.

@JArguello-WMF nope! I chatted with @Milimetric a couple of days ago and he said that we're good to go (as an initial MVP release, at least). Waiting on him to feel better to give the final approval and merge. I'll follow up on a new ticket if there's anything else I need besides that.

Thu, Mar 2, 7:29 PM · Data Pipelines (sprint 10), Data-Engineering

Tue, Feb 28

Htriedman created T330793: Address dataset frontrunning attack in dumps.
Tue, Feb 28, 8:46 PM · Dumps-Generation

Feb 21 2023

Htriedman created T330234: Differential privacy airflow-dags merge request.
Feb 21 2023, 9:17 PM · Data Pipelines (sprint 10), Data-Engineering

Feb 9 2023

Htriedman added a comment to T329209: add Hal Triedman (htriedman) to ops-l mailing list.

@fgiunchedi I just signed up via lists.wikimedia.org! Thanks for getting back to me.

Feb 9 2023, 5:12 PM · SRE

Feb 8 2023

Htriedman created T329209: add Hal Triedman (htriedman) to ops-l mailing list.
Feb 8 2023, 5:46 PM · SRE

Feb 6 2023

Htriedman added a comment to T315676: Add DP cookie for pageview filtering.

@Vgutierrez thanks so much! taking a look now

Feb 6 2023, 4:01 PM · SRE, Traffic

Jan 31 2023

Htriedman added a comment to T328152: Some users' presto queries are no longer working in Superset.

Up and running! Thanks for the help.

Jan 31 2023, 5:15 PM · Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)), Data-Engineering
Htriedman added a comment to T328152: Some users' presto queries are no longer working in Superset.

My SQL Lab on superset has also not been working for the past week or so!

Jan 31 2023, 5:05 PM · Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)), Data-Engineering

Nov 28 2022

Htriedman added a comment to T322591: Requesting access to analytics-privatedata-users for Dasm.

@andrea.denisse that is correct! 2023-06-30 is the expiry date for @dasm

Nov 28 2022, 4:00 PM · SRE, SRE-Access-Requests

Nov 15 2022

Htriedman added a comment to T287527: Outlinks model card .

This model card has already been created!

Nov 15 2022, 4:42 PM · Machine-Learning-Team, ML-Governance, Lift-Wing

Nov 9 2022

Htriedman updated subscribers of T322591: Requesting access to analytics-privatedata-users for Dasm.

@Dzahn @KFrancis Yes, that is correct, @dasm is a Tumult Labs contractor working with us on differential privacy.

Nov 9 2022, 4:48 PM · SRE, SRE-Access-Requests
Htriedman updated subscribers of T322670: Requesting access to analytics-privatedata-users for David.pujol.

@fgiunchedi the expiry dates from other @tmlt.io folks are correct!

Nov 9 2022, 4:44 PM · SRE, SRE-Access-Requests

Nov 3 2022

Htriedman added a comment to T318863: Event Platform and DataHub Integration.

@Ottomata I think the above is the right approach (if we decide to do it)

Nov 3 2022, 9:38 PM · Data-Catalog, Data-Engineering-Planning, Event-Platform Value Stream

Oct 12 2022

Htriedman added a comment to T299627: Investigate releasing historical top-pageview-per-country data.

Hi all! Just wanted to come back to this thread (even though it's been more than a month or two) with some updates — 

Oct 12 2022, 7:38 PM · Privacy Engineering, Data-Engineering
Htriedman added a comment to T320214: [REQUEST] A list of the most edited English Wikipedia pages in Australia.

Just wanted to jump in here — I'm an engineer on the Privacy Engineering team, and I've been working on releasing this data (pageviews aggregated by country and project) safely for about a year now!

Oct 12 2022, 7:35 PM · Wikimedia Australia, Product-Analytics

Sep 29 2022

Htriedman added a comment to T316815: Helper function SparkSubmitOperator.for_virtualenv() fails on Spark3.

The specific error we're running into now is a Hive configuration issue (I believe), emerging when we use a combination of Skein + Spark3 + a custom virtualenv

Sep 29 2022, 7:37 PM · Data Pipelines

Aug 29 2022

Htriedman created T316600: Broken DAG Error when trying to import Gitlab .tgz file into airflow.
Aug 29 2022, 9:03 PM · Data-Engineering-Planning, Data Pipelines

Aug 12 2022

Htriedman added a comment to T314810: Deploy NSFW model to production.

woohoo!! it works! so excited to see this up in production; I just tried it myself, and all seems to be working correctly!

Aug 12 2022, 4:56 PM · Machine-Learning-Team (Active Tasks), Lift-Wing

Aug 9 2022

Htriedman added a comment to T314810: Deploy NSFW model to production.

@Aklapper it's the output of an Innovation Week project to 1) retrain an image model for classifying nude, pornographic, gory, etc. imagery and 2) deploy it on the new ML infrastructure (Liftwing) to figure out the pain points and issues that non-MLE people and the broader community run into when deploying models.

Aug 9 2022, 1:15 AM · Machine-Learning-Team (Active Tasks), Lift-Wing

Jul 25 2022

Htriedman added a comment to T313620: Request to publish a dataset of aggregated ContentTranslation language pair activity.

Hi @awight! I'm the privacy engineer in charge of reviewing data releases at WMF, and I'll try my best to take a look at this request over the course of the next week.

Jul 25 2022, 10:13 PM · ContentTranslation, Privacy Engineering

Jul 21 2022

Htriedman updated subscribers of T313526: Deploy NSFW model using experimental local docker kserve container.

Current iteration of the model is on Github here: https://github.com/htried/Image-Content-Filtration/tree/statbox-retrain-test.

Jul 21 2022, 5:02 PM · Machine-Learning-Team (Active Tasks)

Jun 22 2022

Htriedman added a comment to T310341: Add some columns of `renameuser_queue` to the replica.

Hi all!

Jun 22 2022, 6:38 PM · cloud-services-team, Data-Services, Stewards-and-global-tools

May 11 2022

Htriedman updated subscribers of T307245: Swift for differential privacy data publication.

@Milimetric Thanks for the pointers on this process! I also just talked to @gmodena and think that we're starting to come to a good set of solutions for how we might put together the disparate pieces of this project. I'll be sure to keep you all in the loop.

May 11 2022, 7:04 PM · SRE-swift-storage, Privacy Engineering, Data-Engineering

May 2 2022

Htriedman added a comment to T295072: Install spark3 in analytics clusters.

@Ottomata Thanks for this update! The differential privacy project is currently using a jerry-rigged version of Spark 3 to run our software packages, so please let me know (either in this thread on phab or via slack) when you've been able to install Spark 3 on anaconda-wmf.

May 2 2022, 6:39 PM · Patch-For-Review, Epic, Data Pipelines

Apr 29 2022

Htriedman created T307245: Swift for differential privacy data publication.
Apr 29 2022, 5:51 PM · SRE-swift-storage, Privacy Engineering, Data-Engineering

Apr 20 2022

Htriedman created T306551: Request to delete tumult_temp_<date>_<random ID> tables/schemas from analytics cluster.
Apr 20 2022, 4:18 PM · Data-Engineering

Apr 4 2022

Htriedman added a comment to T303304: Privacy review for dataset publishing (Wikidata topic -> pageview data).

Hi @Addshore! Hope you're well — I'm done with my privacy review and am hoping to share it with you soon (I just need your email)

Apr 4 2022, 9:50 PM · Data-Engineering-Radar, Privacy Engineering, Privacy
Htriedman added a comment to T303304: Privacy review for dataset publishing (Wikidata topic -> pageview data).

Hi @Addshore, working on this now; hopefully I'll have it done in the next 24h!

Apr 4 2022, 3:57 PM · Data-Engineering-Radar, Privacy Engineering, Privacy

Mar 29 2022

Htriedman added a comment to T299627: Investigate releasing historical top-pageview-per-country data.

Great! I'll be sure to circle back to this in a month or two with some updates.

Mar 29 2022, 5:28 PM · Privacy Engineering, Data-Engineering

Mar 28 2022

Htriedman added a comment to T299627: Investigate releasing historical top-pageview-per-country data.

Thanks so much for getting back to me on this with some more information. We're currently in the middle of establishing protocols and processes around the use of differential privacy (and configuring software!), which should be done by the end of Q4 (June 2022). This data release is definitely possible within certain privacy bounds — if we can wait until then. If not, I can also potentially suggest some other mitigation heuristics.

Mar 28 2022, 3:53 PM · Privacy Engineering, Data-Engineering

Mar 25 2022

Htriedman added a comment to T299627: Investigate releasing historical top-pageview-per-country data.

Hi @JAllemandou — does this pageview data exist in a private table somewhere stripped of the actor_signature field? Or is it preaggregated somehow? This could be a case where differential privacy (which we are currently piloting on similar data) could come in handy.

Mar 25 2022, 3:33 PM · Privacy Engineering, Data-Engineering

Mar 22 2022

Htriedman added a comment to T301391: Update click tracking to take into account screen resolution.

The privacy team has conducted a review of the proposed data collection scheme — without mitigations, it would be deemed medium risk. However, after privacy-protecting mitigations like automated data deletion after 90 days, bucketing, and restricting access to WMF/NDA'd people, collecting this data was deemed low risk.

Mar 22 2022, 7:52 PM · MW-1.39-notes (1.39.0-wmf.7; 2022-04-11), Patch-For-Review, Desktop Improvements (Vector 2022), Readers-Web-Backlog (Kanbanana-FY-2021-22), MediaWiki-extensions-WikimediaEvents
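The review above lists automated deletion after 90 days as one of the mitigations. As a rough illustration only, a retention filter of that kind might look like the Python sketch below; the record shape and the `collected_at` field name are hypothetical, and the actual deletion job is not described in this thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window matching the "90 days" in the review.
RETENTION = timedelta(days=90)


def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Return only the records collected within the retention window.

    Assumes each record carries a timezone-aware `collected_at` datetime
    (a hypothetical field name used for illustration).
    """
    cutoff = now - RETENTION
    return [r for r in records if r["collected_at"] >= cutoff]


now = datetime(2022, 6, 30, tzinfo=timezone.utc)
records = [
    {"id": 1, "collected_at": now - timedelta(days=10)},
    {"id": 2, "collected_at": now - timedelta(days=120)},
]
print([r["id"] for r in purge_expired(records, now)])  # [1]
```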

Mar 15 2022

Htriedman added a comment to T303304: Privacy review for dataset publishing (Wikidata topic -> pageview data).

Hi @Addshore! I'm Hal, a privacy engineer at WMF, and I'll be taking a look at this (rerunning the notebook, assessing potential harms, writing up a formal privacy review, etc.) in the next few days.

Mar 15 2022, 8:38 PM · Data-Engineering-Radar, Privacy Engineering, Privacy

Feb 28 2022

Htriedman closed T301680: Requesting access to analytics-privatedata for Kiron Lebeck (klebeck-tmlt) as Declined.

Kiron is leaving Tumult Labs soon, so this task doesn't need to be completed.

Feb 28 2022, 7:58 PM · SRE, SRE-Access-Requests

Feb 15 2022

Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

Update: they should all be in the NDA and MOU document now

Feb 15 2022, 6:51 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

@Dzahn checking in with Legal now

Feb 15 2022, 4:48 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301782: Requesting access to analytics-privatedata-users for Michael.hay.

@RhinosF1 Correct assumptions on both counts — @JBennett is the approving manager, and the expiry date is 13 September 2022

Feb 15 2022, 4:48 PM · SRE, SRE-Access-Requests

Feb 14 2022

Htriedman updated subscribers of T301679: Requesting access to analytics-privatedata for Tom Magerlein.

Hi SRE, just want to drop in here that this task should have the same comments as T301581:

  • @JBennett is approving manager
  • The confidentiality and access responsibilities of employees from Tumult Labs are covered by the MSA that WMF and Tumult both signed
  • Contract expiry date is 13 September 2022
Feb 14 2022, 5:28 PM · SRE, SRE-Access-Requests
Htriedman updated subscribers of T301680: Requesting access to analytics-privatedata for Kiron Lebeck (klebeck-tmlt).

Hi SRE, just want to drop in here that this task should have the same comments as T301581:

  • @JBennett is approving manager
  • The confidentiality and access responsibilities of employees from Tumult Labs are covered by the MSA that WMF and Tumult both signed
  • Contract expiry date is 13 September 2022
Feb 14 2022, 5:28 PM · SRE, SRE-Access-Requests
Htriedman updated subscribers of T301659: Requesting access to analytics-privatedata-users for Damiendf.

Hi SRE, just want to drop in here that this task should have the same comments as T301581:

  • @JBennett is approving manager
  • The confidentiality and access responsibilities of employees from Tumult Labs are covered by the MSA that WMF and Tumult both signed
  • Contract expiry date is 13 September 2022
Feb 14 2022, 5:28 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

@RhinosF1 just checked, the expiry date is 13 September 2022

Feb 14 2022, 5:27 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T301581: Requesting access to analytics-privatedata for Skye Berghel.

I believe the expiry should be roughly 6 months from now — let's say (for the moment, at least) 31 August 2022.

Feb 14 2022, 4:54 PM · SRE, SRE-Access-Requests

Feb 12 2022

Htriedman updated subscribers of T301581: Requesting access to analytics-privatedata for Skye Berghel.

Hi SRE team! Just a couple of clarifications here — the approving party is actually @JBennett, rather than myself.

Feb 12 2022, 5:52 PM · SRE, SRE-Access-Requests

Jan 26 2022

Htriedman updated subscribers of T300173: Add Htriedman (Hal Triedman) to #wmf-nda.

Also subbing my direct supervisor @Jcross

Jan 26 2022, 5:57 PM · WMF-NDA-Requests
Htriedman added a comment to T300173: Add Htriedman (Hal Triedman) to #wmf-nda.

LDAP and SUL accounts are both linked to this account, let me know if anything else needs to be done on this front!

Jan 26 2022, 5:49 PM · WMF-NDA-Requests

Jan 19 2022

Htriedman added a comment to T287527: Outlinks model card .

@calbon sounds good — I'm also in the middle of putting the information spread across all the outlinks-model-related pages into my model card content v0.2 doc. That document is on google docs, but should be publicly accessible at this link. Looking forward to talking about it next week!

Jan 19 2022, 6:37 PM · Machine-Learning-Team, ML-Governance, Lift-Wing

Dec 14 2021

Htriedman updated Htriedman.
Dec 14 2021, 6:07 PM

Nov 3 2021

Htriedman added a comment to T294970: Requesting access to restricted for htriedman.

Totally understand. Thanks for the tips!

Nov 3 2021, 9:56 PM · SRE, SRE-Access-Requests
Htriedman added a comment to T294970: Requesting access to restricted for htriedman.

Hi @Urbanecm, thanks for the quick response and the helpful pointer. I've been able to get into centralauth by running analytics-mysql centralauth, and can query centralauth.globaluser. I must've been mistaken in thinking that I needed access to mwmaint; that came up as part of a discussion with one of my peers who had access to mwmaint, and I didn't realize the same data was accessible with my current user permissions. You can deny this request and close this ticket.

Nov 3 2021, 9:52 PM · SRE, SRE-Access-Requests
Htriedman created T294970: Requesting access to restricted for htriedman.
Nov 3 2021, 7:29 PM · SRE, SRE-Access-Requests
Htriedman removed a project from T245110: Should per user `Echo-whitelist` pages be protected?: Security-Team.
Nov 3 2021, 7:01 PM · Growth-Team-Filtering, Growth-Team, MediaWiki-User-management, Notifications, Security, User-DannyS712

Oct 28 2021

Htriedman added a comment to T294391: Expose mediawiki/revision/tags-change in stream.wikimedia.org.

Got it. In that case I don't see it adding any new privacy risk — I'll just make sure to bump my investigation of the frequency and severity of these incidents up on my todo list.

Oct 28 2021, 9:36 PM · Data-Engineering-Kanban, Event-Platform Value Stream, Data-Engineering, EventStreams
Htriedman added a comment to T294391: Expose mediawiki/revision/tags-change in stream.wikimedia.org.

I know that this is the same theoretical attack vector as revision create, i.e. someone creates a page with a title like "Hal Triedman's SSN is XX-XXX-XXXX" that is quickly removed and suppressed, but the revision-create event remains publicly consumable in the event stream for 7 days.

Oct 28 2021, 6:46 PM · Data-Engineering-Kanban, Event-Platform Value Stream, Data-Engineering, EventStreams
Htriedman updated subscribers of T280385: Apache Beam go prototype code for DP evaluation.

Hi @Milimetric! Thanks for commenting on this task — lots has happened in the last 3-4 weeks and this served as a good reminder to update this thread as to where we currently are on this project.

Oct 28 2021, 3:52 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Oct 22 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

I have some more updates after working on the algo-accountability repo for another week:

Oct 22 2021, 11:12 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Oct 13 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

Hi all! Just wanted to post a quick update on some of these ORES transparency efforts: I have (mostly) compiled a repository of datasets (~10 GB), model binaries (~0.5 GB), model architectures, model training performance, etc. that are used in ORES. You can check it out at the ores-data repo on Gitlab. There were some holes where datasets/models didn't compile or were otherwise corrupted somehow, but I did my best to document what didn't work for whatever reason.

Oct 13 2021, 11:14 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team
Htriedman removed a project from T207777: audit password policy check for constant time string comparisons: Security-Team.
Oct 13 2021, 7:34 PM · Security, MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), Google-Code-in-2018, MediaWiki-User-management
Htriedman removed a project from T210790: Allow cross-origin requests by default in the Action API: Security-Team.
Oct 13 2021, 7:33 PM · Platform Engineering, Patch-For-Review, MediaWiki-Action-API
Htriedman removed a project from T214489: Improve LDAP logging: Security-Team.
Oct 13 2021, 7:33 PM · LDAP, SRE
Htriedman removed a project from T221576: Password length requirement error shown twice: Security-Team.
Oct 13 2021, 7:32 PM · MediaWiki-User-login-and-signup
Htriedman removed a project from T239077: Define policy aspects of CSP on wiki: Security-Team.
Oct 13 2021, 7:31 PM · Privacy Engineering, Documentation, Privacy, ContentSecurityPolicy
Htriedman removed a project from T248294: Separate permission for creating a page with a custom content model: Security-Team.
Oct 13 2021, 7:26 PM · Editing-team, MediaWiki-User-management, User-DannyS712
Htriedman removed a project from T255370: Document best practices for user login if user is using 2FA: Security-Team.
Oct 13 2021, 7:24 PM · MediaWiki-extensions-OATHAuth, Platform Team Initiatives (API Gateway Roadmap), MediaWiki-Documentation, Documentation, MediaWiki-Authentication-and-authorization, Security

Sep 30 2021

Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Just checked the datasets that are going to be made available for public release. Everything is in order and you're all set to share them outside of the NDA group. With the mitigations taken, the residual risk level that this data poses to editors is low.

Sep 30 2021, 5:02 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics
Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Hi everyone, I know it's been several months since this ticket was last updated, but work on implementing DP at scale in production has continued in the meantime, and I wanted to post publicly with some updates on our process:

Sep 30 2021, 2:45 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Sep 28 2021

Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Just took a look. Those files are alright to share with people who have signed an NDA with the Foundation. They are not OK to share publicly, since they contain exact counts of editors and edits rather than aggregated buckets of counts (11-20 editors instead of 14, 100-200 edits instead of 151, etc.).

Sep 28 2021, 8:14 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics
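To make the exact-versus-bucketed distinction concrete, here is a minimal Python sketch of replacing exact counts with ranges. The bucket widths are an assumption chosen to reproduce the examples in the comment, not the scheme actually used for these datasets; because the sketch uses non-overlapping inclusive ranges, 151 lands in "101-200" rather than "100-200".

```python
import math


def bucket(count: int) -> str:
    """Replace an exact count with a coarse range label.

    Hypothetical scheme: width-10 buckets up to 100, width-100 buckets
    above that. Ranges are written as "lower-upper", both inclusive.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return "0"
    width = 10 if count <= 100 else 100
    upper = math.ceil(count / width) * width
    return f"{upper - width + 1}-{upper}"


print(bucket(14))   # 11-20
print(bucket(151))  # 101-200
```

Wider buckets at larger magnitudes keep small, re-identifiable counts coarse while leaving big numbers roughly interpretable.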
Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Hi @GoranSMilovanovic, apologies for the confusion. I understand that you are intending to remove all information from countries on the Country Protection List, and was trying to respond to @Manuel's follow-up question, just for the sake of learning:

Sep 28 2021, 7:50 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics
Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Hi @Manuel — so sorry for the late response; my phabricator account was misconfigured and I didn't get a notification email. Thanks so much for getting back to me with all this information.

Sep 28 2021, 6:40 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics

Sep 27 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

Hi again @Isaac! Just wanted to follow up on your post about protected classes: I strongly agree. The only feature I might add to the content analysis is geography among anonymous IP editors. At the same time, I think a lot of the features we want to measure depend in large part on having a large enough test set to avoid unreliable statistics and data leaks (it really doesn't have to be very large; 200-400 evenly distributed samples would likely do just fine).

Sep 27 2021, 1:14 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Sep 24 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

End of the week update: I have officially been able to run a full model card through the pipeline! (yay) Example here (sorry for the sparsity of data, it's only running on ~50 revisions to keep testing relatively quick).

Sep 24 2021, 10:16 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Sep 21 2021

Htriedman added a comment to T291186: Privacy Policy Review for Global South Wikidata edits and active editors datasets.

Hi @GoranSMilovanovic and @Manuel! My name is Hal — I'm a privacy engineer on the Privacy Engineering team. There is some precedent for releasing data of this variety, but I still have a couple of questions:

Sep 21 2021, 3:24 PM · Privacy Engineering, Analytics-Radar, Wikidata, WMDE-Analytics-Engineering, Wikidata Analytics

Sep 16 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

@calbon you're making a good point and this is definitely a conversation worth having.

Sep 16 2021, 9:20 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Sep 13 2021

Htriedman moved T280385: Apache Beam go prototype code for DP evaluation from Backlog to Completed on the Privacy Engineering board.
Sep 13 2021, 6:54 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Sep 10 2021

Htriedman added a comment to T290746: Developing the `algo-accountability` repository.

Quick update — I spent some time visually charting out exactly what the infrastructure/workflow for this system might look like. I've attached an image of my proposed design, and you can make comments on the design on Google Drawings here.

Sep 10 2021, 6:30 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team
Htriedman created T290746: Developing the `algo-accountability` repository.
Sep 10 2021, 4:18 PM · ML-Governance, Documentation, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team

Jul 13 2021

Htriedman added a comment to T276398: Experiment with on-wiki model documentation.

@calbon good first question — yes, the model card bot should overwrite manual edits. In my mind, the canonical way to edit model cards should be by pushing some change to a config file for the card hosted on Gerrit/Gitlab. Then, changes will show up on the next run of the card generator.

Jul 13 2021, 8:38 PM · Machine-Learning-Team (Active Tasks), Documentation, artificial-intelligence
Htriedman added a comment to T276398: Experiment with on-wiki model documentation.

Hi all! I just put together a memo synthesizing what dataset/model/service documentation could look like on WMF resources. For generalizability, I've called the documentation "algorithmic accountability sheets". The memo addresses what questions we should be asking for each component of an algorithm, as well as some thoughts on governance, metrics, deprecation, etc.

Jul 13 2021, 6:17 PM · Machine-Learning-Team (Active Tasks), Documentation, artificial-intelligence

Jun 24 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

@gmodena It might also be worth considering implementing HDFS in the Beam Go SDK ourselves — the template for doing it (which you can see here, for example) is relatively simple.

Jun 24 2021, 7:55 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Jun 4 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Part of this task is to make data releases of this type part of the regular cycle of data releases at WMF, so I do not think we should treat this project as a one-off data release. Rather, we should treat running it like any other data flow as a core requirement.

Jun 4 2021, 9:35 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release
Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Thanks @Isaac and @Nuria for the in-depth discussion of the relative pros and cons of these two approaches, and for the deep dive on user-side filtering. I wanted to chime in with some more context that I recently learned about putting DP into production, regardless of how we filter/limit pageviews. These considerations may be relevant as we move toward creating this as a service.

Jun 4 2021, 5:11 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Jun 3 2021

Htriedman closed T283368: Requesting access to production analytics data and cluster for htriedman as Resolved.

It's working perfectly! Thanks so much for the responsiveness.

Jun 3 2021, 6:08 PM · Analytics, SRE, SRE-Access-Requests
Htriedman reopened T283368: Requesting access to production analytics data and cluster for htriedman as "Open".

Hi all — reopening this task so that I can get access to https://superset.wikimedia.org as a Hive GUI.

Jun 3 2021, 5:59 PM · Analytics, SRE, SRE-Access-Requests

Jun 2 2021

Htriedman added a comment to T283368: Requesting access to production analytics data and cluster for htriedman.

@JBennett tagging you to flag that you need to sign off on this

Jun 2 2021, 3:47 PM · Analytics, SRE, SRE-Access-Requests

May 21 2021

Htriedman created T283368: Requesting access to production analytics data and cluster for htriedman.
May 21 2021, 4:17 PM · Analytics, SRE, SRE-Access-Requests

May 3 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Hi all — just finished updating the demo to get it into a good place. You can see the finished product (UI, user- and pageview-level privacy, etc.) at https://diff-privacy-beam.wmcloud.org. Please let me know what you think, and if there are any next steps that any of you can see toward getting this into a production prototype. Thanks for all the help so far :)

May 3 2021, 9:49 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 30 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

@TedTed, thanks for explaining thresholding and why δ is necessary, even with Laplace noise. Really useful to know what's happening under the hood of Privacy on Beam.

Apr 30 2021, 6:37 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release
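For readers following the thresholding discussion, the Python sketch below (not Privacy on Beam's API) illustrates why δ appears even with Laplace noise. When the set of output keys (for example, article titles) is not fixed in advance, the mere appearance of a key can reveal that someone viewed it, so keys whose noisy count falls below a threshold τ are dropped, and δ bounds the chance that a key contributed by a single user survives. It assumes each user contributes a count of 1 to at most one key; Privacy on Beam's own threshold computation handles more general contribution bounds and will differ.

```python
import math


def laplace_threshold(epsilon: float, delta: float) -> float:
    """Threshold tau such that a key created by one user's single
    contribution is released with probability at most delta.

    With noise Laplace(0, b), b = 1/epsilon, and true count 1:
    P(1 + noise >= tau) = 0.5 * exp(-(tau - 1) / b) for tau >= 1.
    Setting this equal to delta gives tau = 1 + b * ln(1 / (2 * delta)).
    """
    if not 0 < delta <= 0.5:
        raise ValueError("delta must be in (0, 0.5]")
    b = 1.0 / epsilon
    return 1.0 + b * math.log(1.0 / (2.0 * delta))


def select_partitions(noisy_counts: dict[str, float], epsilon: float,
                      delta: float) -> dict[str, float]:
    """Keep only keys whose (already noised) count reaches the threshold."""
    tau = laplace_threshold(epsilon, delta)
    return {key: value for key, value in noisy_counts.items() if value >= tau}


# With epsilon = 1 and delta = 1e-5, tau = 1 + ln(50000), roughly 11.8.
print(select_partitions({"Article_A": 50.0, "Article_B": 3.0}, 1.0, 1e-5))
```

Smaller δ pushes τ up logarithmically, so rare keys are suppressed more aggressively in exchange for a tighter privacy guarantee.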

Apr 21 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.
  • You mention processing 500,000 rows in the README. Am I correct in assuming this is the process: 1) gather top-50 viewed articles from API for that language, 2) de-aggregate the data and load into database so that e.g., an article with 50,000 pageviews becomes 50,000 separate rows, 3) extract the data and run through the diff-privacy framework (any filtering + addition of noise), 4) return privacy-aware counts.
Apr 21 2021, 5:27 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release
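The four steps in the question can be sketched in Python as below. This is an illustration under stated assumptions, not the tool's implementation: step 1 (fetching the top-50 articles from the API) is replaced with a hard-coded list, each pageview row is treated as the unit of privacy (sensitivity 1), and the Laplace sampler is a simple stdlib construction rather than Privacy on Beam's hardened noise.

```python
import random
from collections import Counter
from typing import Iterable, Iterator


def deaggregate(top_articles: Iterable[tuple[str, int]]) -> Iterator[str]:
    """Step 2: expand (article, pageviews) pairs into one row per pageview."""
    for article, views in top_articles:
        for _ in range(views):
            yield article


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials
    with mean `scale`. Fine for illustration; not a hardened DP sampler."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(rows: Iterable[str], epsilon: float,
                 rng: random.Random) -> dict[str, float]:
    """Steps 3-4: re-aggregate rows and add Laplace noise with scale 1/epsilon.

    Assumes each row (one pageview) is the unit of privacy, so each count
    has sensitivity 1. No filtering or thresholding is applied here.
    """
    counts = Counter(rows)
    return {article: n + laplace(1.0 / epsilon, rng)
            for article, n in counts.items()}


rows = list(deaggregate([("Article_A", 3), ("Article_B", 2)]))
print(rows)  # ['Article_A', 'Article_A', 'Article_A', 'Article_B', 'Article_B']
print(noisy_counts(rows, epsilon=1.0, rng=random.Random(42)))
```

Because noise is added per key, articles with few pageviews can come out near zero or even negative; in practice such keys would also go through a thresholding step before release.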

Apr 20 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Just wanted to give you a quick status update: I have a somewhat functional re-implementation of @Isaac's tool using Golang/Beam up and running locally. I'm still working on getting it hosted in Toolforge (which, as a service, doesn't play very nicely with Go quite yet), but I'm hoping that should be done this week.

Apr 20 2021, 10:42 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 16 2021

Htriedman added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Hi all — I'm Hal Triedman, the new Privacy Engineering intern. Over the last few days, I've been working on re-implementing the tool that @Isaac made (https://diff-privacy.toolforge.org) using Go's Apache Beam SDK and Google's Privacy on Beam package, rather than Python, Flask, and hand-coded DP functions.

Apr 16 2021, 6:03 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release

Apr 7 2021

Htriedman added a member for Privacy: Htriedman.
Apr 7 2021, 3:10 PM
Htriedman added a member for Security: Htriedman.
Apr 7 2021, 3:01 PM

Apr 6 2021

Htriedman added a member for Security-Team: Htriedman.
Apr 6 2021, 9:11 PM
Htriedman added a member for Privacy Engineering: Htriedman.
Apr 6 2021, 9:07 PM

Apr 5 2021

Htriedman updated Htriedman.
Apr 5 2021, 8:36 PM