Htriedman (Hal Triedman)
Privacy Engineer

User Details

User Since
Apr 5 2021, 8:13 PM (252 w, 6 d)
Availability
Available
LDAP User
Htriedman
MediaWiki User
Htriedman [ Global Accounts ]

Recent Activity

Wed, Jan 21

Htriedman created T415208: Secret management on airflow for the automated transfer of (public) datasets from stats infra --> WME AWS.
Wed, Jan 21, 4:48 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Dec 2 2025

Htriedman added a comment to T409601: Review and productionize the WME differential privacy data set.

Hi @Snwachukwu! I'll try to answer your questions one-by-one:

Dec 2 2025, 10:31 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review

Nov 18 2025

Htriedman added a comment to T409601: Review and productionize the WME differential privacy data set.

Hi @Ahoelzl! Sorry for the late reply here.

Nov 18 2025, 5:29 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review

Nov 3 2025

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

I would love it to be but have no control over priorities here! What could I do to help move it forward?

Nov 3 2025, 10:54 AM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Oct 7 2025

Htriedman added a comment to T405360: Implement an Airflow operator for moving data from point A to B.

@Ottomata Yes correct! I'm not 100% sure on the details but likely some kind of S3 bucket sync to WME's infra

Oct 7 2025, 2:56 AM · Data-Platform-SRE (2026.01.23 - 2026.02.13), Data-Engineering (Q3 FY25/26 January 1st - March 31th), Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise, Essential-Work

Sep 24 2025

Htriedman closed T405243: Debugging DP country_project_page release, a subtask of T355903: Pageviews {Content Integrity}, as Resolved.
Sep 24 2025, 3:57 PM · Epic, Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise
Htriedman closed T405243: Debugging DP country_project_page release as Resolved.
Sep 24 2025, 3:57 PM · Epic, Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Sep 22 2025

Htriedman created T405243: Debugging DP country_project_page release.
Sep 22 2025, 3:38 PM · Epic, Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Jul 15 2025

Htriedman added a comment to T398501: Requesting a kerberos identity - htriedman.

This seems to have worked! Thank you for the lightning-fast response time :)

Jul 15 2025, 3:08 PM · SRE-Access-Requests, Data-Engineering
Htriedman updated subscribers of T398501: Requesting a kerberos identity - htriedman.
Jul 15 2025, 2:38 PM · SRE-Access-Requests, Data-Engineering
Htriedman added a comment to T398501: Requesting a kerberos identity - htriedman.

Following up on this — any chance I could get someone (e.g. @Clement_Goubert?) to take a quick look?

Jul 15 2025, 2:36 PM · SRE-Access-Requests, Data-Engineering

Jul 2 2025

Htriedman created T398501: Requesting a kerberos identity - htriedman.
Jul 2 2025, 8:06 PM · SRE-Access-Requests, Data-Engineering

Jul 1 2025

Htriedman closed T398075: Requesting access to airflow-an and statboxes for htriedman as Resolved.

@Clement_Goubert just verified that I can get into stat10XX and an-airflow100Y! Changing status to resolved now. Thanks for the help :)

Jul 1 2025, 3:31 PM · LDAP-Access-Requests, Data-Engineering, SRE, SRE-Access-Requests

Jun 30 2025

Htriedman added a comment to T398075: Requesting access to airflow-an and statboxes for htriedman.

Hi @Clement_Goubert! When I navigate to the L3 document page, there's no option to sign again — any way I can be purged from the system on your end and/or reset that on my end?

Jun 30 2025, 10:06 PM · LDAP-Access-Requests, Data-Engineering, SRE, SRE-Access-Requests

Jun 27 2025

Htriedman created T398075: Requesting access to airflow-an and statboxes for htriedman.
Jun 27 2025, 6:29 PM · LDAP-Access-Requests, Data-Engineering, SRE, SRE-Access-Requests

Jun 22 2025

Htriedman added a comment to T396672: Request for dedicated Airflow instance for WME.

Chiming in here to add a bit of info / my personal views on some of the questions and points you've raised above:

Jun 22 2025, 2:31 PM · Essential-Work, Data-Platform-SRE (2025.09.05 - 2025.09.26), Data-Engineering

May 22 2025

Htriedman placed T286508: Investigate storing model metadata on Wikidata up for grabs.
May 22 2025, 2:38 PM · Machine-Learning-Team, ML-Governance, Lift-Wing

May 12 2025

Htriedman updated subscribers of T379545: Update the DAGs on the platform_eng airflow instance to use miniforge instead of condaforge and mambaforge.

Hi all, given that Andy is no longer at the Foundation I think that maybe @sbassett should be pulled in for these perms.

May 12 2025, 2:58 PM · Essential-Work, Data-Platform-SRE (2025.09.05 - 2025.09.26)

Apr 11 2025

Htriedman added a comment to T379545: Update the DAGs on the platform_eng airflow instance to use miniforge instead of condaforge and mambaforge.

Hi! I tried making the seemingly small changes requested here. Specifically, I bumped the version of repos/data-engineering/workflow_utils from v0.14.0 ==> v0.19.0 in .gitlab-ci.yml (see the example I'm basing this off of and the current state of the file at this link) and am trying to push to a new branch. Unfortunately, when I push I get a "You are not allowed to push code to this project" message. Looking at the current project members, I seem to have been removed as a developer at some point.

Apr 11 2025, 3:20 PM · Essential-Work, Data-Platform-SRE (2025.09.05 - 2025.09.26)

Jan 9 2025

Htriedman added a comment to T355903: Pageviews {Content Integrity}.

It's been a minute since I did an update on this — my apologies for that!

Jan 9 2025, 6:08 PM · Epic, Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Nov 1 2024

Htriedman added a comment to T355903: Pageviews {Content Integrity}.

Weekly update for the week ending on November 1 2024:

Nov 1 2024, 10:10 PM · Epic, Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise
Htriedman added a comment to T378506: Investigation: Create pageview distribution analysis.

Decided to look at top 10 projects by active user size (as detailed on this page), which yields the following projects:

Nov 1 2024, 9:57 PM · Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Oct 30 2024

Htriedman added a comment to T378506: Investigation: Create pageview distribution analysis.

Initial global (not per-project) analysis:

Oct 30 2024, 4:26 PM · Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Oct 17 2024

Htriedman created T377456: Consider steps to productionize SpinachBot on Wikidata.
Oct 17 2024, 1:40 PM · Wikidata

Oct 8 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@SLyngshede-WMF Hello! Today is my first day back as a contractor. As of right now, I'm unable to log into Okta or officewiki — is there any process for reinstating access to those services? Happy to start a new ticket if needed.

Oct 8 2024, 4:29 PM · Infrastructure-Foundations, Tools

Aug 26 2024

Htriedman updated subscribers of T373332: Published datasets data release request for Wikidata mobile edits metrics.

Hi! I am no longer a member of the privacy team. I would suggest you direct this request to @mpopov or someone on the privacy legal team at WMF.

Aug 26 2024, 6:49 PM · Wikidata Analytics (Kanban), Wikidata, Privacy Engineering

Aug 20 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

As of a few weeks ago I am no longer an employee of WMF, so feel free to change it over whenever @Dzahn!

Aug 20 2024, 3:50 PM · Infrastructure-Foundations, Tools

Aug 13 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

Got it — thanks for letting me know!

Aug 13 2024, 2:52 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@KFrancis email sent!

Aug 13 2024, 2:13 AM · Infrastructure-Foundations, Tools

Aug 6 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@Dzahn Got it — yeah, the one I recall signing a few years ago was through the phab UI. I'll wait for Kate to weigh in and send me the email!

Aug 6 2024, 4:51 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@SLyngshede-WMF that sounds like a good plan! Can you link me to the volunteer NDA? I may have already signed it when I was an intern back in 2021, but I'd love to check. As for my personal email, I've updated wikitech and phab to be associated with it (still waiting on gitlab to update), as detailed higher up in this thread. Anything else I need to do there?

Aug 6 2024, 4:39 PM · Infrastructure-Foundations, Tools

Aug 2 2024

Htriedman updated subscribers of T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

Just sent @KFrancis an email, and I can tag @CDanis here (picked a random engineer from the infra foundations team page), just to bring this to his attention!

Aug 2 2024, 3:32 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@Dzahn Great questions! I'm planning on rejoining the Foundation as a contractor under WME in early October. I'll be working on data products in and around the analytics infrastructure.

Aug 2 2024, 2:33 AM · Infrastructure-Foundations, Tools

Aug 1 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

Yep, done!

Aug 1 2024, 10:41 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

I would change your email address on the wikitech account to your personal email.

Aug 1 2024, 10:35 PM · Infrastructure-Foundations, Tools
Htriedman created T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.
Aug 1 2024, 9:46 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

I'm about to leave WMF, and I wanted to leave a comment here summarizing the design spec for this desired functionality.

Aug 1 2024, 9:10 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jun 4 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

sounds good to me!

Jun 4 2024, 11:17 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 29 2024

Htriedman added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

Reading up on this thread now! I think that idea 1 sounds good and shouldn't be privacy-breaking if we report counts/percentages that are above 250 views. Given that we get something like 600m views per day, that lower threshold accounts for 0.000042% of our traffic.
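
The percentage quoted above can be checked with a few lines of arithmetic. This is only a sketch: both inputs are the rough figures from the comment (a 250-view reporting threshold and roughly 600 million views per day), not measured values, and the variable names are illustrative.

```python
# Figures taken from the comment above; both are rough estimates.
threshold_views = 250          # smallest count that would be reported
daily_views = 600_000_000      # "something like 600m views per day"

# Share of one day's traffic represented by a bucket at the threshold.
share_percent = threshold_views / daily_views * 100

print(f"{share_percent:.6f}%")  # prints 0.000042%
```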

May 29 2024, 7:02 PM · Test Kitchen (Data Products Sprint 17), Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

May 28 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

@AndrewTavis_WMDE this looks good to me! As for T365700, like I said there, it seems like the features you're publishing are pertaining to the underlying dataset, not user/editor/reader activity, so there are no direct privacy concerns.

May 28 2024, 4:36 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 24 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

up to you! the only requirement is to redact/filter out that data

May 24 2024, 11:29 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 23 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

Hi @AndrewTavis_WMDE! Taking a look at this in conjunction with the existing Data Publication Guidelines.

May 23 2024, 6:02 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 14 2024

Htriedman added a comment to T362957: Create a dataset for training/evaluating models for summarizing (long) discussions.

I've been actively working on parsing on-wiki discussions in the context of the request a query archive, and I took a few hours to adapt that (hacky but mostly working) code to this task!

May 14 2024, 11:57 PM · Research-Freezer, Wikimedia-Hackathon-2024

May 10 2024

Htriedman added a comment to T362957: Create a dataset for training/evaluating models for summarizing (long) discussions.

I've been doing a lot of wikitext parsing work for the SPARQL dataset, including parsing on-wiki conversations. If I can figure that out for (say) the request a query archive, I may take a crack at this by adapting the same script to parse RfCs. Will keep everyone updated on this phab task!

May 10 2024, 3:49 PM · Research-Freezer, Wikimedia-Hackathon-2024

Apr 30 2024

Htriedman added a comment to T357613: Measure the reference use and re-use in VE.

Got it! Since this is downstream of an existing event schema, and collects no user identifiers, granular geographic identifiers, or page identifiers, this data collection activity is lower risk. You can go ahead and proceed with building this.

Apr 30 2024, 6:55 PM · WMDE-TechWish-Sprint-2024-06-12, WMDE-TechWish-Sprint-2024-05-29, WMDE-TechWish-Sprint-2024-05-08, WMDE-TechWish-Sprint-2024-04-24, Epic, WMDE-TechWish-Sprint-2024-04-12, WMDE-TechWish-Sprint-2024-03-27, WMDE-TechWish-Sprint-2024-03-13, WMDE-TechWish-Sprint-2024-02-28, WMDE-TechWish-Sprint-2024-02-15, WMDE-References-FocusArea

Apr 26 2024

Htriedman added a comment to T357613: Measure the reference use and re-use in VE.

Hi @WMDE-Fisch! I'll be conducting this review on phab (at the request of the WMF Legal team until there's a formal agreement between WMF and WMDE). Here's what I originally posted on the L3SC ticket:

Apr 26 2024, 8:25 PM · WMDE-TechWish-Sprint-2024-06-12, WMDE-TechWish-Sprint-2024-05-29, WMDE-TechWish-Sprint-2024-05-08, WMDE-TechWish-Sprint-2024-04-24, Epic, WMDE-TechWish-Sprint-2024-04-12, WMDE-TechWish-Sprint-2024-03-27, WMDE-TechWish-Sprint-2024-03-13, WMDE-TechWish-Sprint-2024-02-28, WMDE-TechWish-Sprint-2024-02-15, WMDE-References-FocusArea

Apr 25 2024

Htriedman renamed T362805: Build a tool (or tools) to easily visualize differentially-private datasets from Build a tool (or tools) to easily visualize DP datasets to Build a tool (or tools) to easily visualize differentially-private datasets.
Apr 25 2024, 7:23 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024

Apr 17 2024

Htriedman added a project to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page: Wikimedia-Hackathon-2024.

going to investigate the feasibility of this at the WMF Hackathon in a few weeks

Apr 17 2024, 4:46 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman added a project to T362805: Build a tool (or tools) to easily visualize differentially-private datasets: Toolforge.
Apr 17 2024, 4:44 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024
Htriedman moved T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page from Backlog to Hacking projects on the Wikimedia-Hackathon-2024 board.
Apr 17 2024, 4:44 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman moved T362805: Build a tool (or tools) to easily visualize differentially-private datasets from Backlog to Hacking projects on the Wikimedia-Hackathon-2024 board.
Apr 17 2024, 4:41 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024
Htriedman created T362805: Build a tool (or tools) to easily visualize differentially-private datasets.
Apr 17 2024, 4:39 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024
Htriedman updated subscribers of T344624: Missing contributor stats for Singapore.

I believe that we *do* actually publish the data about Singapore editor numbers — when I query wmf.geoeditors_public_monthly internally, I get ~2000 rows from Singapore, including from March 2024. And Singapore is considered a "Lower risk" country on the Country and Territory Protection List (you can open this tsv which is the official source of truth and ctrl-F "Singapore")...

Apr 17 2024, 3:50 PM · Data-Engineering-Radar, Data-Engineering, Data-Engineering-Wikistats

Mar 28 2024

Htriedman added a comment to T327982: Add cawiki to clickstream dataset.

@VirginiaPoundstone should this be considered under the same auspices as T289532? It may be worthwhile to consider wrapping this up as part of that other task just to limit one-off work that might need to be repeated.

Mar 28 2024, 10:05 PM · Data-Engineering, Data Pipelines, Analytics

Mar 22 2024

Htriedman added a comment to T341139: project-title-country missing US data in recent data, and double quote escaping.

Yes — there's no further work to be done.

Mar 22 2024, 3:49 PM · Test Kitchen, Data-Engineering

Mar 20 2024

Htriedman added a comment to T360073: Wikistats "Active Editors by Country" does not follow definition for active editors.

@Milimetric FWIW about the weekly dataset: folks from product analytics told me that maintaining and publishing the monthly dataset is important for continuity with the existing dataset.

Mar 20 2024, 4:31 PM · Movement-Insights, Data-Engineering-Radar, Data Pipelines, Data-Engineering
Htriedman added a comment to T360073: Wikistats "Active Editors by Country" does not follow definition for active editors.

@kzimmerman @Milimetric happy to set up a meeting next week to discuss the differences between the DP and non-DP versions of the geoeditors monthly/weekly datasets

Mar 20 2024, 4:27 PM · Movement-Insights, Data-Engineering-Radar, Data Pipelines, Data-Engineering
Htriedman added a comment to T341139: project-title-country missing US data in recent data, and double quote escaping.

Hi @Ogiermaitre! Thanks for bringing this to our attention, and sorry it's taken so long to respond to you about this — I didn't know this ticket existed until 20 min ago. The US data problem has been fixed.

Mar 20 2024, 4:09 PM · Test Kitchen, Data-Engineering

Mar 18 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

@Ladsgroup is correct about this — this is already happening on an ad hoc basis in some cases where there may be concerns about editor safety for sensitive material.

Mar 18 2024, 6:05 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Feb 27 2024

Htriedman added a project to T358601: Fix CI/CD issues in the differential-privacy repository: Privacy Engineering.
Feb 27 2024, 4:41 PM · Data-Engineering-Icebox, Data-Engineering, Privacy Engineering
Htriedman created T358601: Fix CI/CD issues in the differential-privacy repository.
Feb 27 2024, 4:40 PM · Data-Engineering-Icebox, Data-Engineering, Privacy Engineering

Feb 15 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

thanks for awareness around your capacity, @DannyS712!

Feb 15 2024, 4:04 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jan 31 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

There's no huge rush — we've deployed a short-term fix, but it requires some manual updating from WMF developers on a regular basis. If you can get to this in a few weeks that would be fine.

Jan 31 2024, 10:03 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

Hi @DannyS712! Have you made any progress on this?

Jan 31 2024, 8:17 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jan 24 2024

Htriedman added a comment to T355696: Update canonical_data.countries to reflect new country protection policy.

This task is high priority — the new country and territory protection list policy is out, and we'd like the internal list to reflect that as soon as possible.

Jan 24 2024, 10:08 PM · Movement-Insights

Jan 23 2024

Htriedman created T355696: Update canonical_data.countries to reflect new country protection policy.
Jan 23 2024, 5:07 PM · Movement-Insights

Jan 9 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

I'm assuming that this is all meant to be oversight-level suppression, rather than admin-level (unless we want both as options).

Correct!

Jan 9 2024, 5:10 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jan 8 2024

Htriedman created T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.
Jan 8 2024, 11:32 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman added a comment to T353306: [request] consultation for a whitepaper.

@leila Hi! Would love to hear more about what you're thinking with regard to privacy :) Feel free to schedule something with me or continue the conversation here!

Jan 8 2024, 6:17 PM · SecTeam-Processed, Privacy Engineering, Research

Nov 13 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Any updates on this?

Nov 13 2023, 8:11 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Oct 31 2023

Htriedman closed T207171: Have a way to show the most popular pages per country as Resolved.
Oct 31 2023, 11:24 PM · Data-Engineering, Data-Engineering-Wikistats, Privacy Engineering, Inuka-Team, Language-strategy, Tool-Pageviews
Htriedman closed T299627: Investigate releasing historical top-pageview-per-country data as Resolved.

Update (very late but still necessary): As of Feb 2023, this data request has been completed!

Oct 31 2023, 11:11 PM · Privacy Engineering, Data-Engineering

Oct 19 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Hi @VirginiaPoundstone! Thanks for the detailed questions! I'll try to answer them one by one :)

Oct 19 2023, 8:43 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Oct 13 2023

Htriedman added a comment to T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

@JFishback_WMF I'll invite you to a meeting about this next week!

Oct 13 2023, 9:06 PM · Data Engineering and Event Platform Team (Sprint 3)

Oct 12 2023

Htriedman added a comment to T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

@JAllemandou Thanks for the kind words! For the moment, yes — let's try to standardize use of the country protection list and try to avoid keeping multiple versions of the list hard-coded in jobs. I will work on the following:

  1. getting my proposed schema reviewed by legal and human rights
  2. implementing the new schema in hive
  3. updating documentation on wikitech
  4. getting this data release onto a DP framework (cc: @Isaac)
Oct 12 2023, 6:30 PM · Data Engineering and Event Platform Team (Sprint 3)
Htriedman updated subscribers of T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

subscribing @Cleo_Lemoisson for visibility

Oct 12 2023, 4:41 PM · Data Engineering and Event Platform Team (Sprint 3)

Oct 11 2023

Htriedman added a comment to T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

Hi! Thanks for flagging this, @Isaac! Definitely agree that this dataset is a great candidate for differential privacy (DP), which would also likely reduce the minimum publication threshold to <500. I'm happy to start working on that with you — it's a somewhat independent process from the discussion of the country protection list (CPL) and I think this dataset could benefit from it.

Oct 11 2023, 6:29 PM · Data Engineering and Event Platform Team (Sprint 3)

Oct 4 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

@Eevans In that case, I'll change the data model to drop it! Will update this thread when it's done.

Oct 4 2023, 10:06 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE
Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

@Eevans Understood! I'll make that change to the schema soon.

Oct 4 2023, 7:52 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Oct 3 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Hi all! I've made updates to the codebase to better comply with @Eevans' feedback, resulting in a greatly simplified interface. I've listed the following design changes below:

Oct 3 2023, 11:50 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Sep 21 2023

Htriedman updated subscribers of T347104: Application Security Review Request : Fundraise Up scripts for Donatewiki.

@sbassett tagging you in this for visibility

Sep 21 2023, 9:19 PM · secscrum, Security, Application Security Reviews

Sep 20 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Some of them are just artifacts of starting from a fork of one of the legacy services. For example, we'll want to adopt a new (better) convention for keyspace and table naming; names like "local_group_default_T_dp_pageviews".data were generated by the RESTBase codebase. Likewise, the "_domain" attribute (which is always set to analytics.wikimedia.org for these services) was done to appease RESTBase, and isn't something we should be perpetuating. Easy changes, mostly cosmetic.

Sep 20 2023, 8:10 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Sep 19 2023

Htriedman added a comment to T346329: Update visibility rules of aggregated participant responses.

I like this idea! Makes a lot of sense and covers more edge cases than my simpler solution was proposing. Feel free to implement this, and if you do, please write it up in a separate document and share it with me — could be very useful in future cases where we're considering releasing similar sensitive data with a relatively small number of raw data entries in the underlying dataset.

Sep 19 2023, 6:16 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Connection-Team (Connection-Current-Sprint), Campaign-Registration, CampaignEvents

Sep 15 2023

Htriedman added a comment to T346329: Update visibility rules of aggregated participant responses.

As for reporting percentages, you can take an example from the new data publication guidelines. We considered how to report percentages in the "Threshold table" section of the policy: https://foundation.wikimedia.org/wiki/Legal:Data_publication_guidelines#Threshold_table

Sep 15 2023, 4:32 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Connection-Team (Connection-Current-Sprint), Campaign-Registration, CampaignEvents
Htriedman added a comment to T346329: Update visibility rules of aggregated participant responses.

Hi @ifried! Thanks for bringing this up — I wrote the initial set of recommendations for obfuscating event data in these contexts, and know that there are many contexts in which showing "<5" to an event organizer will leak the exact number of responses in that category. It is, at best, a partial fix that will be effective at deterring non-malicious people who have access to reports.
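
One way the "<5" display can leak an exact count is simple subtraction. The sketch below is a hypothetical illustration, not the CampaignEvents implementation: it assumes the organizer knows the total number of respondents, that each respondent picks exactly one option, and that only one category is masked. The function names, category names, and counts are all made up for the example.

```python
def mask(count, threshold=5):
    """Display counts below the threshold as '<5' (hypothetical rule)."""
    return f"<{threshold}" if count < threshold else str(count)


def recover_masked(total, displayed):
    """Recover a single masked count as total minus the visible counts.

    Returns (category, count), or None when zero or several categories
    are masked and plain subtraction is not enough.
    """
    masked = [k for k, v in displayed.items() if v.startswith("<")]
    visible = [int(v) for v in displayed.values() if not v.startswith("<")]
    if len(masked) != 1:
        return None
    return masked[0], total - sum(visible)


# Hypothetical survey responses; real reports would differ.
responses = {"option_a": 12, "option_b": 7, "option_c": 3}
displayed = {k: mask(v) for k, v in responses.items()}
total = sum(responses.values())  # assumed known to the organizer

recovered = recover_masked(total, displayed)
print(displayed)
print(recovered)
```

With two or more masked categories the subtraction yields only their combined total, which is why the masking deters casual readers more than determined ones.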

Sep 15 2023, 4:29 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Connection-Team (Connection-Current-Sprint), Campaign-Registration, CampaignEvents

Aug 22 2023

Htriedman updated subscribers of T343855: AQS 2.0 differentially private pageviews deploy API.

Hi all! It's been a few weeks without activity, so I'm following up on this request.

Aug 22 2023, 7:14 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Aug 21 2023

Htriedman added a comment to T340942: Check home/HDFS leftovers of tmtl.io contractors.

Hi @BTullis! All of these Tumult Labs folks were working in more of an advisory role — even if their directories contain some uncommitted changes, you can delete them and remove their user profiles.

Aug 21 2023, 4:41 PM · Data-Engineering
Htriedman added a comment to T344617: Multiple DAGs on platform_eng instance failing on Spark Skein operators with ConnectionError.

Thanks for taking care of this @xcollazo and @BTullis! really appreciate you catching this while I was OOO

Aug 21 2023, 4:38 PM · Data-Platform-SRE

Aug 8 2023

Htriedman created T343855: AQS 2.0 differentially private pageviews deploy API.
Aug 8 2023, 8:13 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Aug 3 2023

Htriedman updated the task description for T343304: MakeItSPARQL! - build a UI for the LLM that translates natural language into SPARQL queries for Wikidata.
Aug 3 2023, 5:51 PM · Wikimania-Hackathon-2023, Wikidata Query UI, patch-welcome, Wikidata
Htriedman updated the task description for T343304: MakeItSPARQL! - build a UI for the LLM that translates natural language into SPARQL queries for Wikidata.
Aug 3 2023, 5:39 PM · Wikimania-Hackathon-2023, Wikidata Query UI, patch-welcome, Wikidata

Aug 1 2023

Htriedman added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

Hi @odimitrijevic! Here's the gitlab repo I worked on during the documentathon :) https://gitlab.wikimedia.org/htriedman/documentathon-eventstream

Aug 1 2023, 5:48 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

Jul 25 2023

Htriedman added a comment to T342487: [Event Platform] Actor performing suppression revealed publicly.

^^agree with the above analysis — if we can selectively remove the performer of suppressions, then this should be considered resolved.

Jul 25 2023, 5:16 PM · Data-Engineering (Sprint 6), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), SecTeam-Processed, Privacy Engineering, Event-Platform, Vuln-Infoleak, Security

Jul 24 2023

Htriedman added a comment to T340149: Review and provide feedback to Guidelines for Data Publication.

Thanks for your comments, @fkaelin! I'll get back to you about the topN pages once we meet about it.

Jul 24 2023, 4:58 PM · Research

Jul 21 2023

Htriedman added a comment to T207171: Have a way to show the most popular pages per country.

@Flomeier85 if you have any questions at all feel free to post them here or reach out to me via email at htriedman@wikimedia.org :)

Jul 21 2023, 10:07 PM · Data-Engineering, Data-Engineering-Wikistats, Privacy Engineering, Inuka-Team, Language-strategy, Tool-Pageviews

Jul 20 2023

Htriedman added a comment to T340149: Review and provide feedback to Guidelines for Data Publication.

I know @Isaac has given feedback on this doc — @fkaelin any additional comments?

Jul 20 2023, 7:00 PM · Research

Jul 17 2023

Htriedman added a comment to T315676: Add DP cookie for pageview filtering.

@Vgutierrez this feature has been working as expected, and this ticket can be closed!

Jul 17 2023, 2:50 PM · SRE, Traffic

Jul 14 2023

Htriedman added a comment to T341907: Release datasets in support of Wikimedia-related AI modeling.

Isaac and I spent some time brainstorming about this last month. Here's a google doc with a bunch of existing ideas in it!

Jul 14 2023, 9:07 PM · Research, Epic

Jul 11 2023

Htriedman added a comment to T334851: Define a procedure/pattern to populate test environments.

I would be strongly in favor of using mock data over synthetic data, at least for the moment. We should only have an explicit preference for synthetic data if there's a real need for the underlying statistical distribution of the fake data to mirror that of the real data. If it's just for performance testing, that shouldn't be necessary.

Jul 11 2023, 4:30 PM · Catalyst (Prototype leftovers 🍱), SecTeam-Processed, Privacy Engineering, serviceops-radar, WMF-Architecture-Team, Platform Engineering, Release-Engineering-Team, API Platform, AQS2.0

Jul 5 2023

Htriedman added a comment to T335892: Get stats on Gadgets and Users scripts loading third-party resources.

Definitely would be pro-overriding the user-agent for fontcdn (and cdnjs) — that would make it significantly easier to argue that they should be considered ok to allowlist for third-party resources.

Jul 5 2023, 6:37 PM · WMF-General-or-Unknown, affects-Miraheze, SecTeam-Processed, Privacy Engineering, tech-decision-forum