Htriedman (Hal Triedman)
Privacy Engineer

User Details

User Since
Apr 5 2021, 8:13 PM (201 w, 1 d)
Availability
Available
LDAP User
Htriedman
MediaWiki User
Htriedman [ Global Accounts ]

Recent Activity

Jan 9 2025

Htriedman added a comment to T355903: Pageviews {Content Integrity}.

It's been a minute since I did an update on this — my apologies for that!

Jan 9 2025, 6:08 PM · Epic, Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Nov 1 2024

Htriedman added a comment to T355903: Pageviews {Content Integrity}.

Weekly update for the week ending on November 1 2024:

Nov 1 2024, 10:10 PM · Epic, Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise
Htriedman added a comment to T378506: Investigation: Create pageview distribution analysis.

Decided to look at top 10 projects by active user size (as detailed on this page), which yields the following projects:

Nov 1 2024, 9:57 PM · Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Oct 30 2024

Htriedman added a comment to T378506: Investigation: Create pageview distribution analysis.

Initial global (not per-project) analysis:

Oct 30 2024, 4:26 PM · Wikimedia Enterprise - Content Integrity, Wikimedia Enterprise

Oct 17 2024

Htriedman created T377456: Consider steps to productionize SpinachBot on Wikidata.
Oct 17 2024, 1:40 PM · Wikidata

Oct 8 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@SLyngshede-WMF Hello! Today is my first day back as a contractor. As of right now, I'm unable to log into Okta or officewiki — is there any process for reinstating access to those services? Happy to start a new ticket if needed.

Oct 8 2024, 4:29 PM · Infrastructure-Foundations, Tools

Aug 26 2024

Htriedman updated subscribers of T373332: Published datasets data release request for Wikidata mobile edits metrics.

Hi! I am no longer a member of the privacy team. I would suggest you direct this request to @mpopov or someone on the privacy legal team at WMF.

Aug 26 2024, 6:49 PM · Wikidata Analytics (Kanban), Wikidata, Privacy Engineering

Aug 20 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

As of a few weeks ago I am no longer an employee of WMF, so feel free to change it over whenever @Dzahn!

Aug 20 2024, 3:50 PM · Infrastructure-Foundations, Tools

Aug 13 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

Got it — thanks for letting me know!

Aug 13 2024, 2:52 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@KFrancis email sent!

Aug 13 2024, 2:13 AM · Infrastructure-Foundations, Tools

Aug 6 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@Dzahn Got it — yeah, the one I recall signing a few years ago was through the phab UI. I'll wait for Kate to weigh in and send me the email!

Aug 6 2024, 4:51 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@SLyngshede-WMF that sounds like a good plan! Can you link me to the volunteer NDA? I may have already signed it when I was an intern back in 2021, but I'd love to check. As for my personal email, I've updated wikitech and phab to be associated with it (still waiting on gitlab to update), as detailed higher up in this thread. Anything else I need to do there?

Aug 6 2024, 4:39 PM · Infrastructure-Foundations, Tools

Aug 2 2024

Htriedman updated subscribers of T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

Just sent @KFrancis an email, and I can tag @CDanis here (picked a random engineer from the infra foundations team page), just to bring this to his attention!

Aug 2 2024, 3:32 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

@Dzahn Great questions! I'm planning on rejoining the Foundation as a contractor under WME in early October. I'll be working on data products in and around the analytics infrastructure.

Aug 2 2024, 2:33 AM · Infrastructure-Foundations, Tools

Aug 1 2024

Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

Yep, done!

Aug 1 2024, 10:41 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.

I would change your email address on the wikitech account to your personal email.

Aug 1 2024, 10:35 PM · Infrastructure-Foundations, Tools
Htriedman created T371644: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman.
Aug 1 2024, 9:46 PM · Infrastructure-Foundations, Tools
Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

I'm about to leave WMF, and I wanted to leave a comment here summarizing the design spec for this desired functionality.

Aug 1 2024, 9:10 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jun 4 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

sounds good to me!

Jun 4 2024, 11:17 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 29 2024

Htriedman added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

Reading up on this thread now! I think that idea 1 sounds good and shouldn't be privacy-breaking if we report counts/percentages that are above 250 views. Given that we get something like 600m views per day, that lower threshold accounts for 0.000042% of our traffic.
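To make the arithmetic above concrete, here is a minimal Python sketch, assuming roughly 600 million pageviews per day as stated in the comment. The function names and the browser labels in the example are hypothetical, not part of any existing report code. It computes what share of daily traffic a 250-view bucket represents and drops browser buckets that are not above the threshold.

```python
# Assumption: ~600 million pageviews per day, as stated in the comment above.
DAILY_PAGEVIEWS = 600_000_000
MIN_REPORTABLE_VIEWS = 250


def share_of_traffic_pct(views: int, total: int = DAILY_PAGEVIEWS) -> float:
    """Return `views` as a percentage of total daily traffic."""
    return 100 * views / total


def reportable(counts: dict[str, int], threshold: int = MIN_REPORTABLE_VIEWS) -> dict[str, int]:
    """Keep only buckets whose count is strictly above the threshold."""
    return {name: views for name, views in counts.items() if views > threshold}


if __name__ == "__main__":
    # 250 / 600,000,000 * 100 is roughly 0.0000417%, i.e. ~0.000042%.
    print(f"{share_of_traffic_pct(MIN_REPORTABLE_VIEWS):.7f}%")
    # Hypothetical browser buckets: the small one is folded away.
    print(reportable({"Chrome": 1000, "RareBrowser": 12}))
```

Anything filtered out here would still need to be folded into an aggregate bucket (such as "Other") rather than silently lost, so the reported totals stay consistent.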

May 29 2024, 7:02 PM · Experimentation Lab (Data Products Sprint 17), Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

May 28 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

@AndrewTavis_WMDE this looks good to me! As for T365700, like I said there, it seems like the features you're publishing are pertaining to the underlying dataset, not user/editor/reader activity, so there are no direct privacy concerns.

May 28 2024, 4:36 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 24 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

up to you! the only requirement is to redact/filter out that data

May 24 2024, 11:29 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 23 2024

Htriedman added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

Hi @AndrewTavis_WMDE! Taking a look at this in conjunction with the existing Data Publication Guidelines.

May 23 2024, 6:02 PM · Wikidata Analytics, Wikidata, Privacy Engineering

May 14 2024

Htriedman added a comment to T362957: Create a dataset for training/evaluating models for summarizing (long) discussions.

I've been actively working on parsing on-wiki discussions in the context of the Request a query archive, and I took a few hours to adapt that (hacky but mostly working) code to this task!

May 14 2024, 11:57 PM · Research-Freezer, Wikimedia-Hackathon-2024

May 10 2024

Htriedman added a comment to T362957: Create a dataset for training/evaluating models for summarizing (long) discussions.

I've been doing a lot of wikitext parsing work for the SPARQL dataset, including parsing on-wiki conversations. If I can figure that out for (say) the Request a query archive, I may take a crack at adapting the same script to parse RfCs. Will keep everyone updated on this phab task!

May 10 2024, 3:49 PM · Research-Freezer, Wikimedia-Hackathon-2024

Apr 30 2024

Htriedman added a comment to T357613: Measure the reference use and re-use in VE.

Got it! Since this is downstream of an existing event schema, and collects no user identifiers, granular geographic identifiers, or page identifiers, this data collection activity is lower risk. You can go ahead and proceed with building this.

Apr 30 2024, 6:55 PM · WMDE-TechWish-Sprint-2024-06-12, WMDE-TechWish-Sprint-2024-05-29, WMDE-TechWish-Sprint-2024-05-08, WMDE-TechWish-Sprint-2024-04-24, Epic, WMDE-TechWish-Sprint-2024-04-12, WMDE-TechWish-Sprint-2024-03-27, WMDE-TechWish-Sprint-2024-03-13, WMDE-TechWish-Sprint-2024-02-28, WMDE-TechWish-Sprint-2024-02-15, WMDE-References-FocusArea

Apr 26 2024

Htriedman added a comment to T357613: Measure the reference use and re-use in VE.

Hi @WMDE-Fisch! I'll be conducting this review on phab (at the request of the WMF Legal team until there's a formal agreement between WMF and WMDE). Here's what I originally posted on the L3SC ticket:

Apr 26 2024, 8:25 PM · WMDE-TechWish-Sprint-2024-06-12, WMDE-TechWish-Sprint-2024-05-29, WMDE-TechWish-Sprint-2024-05-08, WMDE-TechWish-Sprint-2024-04-24, Epic, WMDE-TechWish-Sprint-2024-04-12, WMDE-TechWish-Sprint-2024-03-27, WMDE-TechWish-Sprint-2024-03-13, WMDE-TechWish-Sprint-2024-02-28, WMDE-TechWish-Sprint-2024-02-15, WMDE-References-FocusArea

Apr 25 2024

Htriedman renamed T362805: Build a tool (or tools) to easily visualize differentially-private datasets from Build a tool (or tools) to easily visualize DP datasets to Build a tool (or tools) to easily visualize differentially-private datasets.
Apr 25 2024, 7:23 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024

Apr 17 2024

Htriedman added a project to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page: Wikimedia-Hackathon-2024.

going to investigate the feasibility of this at the WMF Hackathon in a few weeks

Apr 17 2024, 4:46 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman added a project to T362805: Build a tool (or tools) to easily visualize differentially-private datasets: Toolforge.
Apr 17 2024, 4:44 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024
Htriedman moved T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page from Backlog to Hacking projects on the Wikimedia-Hackathon-2024 board.
Apr 17 2024, 4:44 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman moved T362805: Build a tool (or tools) to easily visualize differentially-private datasets from Backlog to Hacking projects on the Wikimedia-Hackathon-2024 board.
Apr 17 2024, 4:41 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024
Htriedman created T362805: Build a tool (or tools) to easily visualize differentially-private datasets.
Apr 17 2024, 4:39 PM · Technical-Tool-Request, Wikimedia-Hackathon-2024
Htriedman updated subscribers of T344624: Missing contributor stats for Singapore.

I believe that we *do* actually publish the data about Singapore editor numbers: when I query wmf.geoeditors_public_monthly internally, I get ~2000 rows from Singapore, including from March 2024. And Singapore is considered a "Lower risk" country on the Country and Territory Protection List (you can open this TSV, which is the official source of truth, and Ctrl-F "Singapore")...

Apr 17 2024, 3:50 PM · Data-Engineering-Radar, Data-Engineering, Data-Engineering-Wikistats

Mar 28 2024

Htriedman added a comment to T327982: Add cawiki to clickstream dataset.

@VirginiaPoundstone should this be considered under the same auspices as T289532? It may be worthwhile to consider wrapping this up as part of that other task just to limit one-off work that might need to be repeated.

Mar 28 2024, 10:05 PM · Data-Engineering, Data Pipelines, Analytics

Mar 22 2024

Htriedman added a comment to T341139: project-title-country missing US data in recent data, and double quote escaping.

Yes — there's no further work to be done.

Mar 22 2024, 3:49 PM · Experimentation Lab, Data-Engineering

Mar 20 2024

Htriedman added a comment to T360073: Wikistats "Active Editors by Country" does not follow definition for active editors.

@Milimetric FWIW, about the weekly dataset: folks from product analytics told me that maintaining and publishing the monthly dataset is important for continuity with the existing dataset.

Mar 20 2024, 4:31 PM · Data-Engineering-Radar, Data Pipelines, Data-Engineering, Movement-Insights, Data-Platform
Htriedman added a comment to T360073: Wikistats "Active Editors by Country" does not follow definition for active editors.

@kzimmerman @Milimetric happy to set up a meeting next week to discuss the differences between the DP and non-DP versions of the geoeditors monthly/weekly datasets

Mar 20 2024, 4:27 PM · Data-Engineering-Radar, Data Pipelines, Data-Engineering, Movement-Insights, Data-Platform
Htriedman added a comment to T341139: project-title-country missing US data in recent data, and double quote escaping.

Hi @Ogiermaitre! Thanks for bringing this to our attention, and sorry it's taken so long to respond to you about this — I didn't know this ticket existed until 20 min ago. The US data problem has been fixed.

Mar 20 2024, 4:09 PM · Experimentation Lab, Data-Engineering

Mar 18 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

@Ladsgroup is correct about this — this is already happening on an ad hoc basis in some cases where there may be concerns about editor safety for sensitive material.

Mar 18 2024, 6:05 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Feb 27 2024

Htriedman added a project to T358601: Fix CI/CD issues in the differential-privacy repository: Privacy Engineering.
Feb 27 2024, 4:41 PM · Data-Engineering-Icebox, Data-Engineering, Privacy Engineering
Htriedman created T358601: Fix CI/CD issues in the differential-privacy repository.
Feb 27 2024, 4:40 PM · Data-Engineering-Icebox, Data-Engineering, Privacy Engineering

Feb 15 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

thanks for awareness around your capacity, @DannyS712!

Feb 15 2024, 4:04 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jan 31 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

There's no huge rush — we've deployed a short-term fix, but it requires some manual updating from WMF developers on a regular basis. If you can get to this in a few weeks that would be fine.

Jan 31 2024, 10:03 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

Hi @DannyS712! Have you made any progress on this?

Jan 31 2024, 8:17 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jan 24 2024

Htriedman added a comment to T355696: Update canonical_data.countries to reflect new country protection policy.

This task is high priority — the new country and territory protection list policy is out, and we'd like the internal list to reflect that as soon as possible.

Jan 24 2024, 10:08 PM · Movement-Insights

Jan 23 2024

Htriedman created T355696: Update canonical_data.countries to reflect new country protection policy.
Jan 23 2024, 5:07 PM · Movement-Insights

Jan 9 2024

Htriedman added a comment to T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.

I'm assuming that this is all meant to be oversight-level suppression, rather than admin-level (unless we want both as options).

Correct!

Jan 9 2024, 5:10 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams

Jan 8 2024

Htriedman created T354577: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page.
Jan 8 2024, 11:32 PM · Data-Engineering-Radar, Trust and Safety Product Team, Wikimedia-Hackathon-2024, MediaWiki-Revision-deletion, MediaWiki-Page-protection, User-DannyS712, Privacy Engineering, Data-Engineering, Security, Event-Platform, EventStreams
Htriedman added a comment to T353306: [request] consultation for a whitepaper.

@leila Hi! Would love to hear more about what you're thinking with regard to privacy :) Feel free to schedule something with me or continue the conversation here!

Jan 8 2024, 6:17 PM · SecTeam-Processed, Privacy Engineering, Research

Nov 13 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Any updates on this?

Nov 13 2023, 8:11 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Oct 31 2023

Htriedman closed T207171: Have a way to show the most popular pages per country as Resolved.
Oct 31 2023, 11:24 PM · Data-Engineering, Data-Engineering-Wikistats, Privacy Engineering, Inuka-Team, Language-strategy, Tool-Pageviews
Htriedman closed T299627: Investigate releasing historical top-pageview-per-country data as Resolved.

Update (very late but still necessary): As of Feb 2023, this data request has been completed!

Oct 31 2023, 11:11 PM · Privacy Engineering, Data-Engineering

Oct 19 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Hi @VirginiaPoundstone! Thanks for the detailed questions! I'll try to answer them one by one :)

Oct 19 2023, 8:43 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Oct 13 2023

Htriedman added a comment to T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

@JFishback_WMF I'll invite you to a meeting about this next week!

Oct 13 2023, 9:06 PM · Data Engineering and Event Platform Team (Sprint 3)

Oct 12 2023

Htriedman added a comment to T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

@JAllemandou Thanks for the kind words! For the moment, yes — let's try to standardize use of the country protection list and try to avoid keeping multiple versions of the list hard-coded in jobs. I will work on the following:

  1. getting my proposed schema reviewed by legal and human rights
  2. implementing the new schema in hive
  3. updating documentation on wikitech
  4. getting this data release onto a DP framework (cc: @Isaac)
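The goal of a single, shared country protection list (rather than copies hard-coded in each job) can be sketched in Python as below. This is a minimal illustration, assuming the list is distributed as a TSV with hypothetical `country` and `is_protected` columns; the real schema was still under legal and human-rights review at the time, and the country names in the example are placeholders.

```python
import csv
import io


def load_protected_countries(tsv_text: str) -> set[str]:
    """Parse a shared country-protection TSV into a set of protected countries.

    Hypothetical columns: `country` and `is_protected` ("true"/"false").
    In a real job the text would be read from the single canonical source
    instead of being embedded in the job's code.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["country"]
        for row in reader
        if row["is_protected"].strip().lower() == "true"
    }


def drop_protected(rows: list[dict], protected: set[str]) -> list[dict]:
    """Remove output rows whose `country` appears on the shared protection list."""
    return [row for row in rows if row["country"] not in protected]


if __name__ == "__main__":
    tsv = "country\tis_protected\nCountry A\ttrue\nCountry B\tfalse\n"
    protected = load_protected_countries(tsv)
    print(drop_protected([{"country": "Country A"}, {"country": "Country B"}], protected))
```

Because every job calls the same loader, an update to the canonical list propagates everywhere without code changes.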
Oct 12 2023, 6:30 PM · Data Engineering and Event Platform Team (Sprint 3)
Htriedman updated subscribers of T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

subscribing @Cleo_Lemoisson for visibility

Oct 12 2023, 4:41 PM · Data Engineering and Event Platform Team (Sprint 3)

Oct 11 2023

Htriedman added a comment to T348504: [Data Platform] Update referer job to use global country deny list instead of a hard-coded one.

Hi! Thanks for flagging this, @Isaac! Definitely agree that this dataset is a great candidate for differential privacy (DP), which would also likely reduce the minimum publication threshold to <500. I'm happy to start working on that with you — it's a somewhat independent process from the discussion of the country protection list (CPL) and I think this dataset could benefit from it.

Oct 11 2023, 6:29 PM · Data Engineering and Event Platform Team (Sprint 3)

Oct 4 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

@Eevans In that case, I'll change the data model to drop it! Will update this thread when it's done.

Oct 4 2023, 10:06 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE
Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

@Eevans Understood! I'll make that change to the schema soon.

Oct 4 2023, 7:52 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Oct 3 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Hi all! I've made updates to the codebase to better comply with @Eevans' feedback, resulting in a greatly simplified interface. I've listed the following design changes below:

Oct 3 2023, 11:50 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Sep 21 2023

Htriedman updated subscribers of T347104: Application Security Review Request : Fundraise Up scripts for Donatewiki.

@sbassett tagging you in this for visibility

Sep 21 2023, 9:19 PM · secscrum, Security, Application Security Reviews

Sep 20 2023

Htriedman added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Some of them are just artifacts of starting from a fork of one of the legacy services. For example, we'll want to adopt a new (better) convention for keyspace and table naming; names like "local_group_default_T_dp_pageviews".data were generated by the RESTBase codebase. Likewise, the "_domain" attribute (which is always set to analytics.wikimedia.org for these services) was done to appease RESTBase, and isn't something we should be perpetuating. Easy changes, mostly cosmetic.

Sep 20 2023, 8:10 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Sep 19 2023

Htriedman added a comment to T346329: Update visibility rules of aggregated participant responses.

I like this idea! Makes a lot of sense and covers more edge cases than my simpler solution was proposing. Feel free to implement this, and if you do, please write it up in a separate document and share it with me — could be very useful in future cases where we're considering releasing similar sensitive data with a relatively small number of raw data entries in the underlying dataset.

Sep 19 2023, 6:16 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Campaigns-Product-Team (Campaign-Tools-Current-Sprint), Campaign-Registration, CampaignEvents

Sep 15 2023

Htriedman added a comment to T346329: Update visibility rules of aggregated participant responses.

As for reporting percentages, you can take an example from the new data publication guidelines. We considered how to report percentages in the "Threshold table" section of the policy: https://foundation.wikimedia.org/wiki/Legal:Data_publication_guidelines#Threshold_table

Sep 15 2023, 4:32 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Campaigns-Product-Team (Campaign-Tools-Current-Sprint), Campaign-Registration, CampaignEvents
Htriedman added a comment to T346329: Update visibility rules of aggregated participant responses.

Hi @ifried! Thanks for bringing this up — I wrote the initial set of recommendations for obfuscating event data in these contexts, and know that there are many contexts in which showing "<5" to an event organizer will leak the exact number of responses in that category. It is, at best, a partial fix that will be effective at deterring non-malicious people who have access to reports.

Sep 15 2023, 4:29 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Campaigns-Product-Team (Campaign-Tools-Current-Sprint), Campaign-Registration, CampaignEvents

Sep 6 2023

mfossati awarded T337258: Enable libmamba by default for conda environment solving a 100 token.
Sep 6 2023, 3:52 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Data-Engineering, Data Pipelines

Aug 22 2023

Htriedman updated subscribers of T343855: AQS 2.0 differentially private pageviews deploy API.

Hi all! It's been a few weeks without activity, so I'm following up on this request.

Aug 22 2023, 7:14 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Aug 21 2023

Htriedman added a comment to T340942: Check home/HDFS leftovers of tmtl.io contractors.

Hi @BTullis! All of these Tumult Labs folks were working in more of an advisory role — even if their directories contain some uncommitted changes, you can delete them and remove their user profiles.

Aug 21 2023, 4:41 PM · Data-Engineering
Htriedman added a comment to T344617: Multiple DAGs on platform_eng instance failing on Spark Skein operators with ConnectionError.

Thanks for taking care of this, @xcollazo and @BTullis! Really appreciate you catching this while I was OOO.

Aug 21 2023, 4:38 PM · Data-Platform-SRE

Aug 8 2023

Htriedman created T343855: AQS 2.0 differentially private pageviews deploy API.
Aug 8 2023, 8:13 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE

Aug 3 2023

Htriedman updated the task description for T343304: MakeItSPARQL! - build a UI for the LLM that translates natural language into SPARQL queries for Wikidata.
Aug 3 2023, 5:51 PM · Wikimania-Hackathon-2023, Wikidata Query UI, patch-welcome, Wikidata
Htriedman updated the task description for T343304: MakeItSPARQL! - build a UI for the LLM that translates natural language into SPARQL queries for Wikidata.
Aug 3 2023, 5:39 PM · Wikimania-Hackathon-2023, Wikidata Query UI, patch-welcome, Wikidata

Aug 1 2023

Htriedman added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

Hi @odimitrijevic! Here's the gitlab repo I worked on during the documentathon :) https://gitlab.wikimedia.org/htriedman/documentathon-eventstream

Aug 1 2023, 5:48 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

Jul 25 2023

Htriedman added a comment to T342487: [Event Platform] Actor performing suppression revealed publicly.

^^agree with the above analysis — if we can selectively remove the performer of suppressions, then this should be considered resolved.

Jul 25 2023, 5:16 PM · Data-Engineering (Sprint 6), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), SecTeam-Processed, Privacy Engineering, Event-Platform, Vuln-Infoleak, Security

Jul 24 2023

Htriedman added a comment to T340149: Review and provide feedback to Guidelines for Data Publication.

Thanks for your comments, @fkaelin! I'll get back to you about the topN pages once we meet about it.

Jul 24 2023, 4:58 PM · Research

Jul 21 2023

Htriedman added a comment to T207171: Have a way to show the most popular pages per country.

@Flomeier85 if you have any questions at all feel free to post them here or reach out to me via email at htriedman@wikimedia.org :)

Jul 21 2023, 10:07 PM · Data-Engineering, Data-Engineering-Wikistats, Privacy Engineering, Inuka-Team, Language-strategy, Tool-Pageviews

Jul 20 2023

Htriedman added a comment to T340149: Review and provide feedback to Guidelines for Data Publication.

I know @Isaac has given feedback on this doc — @fkaelin any additional comments?

Jul 20 2023, 7:00 PM · Research

Jul 17 2023

Htriedman added a comment to T315676: Add DP cookie for pageview filtering.

@Vgutierrez this feature has been working as expected, and this ticket can be closed!

Jul 17 2023, 2:50 PM · SRE, Traffic

Jul 14 2023

Htriedman added a comment to T341907: Release datasets in support of Wikimedia-related AI modeling.

Isaac and I spent some time brainstorming about this last month. Here's a google doc with a bunch of existing ideas in it!

Jul 14 2023, 9:07 PM · Research, Epic

Jul 11 2023

Htriedman added a comment to T334851: Define a procedure/pattern to populate test environments.

I would be strongly in favor of using mock data over synthetic data, at least for the moment. We should only have an explicit preference for synthetic data if there's a real need for the underlying statistical distribution of the fake data to mirror that of the real data. If it's just for performance testing, that shouldn't be necessary.
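To illustrate the distinction, here is a minimal Python sketch of mock data: rows with the right shape and types but values drawn from arbitrary, fixed ranges, with no attempt to mirror the real data's distribution. The pageview-like schema and project names are hypothetical. Synthetic data, by contrast, would require fitting to (and therefore touching) the real data.

```python
import random
from datetime import date, timedelta


def mock_pageview_rows(n: int, seed: int = 0) -> list[dict]:
    """Generate `n` mock rows with a hypothetical pageview-like schema.

    Values come from fixed, arbitrary ranges; nothing is derived from
    real traffic, so no private information can leak into test fixtures.
    A fixed seed keeps test environments reproducible.
    """
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    return [
        {
            "project": rng.choice(["en.wikipedia", "de.wikipedia", "fr.wikipedia"]),
            "page_title": f"Mock_page_{i}",
            "day": (start + timedelta(days=rng.randrange(30))).isoformat(),
            "views": rng.randint(0, 1000),
        }
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in mock_pageview_rows(3):
        print(row)
```

For performance testing, only the volume and shape of rows matter, which mock data like this already provides.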

Jul 11 2023, 4:30 PM · Catalyst (Prototype leftovers 🍱), SecTeam-Processed, Privacy Engineering, serviceops-radar, WMF-Architecture-Team, Platform Engineering, Release-Engineering-Team, API Platform, AQS2.0

Jul 5 2023

Htriedman added a comment to T335892: Get stats on Gadgets and Users scripts loading third-party resources.

Definitely would be pro-overriding the user-agent for fontcdn (and cdnjs) — that would make it significantly easier to argue that they should be considered ok to allowlist for third-party resources.

Jul 5 2023, 6:37 PM · WMF-General-or-Unknown, affects-Miraheze, SecTeam-Processed, Privacy Engineering, tech-decision-forum

Jun 30 2023

Htriedman added a comment to T316600: Broken DAG Error when trying to import Gitlab .tgz file into airflow.

agree! we can close this out

Jun 30 2023, 4:26 PM · Data Pipelines

Jun 29 2023

Htriedman added a comment to T340189: Images of private wikis are publicly accessible if attacker knows the URL or the filename.

We definitely should run a hadoop query (or a set of queries) to get a sense of access over the past 90 days. I pulled database codes / domain names from canonical_data.wikis where status = "open" and visibility = "private" and got the following list:
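The filter described above (open wikis with private visibility) can be sketched as SQL. The Python example below runs it against an in-memory SQLite stand-in for canonical_data.wikis; the production query would run on Hive/Spark instead. The `status` and `visibility` columns come from the comment, while `database_code` and `domain_name` are assumed column names, and the sample rows are fictional.

```python
import sqlite3

# Assumed columns: database_code, domain_name (status/visibility are from the comment).
QUERY = """
SELECT database_code, domain_name
FROM canonical_data.wikis
WHERE status = 'open' AND visibility = 'private'
ORDER BY database_code
"""


def private_open_wikis(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (database_code, domain_name) for open, private wikis."""
    return conn.execute(QUERY).fetchall()


def demo_connection(rows: list[tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Build an in-memory stand-in for canonical_data.wikis (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS canonical_data")
    conn.execute(
        "CREATE TABLE canonical_data.wikis "
        "(database_code TEXT, domain_name TEXT, status TEXT, visibility TEXT)"
    )
    conn.executemany("INSERT INTO canonical_data.wikis VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_connection([
        ("alphawiki", "alpha.example.org", "open", "private"),
        ("betawiki", "beta.example.org", "closed", "private"),
    ])
    print(private_open_wikis(conn))
```

The resulting domain list would then be joined against request logs to measure access over the 90-day window.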

Jun 29 2023, 9:30 PM · Privacy Engineering, SecTeam-Processed, Vuln-Infoleak, SRE-swift-storage, Security, Security-Team
Htriedman added a comment to T331416: The nsfw model hangs in predict() after moving to Kserve 0.10.

@elukey Feel free to remove it from Lift Wing for the moment! Thanks for letting me know.

Jun 29 2023, 4:25 PM · Machine-Learning-Team

Jun 27 2023

nettrom_WMF awarded T337258: Enable libmamba by default for conda environment solving a Yellow Medal token.
Jun 27 2023, 8:45 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Data-Engineering, Data Pipelines

Jun 16 2023

nshahquinn-wmf awarded T337258: Enable libmamba by default for conda environment solving a Yellow Medal token.
Jun 16 2023, 2:14 AM · Data-Platform-SRE, Data Engineering and Event Platform Team, Data-Engineering, Data Pipelines

May 30 2023

Htriedman added a comment to T334851: Define a procedure/pattern to populate test environments.
  1. What do you think our next steps would be with this approach (using sample data as the initial source)? I ask because you mention we shouldn’t use it yet in a production environment with private or sensitive data, so I guess we need to work more on it (to anonymize it, for example). It’s not the case at this moment, but it’s something we should explore for the future.
May 30 2023, 5:29 PM · Catalyst (Prototype leftovers 🍱), SecTeam-Processed, Privacy Engineering, serviceops-radar, WMF-Architecture-Team, Platform Engineering, Release-Engineering-Team, API Platform, AQS2.0
Htriedman closed T337321: Automating pulling schemas from eventschema to datahub as Invalid.

See this task instead: https://phabricator.wikimedia.org/T318863

May 30 2023, 5:03 PM · Data-Engineering
Htriedman closed T280385: Apache Beam go prototype code for DP evaluation, a subtask of T267283: Evaluate a differentially private solution to release wikipedia's project-title-country data, as Resolved.
May 30 2023, 4:28 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release
Htriedman closed T280385: Apache Beam go prototype code for DP evaluation as Resolved.
May 30 2023, 4:28 PM · Research-Freezer, Data-Engineering, Privacy Engineering, Privacy, Data-release
Htriedman closed T282195: ApacheBeam prototype for DP noise addition with pageview privacy units on top of Spark, a subtask of T267283: Evaluate a differentially private solution to release wikipedia's project-title-country data, as Resolved.
May 30 2023, 4:27 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release
Htriedman closed T282195: ApacheBeam prototype for DP noise addition with pageview privacy units on top of Spark as Resolved.
May 30 2023, 4:27 PM · Research-Freezer, Data-Engineering-Radar, Privacy Engineering, Privacy, Data-release

May 23 2023

Htriedman created T337321: Automating pulling schemas from eventschema to datahub.
May 23 2023, 3:21 PM · Data-Engineering

May 22 2023

Htriedman added a comment to T207171: Have a way to show the most popular pages per country.

I don't quite get this. If I query this URL, I thought I would get views from Romania drilled down per project and page (see "FCV_Farul_Constanța", present on both enwiki and rowiki). Is this not true, or am I missing the definition of the splits?

May 22 2023, 9:54 PM · Data-Engineering, Data-Engineering-Wikistats, Privacy Engineering, Inuka-Team, Language-strategy, Tool-Pageviews
Htriedman added a comment to T267283: Evaluate a differentially private solution to release wikipedia's project-title-country data.

Following up on this as the primary person who has worked on this project for the past 18 months, with some details on how this dataset differs from the existing API data:

May 22 2023, 7:00 PM · Data-Engineering, Research, Privacy Engineering, Privacy, Data-release
Htriedman added a comment to T207171: Have a way to show the most popular pages per country.

Hi @Strainu! I have been the primary person working on implementing this data release for the past 18 months and can describe how this data differs from the API.

May 22 2023, 6:59 PM · Data-Engineering, Data-Engineering-Wikistats, Privacy Engineering, Inuka-Team, Language-strategy, Tool-Pageviews
Htriedman added a project to T337258: Enable libmamba by default for conda environment solving: Data-Engineering.
May 22 2023, 5:55 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Data-Engineering, Data Pipelines
Htriedman created T337258: Enable libmamba by default for conda environment solving.
May 22 2023, 5:36 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Data-Engineering, Data Pipelines

May 16 2023

Htriedman added a comment to T334851: Define a procedure/pattern to populate test environments.

@Sfaci I ran this on stat1006 using the conda-create-stacked, conda-activate-stacked, and conda-deactivate-stacked built-in scripts. Are you using the stat machines and conda?

May 16 2023, 4:00 PM · Catalyst (Prototype leftovers 🍱), SecTeam-Processed, Privacy Engineering, serviceops-radar, WMF-Architecture-Team, Platform Engineering, Release-Engineering-Team, API Platform, AQS2.0