⚓ T368326 Update Metrics Platform Client Libraries to accept experiment membership

Subject	Repo	Branch	Lines +/-
Create Tests: Add tests for MediaWikiMetricsClientIntegation#isCurrentUserEnrolled()	mediawiki/extensions/EventLogging	master	+86 -31
MediaWikiMetricsClientIntegration: quick fix for isCurrentUserEnrolled() function	mediawiki/extensions/EventLogging	master	+2 -2
T368326-update-metrics-platform-to-accept-experiment-membership	mediawiki/extensions/EventLogging	master	+164 -6
Adding a new 'experiments' fragment to collect data about which experiments is a subject enrolled in	schemas/event/secondary	master	+65 -0

Status	Assigned	Task
Resolved	Sfaci	T366949 MPIC: Add stream name to forms/database/api
Open	None	T366807 [EPIC] Update Metrics Platform Client Libraries to accept instrument name
Resolved	phuedx	T366827 Update the Metrics Platform JS Client Library API to talk about Instrument Name
Open	cjming	T370880 [EPIC] FY 24/25 SDS 2.1.7 | Alpha Release of Instrument Configuration System (MPIC)
Resolved	cjming	T366802 Update Metrics Platform Base Schemas to include instrument name
Resolved	JEbe-WMF	T368326 Update Metrics Platform Client Libraries to accept experiment membership
Open	None	T374744 Update Metrics Platform Java library to include experiment enrollment
Resolved	Sgs	T371498 Metrics Platform Integration: PoC for tracking impressions of the Community Updates homepage module
Open	None	T382469 EPIC: update documentation for experiment enrollment

Restricted Application added a subscriber: Aklapper. Jun 24 2024, 9:59 PM

mpopov mentioned this in T366807: [EPIC] Update Metrics Platform Client Libraries to accept instrument name.Jun 24 2024, 10:00 PM

What if the user is enrolled in multiple, non-overlapping (hopefully) experiments that are running on the same page? Should we rather add an experiments property, which is an array of experiment objects? If so, how would that impact querying the data?

That's a great point!

I did some tests with two different ways to model this data: https://gist.github.com/bearloga/b0ca0b3ebd7ca427beeb68fe920a66e7 and what would Spark SQL and Presto queries look like in each case.

I'll check with my team if there's a strong preference. (Slack thread)

Is there a preference for version 1 or 2 from an engineering perspective?

Andrew brought up an excellent point (in the same thread): per https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Complex_array_element_and_map_value_type_evolution_is_not_well_supported, evolving the element type of a complex array is not well supported. So if we did go the array-of-objects (version 1) route, we may want array<map<string,string>> rather than array<struct<id:int,name:string,group:string>> in case there is anything we need to add later. For example, if we needed to include a new field for each experiment, it would be easy with the former and impossible with the latter.
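To make the evolution concern concrete, here is a minimal Python sketch contrasting the two element types. It is only an illustration, not how Event Platform validates or evolves schemas: `STRUCT_FIELDS`, both validator functions, and the `variant_label` field are hypothetical names introduced for this example.

```python
# Hypothetical fixed field set mirroring struct<id:int,name:string,group:string>.
STRUCT_FIELDS = {"id": int, "name": str, "group": str}


def validate_struct_element(element: dict) -> bool:
    """Accept only elements whose keys and types match the fixed struct exactly."""
    if set(element) != set(STRUCT_FIELDS):
        return False
    return all(isinstance(element[key], typ) for key, typ in STRUCT_FIELDS.items())


def validate_map_element(element: dict) -> bool:
    """Accept any element whose keys and values are all strings (map<string,string>)."""
    return all(isinstance(k, str) and isinstance(v, str) for k, v in element.items())


original = {"id": 1, "name": "exp-a", "group": "control"}
extended = {**original, "variant_label": "B"}  # assumed field added later

# The same extended record, expressed as a string-to-string map.
as_map = {key: str(value) for key, value in extended.items()}

print(validate_struct_element(original))  # fits the fixed struct
print(validate_struct_element(extended))  # the extra field no longer fits
print(validate_map_element(as_map))       # the map absorbs the extra key
```

The trade-off is that the map form gives up typed fields: every value, including the numeric ID, has to be carried as a string.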

@phuedx: I see in the Figma designs that instruments/experiments will have user-provided unique names with machine-readable versions auto-generated. Will those instruments/experiments have numeric IDs in the database that also uniquely identify them?

Maybe:

For MPIC-managed experiments we can record numeric ID and the unique name (to avoid joining against MPIC's table of experiments to get the name from ID)
For non-MPIC-managed experiments (e.g. the A/B test that Android team is planning) we would omit the numeric ID (there is none) and have the client fill in the experiment name

Question for @MNeisler & @nettrom_WMF: for MPIC-managed experiments, would you prefer the name as entered by the user in MPIC (which would be visible in MPIC) OR the machine-readable version generated by MPIC? Consider which version you might prefer to use in WHERE statements for selecting interaction data from a specific experiment:

WHERE experiment.name = 'Personalized search results (WE 3.1.5)'

-- OR --

WHERE experiment.name = 'personalized_search_results_we_3-1-5'

(Totally guessing at what the machine-readable version would look like.)
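As a purely illustrative sketch of how such a machine-readable name could be derived, the Python function below reproduces the guessed example above. The rule it applies (lowercase, dots to hyphens, everything else non-alphanumeric collapsed to underscores) is an assumption made for this example, not MPIC's actual algorithm, and `machine_readable_name` is a hypothetical helper.

```python
import re


def machine_readable_name(display_name: str) -> str:
    """Hypothetical slug rule; MPIC's real generation may differ."""
    # Lowercase and keep version separators by turning dots into hyphens.
    slug = display_name.lower().replace(".", "-")
    # Collapse every run of other non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9-]+", "_", slug)
    # Drop underscores left over from leading or trailing punctuation.
    return slug.strip("_")


print(machine_readable_name("Personalized search results (WE 3.1.5)"))
```

With the display name from the first WHERE clause as input, this yields the string used in the second one.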

In T368326#9924054, @mpopov wrote:

@phuedx: I see in the Figma designs that instruments/experiments will have user-provided unique names with machine-readable versions auto-generated. Will those instruments/experiments have numeric IDs in the database that also uniquely identify them?

Maybe:

For MPIC-managed experiments we can record numeric ID and the unique name (to avoid joining against MPIC's table of experiments to get the name from ID)

For non-MPIC-managed experiments (e.g. the A/B test that Android team is planning) we would omit the numeric ID (there is none) and have the client fill in the experiment name

Agreed. Name is required. ID is optional.

To your question to @MNeisler and @nettrom_WMF: Should we include both names here – the machine- and human-readable ones?

@phuedx: I reviewed the modeling approaches in PA team sharing (notes w/ link to transcript & recording). Morten brought up a great point: the computational cost of the virtual table generation that version 1 requires (LATERAL VIEW EXPLODE() in Spark SQL and CROSS JOIN UNNEST() in Presto), which reminded him of evaluating the custom data approach of monoschema MP.

There's some preference for the neatness of querying with version 2 (which I agree with). I think the performance considerations about version 1 are very valid and both of these make version 2 a more favorable candidate at the moment. (But still waiting for more feedback from Megan & Morten before making the final selection.)

I don't think we need to invest time into benchmarking these unless there is a very strong preference for version 1 from an engineering perspective and we need to demonstrate practical differences in query execution time when the dataset is, say, 100K events. (The hypothesis is that version 1 would be substantially slower / more expensive to query.)
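The difference in query shape can be sketched in plain Python. This is an analogy rather than a benchmark or the actual Spark SQL / Presto queries: it assumes version 1 stores an array of `{name, group}` objects and version 2 stores a struct with an `enrolled` array and an `assigned` map (the field names used for the struct in this thread). The experiment names and both helper functions are hypothetical.

```python
def in_group_v1(event: dict, name: str, group: str) -> bool:
    """Version 1: scan every array element, similar in spirit to exploding the array."""
    return any(
        entry.get("name") == name and entry.get("group") == group
        for entry in event.get("experiments", [])
    )


def in_group_v2(event: dict, name: str, group: str) -> bool:
    """Version 2: a direct lookup by experiment name in the `assigned` map."""
    return event.get("experiments", {}).get("assigned", {}).get(name) == group


# The same enrollment expressed in each assumed model.
event_v1 = {
    "experiments": [
        {"name": "exp_a", "group": "treatment"},
        {"name": "exp_b", "group": "control"},
    ]
}
event_v2 = {
    "experiments": {
        "enrolled": ["exp_a", "exp_b"],
        "assigned": {"exp_a": "treatment", "exp_b": "control"},
    }
}

print(in_group_v1(event_v1, "exp_a", "treatment"))
print(in_group_v2(event_v2, "exp_a", "treatment"))
```

Both calls select the same event; the version 2 filter simply avoids iterating over per-experiment rows, which is the neatness argument made above.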

In T368326#9930952, @phuedx wrote:

To your question to @MNeisler and @nettrom_WMF: Should we include both names here – the machine- and human-readable ones?

I suppose the only issue (if you'd even want to call it that) with #2 is that we would need to update the schema if we wanted to capture new information about experiments, e.g.:

In the comment above, I suggested that we capture both machine- and human-readable names of an experiment. This would require the experiments struct to be updated to:

experiments: struct<
  enrolled: array<string>,
  assigned: map<string, string>,
  names: map<string, string>
>

I really don't see this (needing to update schemas) as a huge issue from an engineering perspective.
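For illustration, a populated value of that updated struct might look like the Python sketch below. The thread does not say how `names` would be keyed, so the choice here (machine-readable name mapped to human-readable name) is an assumption, as are the `build_experiments` helper and the sample values.

```python
from typing import TypedDict


class Experiments(TypedDict):
    enrolled: list[str]        # experiments the subject is enrolled in
    assigned: dict[str, str]   # experiment name -> assigned group
    names: dict[str, str]      # assumed: machine-readable -> human-readable name


def build_experiments(assignments: dict[str, str],
                      display_names: dict[str, str]) -> Experiments:
    """Hypothetical helper that assembles the struct from group assignments."""
    return {
        "enrolled": sorted(assignments),
        "assigned": dict(assignments),
        # Only carry display names for experiments the subject is enrolled in.
        "names": {k: display_names[k] for k in assignments if k in display_names},
    }


experiments = build_experiments(
    {"personalized_search_results_we_3-1-5": "treatment"},
    {"personalized_search_results_we_3-1-5": "Personalized search results (WE 3.1.5)"},
)
print(experiments)
```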

Collected various discussions & points into this decision brief (viewable by public). Will follow up here when the decision is made.

SNowick_WMF subscribed.Jul 15 2024, 5:43 PM

mpopov updated the task description. (Show Details)Jul 17 2024, 1:27 AM

@phuedx @VirginiaPoundstone: Okie dokie, I updated the task description with the data model specification and updated the requirements based on a guess of what is involved.

phuedx mentioned this in T369847: Setup basic send and receive wiring between a MW instance and a Statsig cloud instance.Jul 23 2024, 4:12 PM

mpopov mentioned this in T366627: [MPIC] Analyse risk of potential performance issues with static approach to stream configuration.Jul 30 2024, 4:13 PM

Sfaci claimed this task.Aug 23 2024, 1:56 PM

Sfaci added a project: Experimentation Lab (Data products Sprint 18).

Sfaci moved this task from Sprint Backlog to In Process on the Experimentation Lab (Data products Sprint 18) board.

Sfaci mentioned this in T372585: [Sprint 18 GOAL] MPIC Alpha: Refactor client libraries for Monotable.

Sfaci set the point value for this task to 5.Aug 27 2024, 8:26 AM

Change #1067306 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[schemas/event/secondary@master] Adding a new 'experiments' fragment to collect data about which experiments is a subject enrolled in

https://gerrit.wikimedia.org/r/1067306

gerritbot added a project: Patch-For-Review.Aug 27 2024, 10:05 AM

cjming updated the task description. (Show Details)Aug 27 2024, 7:45 PM

cjming added a parent task: T370880: [EPIC] FY 24/25 SDS 2.1.7 | Alpha Release of Instrument Configuration System (MPIC).Aug 27 2024, 8:15 PM

Sfaci updated the task description. (Show Details)Aug 29 2024, 9:59 AM

Sfaci reassigned this task from Sfaci to JEbe-WMF.Aug 29 2024, 6:08 PM

Sfaci subscribed.

VirginiaPoundstone mentioned this in T372584: [Sprint 18 GOAL] MPIC Alpha: refactor PHP user bucketing function.Aug 30 2024, 3:57 PM

phuedx added a parent task: T366802: Update Metrics Platform Base Schemas to include instrument name.Sep 3 2024, 11:38 AM

phuedx mentioned this in T366802: Update Metrics Platform Base Schemas to include instrument name.

cjming mentioned this in T373715: MPIC: Create a QA plan for testing the alpha release.Sep 4 2024, 5:07 AM

• WDoranWMF moved this task from In Process to BLOCKED on the Experimentation Lab (Data products Sprint 18) board.Sep 4 2024, 4:25 PM

Change #1067306 merged by jenkins-bot:

[schemas/event/secondary@master] Adding a new 'experiments' fragment to collect data about which experiments is a subject enrolled in

https://gerrit.wikimedia.org/r/1067306

phuedx moved this task from BLOCKED to In Process on the Experimentation Lab (Data products Sprint 18) board.Sep 5 2024, 9:23 AM

Maintenance_bot removed a project: Patch-For-Review.Sep 5 2024, 9:30 AM

Sfaci updated the task description. (Show Details)Sep 5 2024, 1:34 PM

VirginiaPoundstone edited projects, added Experimentation Lab (Data Products Sprint 19); removed Experimentation Lab (Data products Sprint 18).Sep 5 2024, 7:05 PM

VirginiaPoundstone moved this task from Sprint Backlog to In Process on the Experimentation Lab (Data Products Sprint 19) board.

VirginiaPoundstone triaged this task as High priority.Sep 9 2024, 7:04 PM

Sfaci updated the task description. (Show Details)Sep 10 2024, 8:16 AM

Sfaci updated the task description. (Show Details)

• apaskulin updated the task description. (Show Details)Sep 11 2024, 11:05 PM

@VirginiaPoundstone is it OK to spin off the corresponding Java lib updates AC into another task for a later sprint? Since we narrowed scope to MW as the primary use case for Growth's experiment, we're focused on the JS/PHP libraries for now.

Yes.

We may want to also include the rollback required for the char count when we do the Java library.

cjming mentioned this in T374744: Update Metrics Platform Java library to include experiment enrollment.Sep 13 2024, 9:40 PM

cjming updated the task description. (Show Details)

cjming added a subtask: T374744: Update Metrics Platform Java library to include experiment enrollment.

Sfaci updated the task description. (Show Details)Sep 16 2024, 12:50 PM

Sfaci updated the task description. (Show Details)Sep 16 2024, 12:53 PM

Sfaci updated the task description. (Show Details)Sep 16 2024, 1:08 PM

Sfaci mentioned this in T374840: Update Metrics Platform PHP library to include experiment enrollment.Sep 16 2024, 1:41 PM

Sfaci updated the task description. (Show Details)Sep 16 2024, 3:10 PM

Change #1073484 had a related patch set uploaded (by Jennifer Ebe; author: Jennifer Ebe):

[mediawiki/extensions/EventLogging@master] T368326-update-metrics-platform-to-accept-experiment-membership

https://gerrit.wikimedia.org/r/1073484

gerritbot added a project: Patch-For-Review.Sep 17 2024, 4:10 PM

JEbe-WMF moved this task from In Process to Needs Review on the Experimentation Lab (Data Products Sprint 19) board.Sep 18 2024, 7:35 AM

Apart from the EventLogging change above, there is also a metrics-platform one ready for review: https://gitlab.wikimedia.org/repos/data-engineering/metrics-platform/-/merge_requests/71

sfaci updated https://gitlab.wikimedia.org/repos/data-engineering/metrics-platform/-/merge_requests/71

T368326-update-metrics-platform-to-accept-experiment-membership

phuedx merged https://gitlab.wikimedia.org/repos/data-engineering/metrics-platform/-/merge_requests/71

T368326-update-metrics-platform-to-accept-experiment-membership

Sfaci updated the task description. (Show Details)Sep 20 2024, 10:31 AM

Change #1073484 had a related patch set uploaded (by Santiago Faci; author: Jennifer Ebe):

[mediawiki/extensions/EventLogging@master] T368326-update-metrics-platform-to-accept-experiment-membership

https://gerrit.wikimedia.org/r/1073484

VirginiaPoundstone edited projects, added Experimentation Lab (Data Products Sprint 20 🎯); removed Experimentation Lab (Data Products Sprint 19).Sep 27 2024, 3:01 PM

VirginiaPoundstone moved this task from Sprint Backlog to Needs Review on the Experimentation Lab (Data Products Sprint 20 🎯) board.

Milimetric moved this task from Needs Review to In Process on the Experimentation Lab (Data Products Sprint 20 🎯) board.Oct 2 2024, 4:08 PM

phuedx added a subtask: T371498: Metrics Platform Integration: PoC for tracking impressions of the Community Updates homepage module.Oct 4 2024, 5:31 AM

phuedx mentioned this in T371498: Metrics Platform Integration: PoC for tracking impressions of the Community Updates homepage module.Oct 4 2024, 5:40 AM

phuedx moved this task from In Process to Needs Review on the Experimentation Lab (Data Products Sprint 20 🎯) board.Oct 9 2024, 12:35 PM

• apaskulin updated the task description. (Show Details)Oct 10 2024, 11:43 PM

phuedx moved this task from Needs Review to To Deploy on the Experimentation Lab (Data Products Sprint 20 🎯) board.Oct 14 2024, 1:21 PM

Change #1073484 merged by jenkins-bot:

[mediawiki/extensions/EventLogging@master] T368326-update-metrics-platform-to-accept-experiment-membership

https://gerrit.wikimedia.org/r/1073484

ReleaseTaggerBot added a project: MW-1.43-notes (1.43.0-wmf.27; 2024-10-15).Oct 14 2024, 2:00 PM

Change #1080053 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[mediawiki/extensions/EventLogging@master] MediaWikiMetricsClientIntegration: quick fix for isCurrentUserEnrolled() function

https://gerrit.wikimedia.org/r/1080053

Change #1080053 merged by jenkins-bot:

[mediawiki/extensions/EventLogging@master] MediaWikiMetricsClientIntegration: quick fix for isCurrentUserEnrolled() function

https://gerrit.wikimedia.org/r/1080053

Change #1080068 had a related patch set uploaded (by Jennifer Ebe; author: Jennifer Ebe):

[mediawiki/extensions/EventLogging@master] Create-Tests-For-Metrics-Platform

https://gerrit.wikimedia.org/r/1080068

phuedx moved this task from To Deploy to Needs Review on the Experimentation Lab (Data Products Sprint 20 🎯) board.Oct 15 2024, 12:41 PM

phuedx moved this task from Needs Review to To Deploy on the Experimentation Lab (Data Products Sprint 20 🎯) board.Oct 18 2024, 11:05 AM

Change #1080068 merged by jenkins-bot:

[mediawiki/extensions/EventLogging@master] Create Tests: Add tests for MediaWikiMetricsClientIntegation#isCurrentUserEnrolled()

https://gerrit.wikimedia.org/r/1080068

ReleaseTaggerBot edited projects, added MW-1.43-notes (1.43.0-wmf.28; 2024-10-22); removed MW-1.43-notes (1.43.0-wmf.27; 2024-10-15).Oct 18 2024, 12:00 PM

phuedx updated the task description. (Show Details)Oct 23 2024, 11:49 AM

• apaskulin subscribed.Oct 24 2024, 11:23 PM

Milimetric moved this task from To Deploy to Done on the Experimentation Lab (Data Products Sprint 20 🎯) board.Nov 7 2024, 6:20 PM

VirginiaPoundstone closed this task as Resolved.Nov 7 2024, 8:47 PM

It looks like the documentation updates specified in the task description haven't been done, assuming this change includes user-facing library changes.

mpopov mentioned this in T378115: Implement A/B test bucketing for mobile search recommendation.Nov 22 2024, 6:14 PM

cjming mentioned this in T382469: EPIC: update documentation for experiment enrollment.Thu, Dec 19, 2:56 AM

cjming added a subtask: T382469: EPIC: update documentation for experiment enrollment.

Update Metrics Platform Client Libraries to accept experiment membership
Closed, Resolved · Public · 5 Estimated Story Points

Description

Technical details

Schema updates

JS Client library

PHP Client library

Requirements

Details

Related Objects

Event Timeline
