
Community updates module: apply instrumentation specification
Closed, ResolvedPublic

Description

In T371498, an initial logger using metrics-platform was introduced. It only logs impressions of the module, passing its name as action_source="community-updates". This task is to add the missing relevant data defined in the instrumentation specification, mainly action_subtype="update-friendly-name".

Acceptance criteria

  • An update friendly name is added to both click and impression events
  • Clicks on the call to action are logged

Event Timeline

Thanks for providing the detailed specification @nettrom_WMF. I have some questions:

  1. In the instrument specification, the field used to capture the friendly name is action_subtype. The documentation for that field says it is used to classify actions, and I'm not sure the update friendly name is a good candidate for such a classification. How is it different from capturing it under action_context? Imagine we introduce a second link in the future (e.g. wikitext is allowed in the description): how would we differentiate the source link that originated the event?
  2. Since we're deliberately limiting ourselves to using the product_metrics/web/base schema, I wonder if we could use action_context to capture arbitrary data, as we've done in the past with action_data: "taskTypes=link-recommendation;taskCount=161;topics="?
  3. For the same reason as in (2), we're not yet adding any session identifier or token that is shared across streams to allow us to connect them. AFAIR, in the "legacy instrumentation" that was named the homepage_pageview_token. Should we aim to achieve a similar status quo and add the GE-generated homepage pageview token to the events in the mediawiki.product_metrics.homepage_module_interaction stream?
Sgs triaged this task as Medium priority.Sep 6 2024, 3:31 PM
Sgs moved this task from Inbox to Up Next (estimated tasks) on the Growth-Team board.

Thanks for providing the detailed specification @nettrom_WMF. I have some questions:

  1. In the instrument specification, the field used to capture the friendly name is action_subtype. The documentation for that field says it is used to classify actions, and I'm not sure the update friendly name is a good candidate for such a classification. How is it different from capturing it under action_context? Imagine we introduce a second link in the future (e.g. wikitext is allowed in the description): how would we differentiate the source link that originated the event?

Having looked at a couple of other MP-based instrumentation specifications, I've updated our specification to use action_context instead. It's a much better field for this. Thanks for noticing and bringing it up!

  2. Since we're deliberately limiting ourselves to using the product_metrics/web/base schema, I wonder if we could use action_context to capture arbitrary data, as we've done in the past with action_data: "taskTypes=link-recommendation;taskCount=161;topics="?

I'm not sure if the Language team went through and did that with their instrumentation, but they did propose a new schema fragment to capture translation context in T369687 instead of using action_context for a JSON-blob with all of it. From my perspective, I'm interested in some kind of identifier that allows me to understand which specific Community Update a user saw. I don't think much additional information is necessary.

  3. For the same reason as in (2), we're not yet adding any session identifier or token that is shared across streams to allow us to connect them. AFAIR, in the "legacy instrumentation" that was named the homepage_pageview_token. Should we aim to achieve a similar status quo and add the GE-generated homepage pageview token to the events in the mediawiki.product_metrics.homepage_module_interaction stream?

I'm not sure we need to? MP has performer.pageview_id (1.2.0.yaml L446). It seems we might need a particular stream configuration for it to be present, though. That's something for us to verify with the folks in Data Products, so I'm pinging @phuedx and @mforns to get it on their radar.

I'm not sure we need to? MP has performer.pageview_id (1.2.0.yaml L446). It seems we might need a particular stream configuration for it to be present, though. That's something for us to verify with the folks in Data Products, so I'm pinging @phuedx and @mforns to get it on their radar.

Is the homepage_pageview_token property in the older instruments different from the pageview token generated by mw.user.getPageviewToken()? How so? What's it used for?

Sgs moved this task from Up Next (estimated tasks) to FY2024-25 Q1 Sprint 5 on the Growth-Team board.
Sgs edited projects, added Growth-Team (FY2024-25 Q1 Sprint 5); removed Growth-Team.
Sgs moved this task from Incoming to Doing on the Growth-Team (FY2024-25 Q1 Sprint 5) board.
Sgs moved this task from Backlog to In Process on the WMF-SDS 2 Sprinthackular 2024 board.

I'm not sure we need to? MP has performer.pageview_id (1.2.0.yaml L446). It seems we might need a particular stream configuration for it to be present, though. That's something for us to verify with the folks in Data Products, so I'm pinging @phuedx and @mforns to get it on their radar.

Is the homepage_pageview_token property in the older instruments different from the pageview token generated by mw.user.getPageviewToken()? How so? What's it used for?

homepage_pageview_token is generated ad hoc in SpecialHomepage.php#266 on each page render. It serves the purpose of relating the events from analytics/legacy/homepagevisit with analytics/legacy/homepagemodule and others. AFAIU, performer.pageview_id would provide the same capability, but that would require moving the usage of analytics/legacy/homepagevisit to an MP client as well, which I'm not sure is directly possible.

Thanks for providing the detailed specification @nettrom_WMF. I have some questions:

  1. In the instrument specification, the field used to capture the friendly name is action_subtype. The documentation for that field says it is used to classify actions, and I'm not sure the update friendly name is a good candidate for such a classification. How is it different from capturing it under action_context? Imagine we introduce a second link in the future (e.g. wikitext is allowed in the description): how would we differentiate the source link that originated the event?

Having looked at a couple of other MP-based instrumentation specifications, I've updated our specification to use action_context instead. It's a much better field for this. Thanks for noticing and bringing it up!

Updated accordingly.

  2. Since we're deliberately limiting ourselves to using the product_metrics/web/base schema, I wonder if we could use action_context to capture arbitrary data, as we've done in the past with action_data: "taskTypes=link-recommendation;taskCount=161;topics="?

I'm not sure if the Language team went through and did that with their instrumentation, but they did propose a new schema fragment to capture translation context in T369687 instead of using action_context for a JSON-blob with all of it. From my perspective, I'm interested in some kind of identifier that allows me to understand which specific Community Update a user saw. I don't think much additional information is necessary.

I have been strongly discouraged from taking this approach by @mforns and @phuedx, so I'm avoiding it for now: action_context=friendly_update_title. One caveat is that we don't have anything that uniquely identifies a "community update". If the title is updated but it's an update on the same announcement, we'll get two different contexts.

  3. For the same reason as in (2), we're not yet adding any session identifier or token that is shared across streams to allow us to connect them. AFAIR, in the "legacy instrumentation" that was named the homepage_pageview_token. Should we aim to achieve a similar status quo and add the GE-generated homepage pageview token to the events in the mediawiki.product_metrics.homepage_module_interaction stream?

I'm not sure we need to? MP has performer.pageview_id (1.2.0.yaml L446). It seems we might need a particular stream configuration for it to be present, though. That's something for us to verify with the folks in Data Products, so I'm pinging @phuedx and @mforns to get it on their radar.

I also think performer.pageview_id is a replacement candidate. The problem is that this value would only be sent through the new stream mediawiki.product_metrics.homepage_module_interaction using the core schema, but not through the HomepageVisit stream, which uses analytics/legacy/homepagevisit. I understand this was relevant for relating information from both schemas when querying. I'm not sure if this remains relevant for the Community updates metric, but it seems it would? @nettrom_WMF

Change #1072534 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] CommunityUpdates: use the update title as the action context for instrument

https://gerrit.wikimedia.org/r/1072534

I have been strongly discouraged from taking this approach by @mforns and @phuedx, so I'm avoiding it for now: action_context=friendly_update_title. One caveat is that we don't have anything that uniquely identifies a "community update". If the title is updated but it's an update on the same announcement, we'll get two different contexts.

So that I understand completely how this works: the value of action_context is the title of the update, with some "standardization" of the text (e.g. all lowercase, underscores instead of spaces, special characters removed)? And this means that if someone changes the update to use a new icon, but nothing else, then the value doesn't change. And if they change the title, but nothing else, the value does change. Did I get that right?

I also think performer.pageview_id is a replacement candidate. The problem is that this value would only be sent through the new stream mediawiki.product_metrics.homepage_module_interaction using the core schema, but not through the HomepageVisit stream, which uses analytics/legacy/homepagevisit. I understand this was relevant for relating information from both schemas when querying. I'm not sure if this remains relevant for the Community updates metric, but it seems it would? @nettrom_WMF

This is not an issue for the Community Updates experiment: none of our metrics require joining multiple schemas on the same token. The metrics are either rates (compared between the two experiment groups), based on user ID (e.g. campaign sign-up rate), or come from the same instrument (CTR).

That being said, this is an important conceptual issue that we'll need to keep in mind for future Metrics Platform work. In addition to the Growth team setting homepage_pageview_token in HomepageVisit and then reusing it across HomepageModule and HelpPanel, the EditAttemptStep schema is both server- and client-side with a token field (I'm not going to look up the name, editing_session_id maybe?) and shares the token with VisualEditorFeatureUse.

I think there are valid reasons to have a server-side event as the first event (e.g. a Homepage visit in our case) and then need a consistent identifier across events to be able to determine user actions in a funnel. So I'm going to ping @phuedx on this and note it as a conceptual nugget of learning from this work.
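For readers following along, the CTR mentioned above can be computed from a single stream, which is why no cross-schema join is needed. The following is only a rough Python sketch: the function name click_through_rate is hypothetical, the events are assumed to be already-parsed dictionaries with the action, action_source and action_context fields, and CTR is assumed to mean clicks divided by impressions per action_context. The definition actually used in the analysis may differ (for example, it may be computed per user or per experiment group).

```python
from collections import Counter


def click_through_rate(events):
    """Return clicks / impressions per action_context for community-updates events.

    Hypothetical helper: `events` is an iterable of dicts shaped like the
    events on the homepage_module_interaction stream.
    """
    impressions = Counter()
    clicks = Counter()
    for event in events:
        # Ignore events from other homepage modules.
        if event.get("action_source") != "community-updates":
            continue
        context = event.get("action_context")
        if event.get("action") == "impression":
            impressions[context] += 1
        elif event.get("action") == "click":
            clicks[context] += 1
    # Only contexts with at least one impression get a rate.
    return {ctx: clicks[ctx] / count for ctx, count in impressions.items() if count > 0}
```

Because both counts come from the same instrument, the only grouping key needed here is action_context.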

(Moved the task back to In Progress since the patch is failing CI tests).

I have been strongly discouraged from taking this approach by @mforns and @phuedx, so I'm avoiding it for now: action_context=friendly_update_title. One caveat is that we don't have anything that uniquely identifies a "community update". If the title is updated but it's an update on the same announcement, we'll get two different contexts.

So that I understand completely how this works: the value of action_context is the title of the update, with some "standardization" of the text (e.g. all lowercase, underscores instead of spaces, special characters removed)? And this means that if someone changes the update to use a new icon, but nothing else, then the value doesn't change. And if they change the title, but nothing else, the value does change. Did I get that right?

Yes, that's accurate.
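To make the confirmed behaviour concrete, here is a minimal Python sketch of the kind of standardization described above (lowercase, underscores instead of spaces, special characters removed). The function name normalize_update_title and the exact rules are assumptions for illustration only. The real logic lives in the GrowthExperiments patch and may differ, for instance in how it treats length or non-Latin characters.

```python
import re


def normalize_update_title(title: str) -> str:
    """Hypothetical normalization of an update title into an action_context value."""
    lowered = title.lower()
    # Assumed rule: drop everything except word characters and whitespace.
    cleaned = re.sub(r"[^\w\s]", "", lowered)
    # Collapse each run of whitespace into a single underscore.
    return re.sub(r"\s+", "_", cleaned.strip())
```

Under these assumed rules, changing only the icon of an update leaves the value untouched because the title is the sole input, while any change to the title's wording produces a different value.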

I also think performer.pageview_id is a replacement candidate. The problem is that this value would only be sent through the new stream mediawiki.product_metrics.homepage_module_interaction using the core schema, but not through the HomepageVisit stream, which uses analytics/legacy/homepagevisit. I understand this was relevant for relating information from both schemas when querying. I'm not sure if this remains relevant for the Community updates metric, but it seems it would? @nettrom_WMF

This is not an issue for the Community Updates experiment: none of our metrics require joining multiple schemas on the same token. The metrics are either rates (compared between the two experiment groups), based on user ID (e.g. campaign sign-up rate), or come from the same instrument (CTR).

Alright, performer.pageview_id is already present in the events.

Change #1072534 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] CommunityUpdates: use the update title as the action context for instrument

https://gerrit.wikimedia.org/r/1072534

Etonkovidova subscribed.

Moving to Test in production for final review by @nettrom_WMF.

  • impression
{"action":"impression","experiments":{"assigned":{"growth-experiments":"control"},"enrolled":["growth-experiments"]},"action_source":"community-updates","action_context":"duis_aute_irure_dolor_in_reprehenderit_in_volupta","$schema":"/analytics/product_metrics/web/base/1.3.0","mediawiki":{"database":"cswiki","site_content_language":"cs"},"page":{"content_language":"cs"},"agent":{"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"},"performer":{"session_id":"b54e2f9d24b042aa5569","active_browsing_session_token":"713f4d2db31d56c1020d","name":"ET13","is_bot":false,"is_logged_in":true,"edit_count_bucket":"100-999 edits","groups":["sysop","*","user","autoconfirmed"],"registration_dt":"2020-02-25T20:22:55.000Z","is_temp":false,"language":"cs","pageview_id":"21b072194ff9f4152615"},"sample":{"unit":"pageview","rate":1},"dt":"2024-09-24T00:24:11.455Z","meta":{"stream":"mediawiki.product_metrics.homepage_module_interaction","domain":"cs.wikipedia.beta.wmflabs.org","id":"4375f7bb-4153-4c9d-817f-6d4bbca919a8","dt":"2024-09-24T00:24:13.401Z","request_id":"53626140-7a0b-11ef-91a0-e3c33040138b","topic":"eqiad.mediawiki.product_metrics.homepage_module_interaction","partition":0,"offset":118},"http":{"request_headers":{"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"}}}
  • click
{"action":"click","experiments":{"assigned":{"growth-experiments":"control"},"enrolled":["growth-experiments"]},"action_subtype":"community-updates-cta","action_source":"community-updates","action_context":"duis_aute_irure_dolor_in_reprehenderit_in_volupta","$schema":"/analytics/product_metrics/web/base/1.3.0","mediawiki":{"database":"cswiki","site_content_language":"cs"},"page":{"content_language":"cs"},"agent":{"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"},"performer":{"session_id":"[...]","active_browsing_session_token":"[...]","name":"ET13","is_bot":false,"is_logged_in":true,"edit_count_bucket":"100-999 edits","groups":["sysop","*","user","autoconfirmed"],"registration_dt":"2020-02-25T20:22:55.000Z","is_temp":false,"language":"cs","pageview_id":"18409e498836ee847b45"},"sample":{"unit":"pageview","rate":1},"dt":"2024-09-23T23:54:58.532Z","meta":{"stream":"mediawiki.product_metrics.homepage_module_interaction","domain":"cs.wikipedia.beta.wmflabs.org","id":"9638fe7f-b618-4a8c-8864-c2aa95161c3a","dt":"2024-09-23T23:55:05.404Z","request_id":"[...]","topic":"eqiad.mediawiki.product_metrics.homepage_module_interaction","partition":0,"offset":116},"http":{"request_headers":{"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"}}}
  • no homepage_pageview_token (it was present in the /analytics/legacy/homepagemodule/1.6.6 schema for "module":"community-updates", see https://phabricator.wikimedia.org/T371683#10107787)
  • "action_context":"duis_aute_irure_dolor_in_reprehenderit_in_volupta" is the title in Community Update module:

Screen Shot 2024-09-23 at 5.31.14 PM.png (472×868 px, 63 KB)

Thanks for testing!

That's correct, in the new instruments we will rely on the pageview_id property for the goal of relating streams.
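As an illustration of what relating events through pageview_id could look like on the analysis side, here is a small Python sketch. It assumes the events are already-parsed dictionaries shaped like the samples above, with performer.pageview_id and an ISO 8601 dt field. The helper name group_by_pageview is hypothetical and not part of Metrics Platform.

```python
from collections import defaultdict


def group_by_pageview(events):
    """Group events by performer.pageview_id, ordered by client timestamp.

    Hypothetical helper: events without a pageview_id are skipped.
    """
    grouped = defaultdict(list)
    for event in events:
        pageview_id = event.get("performer", {}).get("pageview_id")
        if pageview_id is not None:
            grouped[pageview_id].append(event)
    for pageview_events in grouped.values():
        # ISO 8601 UTC timestamps in the same format sort correctly as strings.
        pageview_events.sort(key=lambda e: e["dt"])
    return dict(grouped)
```

With events grouped this way, an impression and a later click from the same page render end up in the same list, which is the capability homepage_pageview_token provided in the legacy schemas.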

  • "action_context":"duis_aute_irure_dolor_in_reprehenderit_in_volupta" is the title in Community Update module:

That seems correct.

Good catch, added.

Thx, @Sgs for addressing all my questions!

Checked in wmf.24 - works as expected. Closing as Resolved.