Instrumentation for Suggested Investigations
Closed, Resolved | Public | 1 Estimated Story Points

Description

We want to set up an instrumentation/measurement plan that covers the events in the Suggested Investigations special page to give us basic visibility into how users interact with the platform.

We want to test whether such a platform is useful and desirable to CheckUsers in their day-to-day work.

The KR/Hypothesis: https://app.asana.com/1/3758245663860/project/1210702563871278/overview/1210703340891827
Design doc: https://www.figma.com/design/yO1fSBWZyMXIrADcEvlcSO/Suggested-Investigations?node-id=5540-87&p=f&t=4sp7iKOgxlMtV59u-0
Measurement plan: https://docs.google.com/document/d/19Uvgc-542fsrV_-vFnKhMRUseoN0fMQnQ_mVFyy4zmo/edit?tab=t.0
Instrumentation spec: https://docs.google.com/spreadsheets/d/1B72MHVc4Wttu9RqfC5FB_hcJP0mycWtVxhQKsv9GusY/edit?gid=0#gid=0

Questions we want to answer

  • Quality of signals -> Are we generating valid investigations?
  • Tool usefulness -> Do users like the workflow?
  • General usage -> How do users use the tool?

Implementation spec

User-Agent collection:
Per the instrumentation spec, the stream must be configured to opt out of collecting User-Agent information.

Contextual attributes:
The following contextual attributes need to be configured for the stream:

  • agent_client_platform_family
  • mediawiki_database
  • performer_id
  • performer_name
  • performer_pageview_id
  • performer_groups
  • performer_edit_count
  • performer_edit_count_bucket
  • performer_registration_dt
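
A minimal sketch of how the two requirements above (User-Agent opt-out and the contextual attribute list) could be checked against a stream configuration, written in Python for illustration only. The dictionary shape, the key names `collect_user_agent` and `contextual_attributes`, and the function `check_stream_config` are hypothetical and do not mirror the real stream configuration format; only the attribute names come from this task.

```python
# Contextual attributes required by this task's implementation spec.
REQUIRED_ATTRIBUTES = [
    "agent_client_platform_family",
    "mediawiki_database",
    "performer_id",
    "performer_name",
    "performer_pageview_id",
    "performer_groups",
    "performer_edit_count",
    "performer_edit_count_bucket",
    "performer_registration_dt",
]


def check_stream_config(config: dict) -> list[str]:
    """Return human-readable problems found in a (hypothetical) stream config."""
    problems = []
    # The spec requires opting out of User-Agent collection.
    if config.get("collect_user_agent", True):
        problems.append("User-Agent collection is not disabled")
    configured = set(config.get("contextual_attributes", []))
    for name in REQUIRED_ATTRIBUTES:
        if name not in configured:
            problems.append(f"missing contextual attribute: {name}")
    return problems


# Hypothetical config that forgot one attribute.
example_config = {
    "collect_user_agent": False,
    "contextual_attributes": [a for a in REQUIRED_ATTRIBUTES if a != "performer_groups"],
}
print(check_stream_config(example_config))
```

A check of this kind could run as part of pre-deployment instrumentation QA, so that a missing attribute is caught before any events are collected.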

Metric: The one-week retention rate of visits to the Suggested Investigations page is X%

Event to be tracked:

  • User loads page

Interaction data

  • action: page_load
  • action_context: { is_paging_results: whether the user is on the first set of results, limit: value of the limit parameter }
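
A sketch of how the retention metric might be computed from page_load events. It assumes "one-week retention" means the share of users with a page_load in one calendar week (Monday to Sunday) who also have one in the following week; the measurement plan may define it differently. The function names and the (performer_id, day) tuple shape are hypothetical, and the sample data is made up.

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def one_week_retention(page_loads: list[tuple[int, date]], cohort_week: date) -> float:
    """Share of users active in `cohort_week` who return the following week.

    `page_loads` holds (performer_id, day) pairs, one per page_load event.
    `cohort_week` is the Monday that starts the cohort week.
    """
    next_week = cohort_week + timedelta(days=7)
    cohort = {uid for uid, day in page_loads if week_start(day) == cohort_week}
    returned = {uid for uid, day in page_loads if week_start(day) == next_week}
    if not cohort:
        return 0.0
    return len(cohort & returned) / len(cohort)


# Made-up sample: users 1 and 2 visit in the first week, only user 1 returns.
events = [
    (1, date(2025, 10, 20)),
    (2, date(2025, 10, 22)),
    (1, date(2025, 10, 28)),
    (3, date(2025, 10, 29)),
]
print(one_week_retention(events, date(2025, 10, 20)))  # 0.5
```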

Metric: Y% of CheckUsers who visit the Suggested Investigations page in a given week investigate and close at least one task

Events to be tracked:

  • User loads page
  • User closes a case

Interaction data

  • action: case_status_change
  • action_subtype: closed | invalid | open
  • action_context: { case_id: numeric_task_id, signal: signal_name, number_of_users: number_of_users, has_note: whether text exists }
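
A sketch of how this metric might be derived from the two events. It assumes the events are already filtered to a single week and that only action_subtype "closed" counts as closing a case; whether "invalid" should also count is a question for the measurement plan. The function name and the dict shape are hypothetical, and the sample data is made up.

```python
def share_of_visitors_closing(events: list[dict]) -> float:
    """Share of page visitors in one week who closed at least one case.

    Each event is a dict with `performer_id`, `action`, and optionally
    `action_subtype`.
    """
    visitors = {e["performer_id"] for e in events if e["action"] == "page_load"}
    closers = {
        e["performer_id"]
        for e in events
        if e["action"] == "case_status_change" and e.get("action_subtype") == "closed"
    }
    if not visitors:
        return 0.0
    # Only count closers who also appear as visitors in the same week.
    return len(visitors & closers) / len(visitors)


week_events = [
    {"performer_id": 1, "action": "page_load"},
    {"performer_id": 2, "action": "page_load"},
    {"performer_id": 3, "action": "page_load"},
    {"performer_id": 4, "action": "page_load"},
    {"performer_id": 1, "action": "case_status_change", "action_subtype": "closed"},
    {"performer_id": 2, "action": "case_status_change", "action_subtype": "invalid"},
]
print(share_of_visitors_closing(week_events))  # 0.25
```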

Metric: Proportion of cases that are closed within 48 hours out of all cases opened during a week.

Event to be tracked:

  • A case is opened

Interaction data

  • action: case_open
  • action_subtype:
  • action_context: { case_id: numeric_task_id, signal: signal_name, number_of_users: number_of_users }
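
A sketch of the 48-hour closure proportion, assuming case_open events are joined to the first "closed" case_status_change event by case_id. The function name and the two input mappings are hypothetical, and the timestamps are made up. A case closed exactly 48 hours after opening is counted as on time here, which is an assumption.

```python
from datetime import datetime, timedelta


def closed_within_48h(opened: dict[int, datetime], closed: dict[int, datetime]) -> float:
    """Proportion of opened cases that were closed within 48 hours.

    `opened` maps case_id to the case_open timestamp for one week's cases.
    `closed` maps case_id to the first "closed" status change timestamp.
    """
    if not opened:
        return 0.0
    limit = timedelta(hours=48)
    on_time = sum(
        1
        for case_id, opened_at in opened.items()
        if case_id in closed and timedelta(0) <= closed[case_id] - opened_at <= limit
    )
    return on_time / len(opened)


opened_cases = {
    101: datetime(2025, 10, 20, 9, 0),
    102: datetime(2025, 10, 21, 9, 0),
    103: datetime(2025, 10, 22, 9, 0),
    104: datetime(2025, 10, 23, 9, 0),
}
closed_cases = {
    101: datetime(2025, 10, 21, 8, 0),  # 23 hours after opening
    102: datetime(2025, 10, 24, 9, 0),  # 72 hours after opening
    103: datetime(2025, 10, 24, 9, 0),  # exactly 48 hours after opening
}
print(closed_within_48h(opened_cases, closed_cases))  # 0.5
```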

Nice to have

Event to be tracked:

  • A case is updated

Interaction data

  • action: case_updated
  • action_subtype:
  • action_context: { case_id: numeric_task_id, signal: signal_name, number_of_users: number_of_users }

Acceptance criteria

  • Morten w/ Madalina: Measurement plan defining metrics
  • Morten: Instrumentation specification
  • Engineering: Implementation of the specification
  • Engineering: Instrumentation QA (pre-deployment and post-deployment)
  • Morten: post-deployment data QA

Documentation

Measurement Plan

Event Timeline

@nettrom_WMF I have updated the task with the latest spec. Can you please check and confirm here that this is correct?

The list of metrics and events looks good to me. I modified the task to also specify that we're opting out of User-Agent collection (in order to keep the risk low), and also list all the contextual attributes that the stream needs to configure.

kostajh updated the task description.
kostajh updated the task description.

Change #1197278 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[operations/mediawiki-config@master] Define CheckUser SuggestedInvestigations event stream

https://gerrit.wikimedia.org/r/1197278

Change #1197279 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] [WIP] Instrument the Suggested investigations feature

https://gerrit.wikimedia.org/r/1197279

Change #1197278 merged by jenkins-bot:

[operations/mediawiki-config@master] Define CheckUser Suggested Investigations event stream

https://gerrit.wikimedia.org/r/1197278

Mentioned in SAL (#wikimedia-operations) [2025-10-21T12:38:35Z] <dreamyjazz@deploy2002> Started scap sync-world: Backport for [[gerrit:1197278|Define CheckUser Suggested Investigations event stream (T404177)]], [[gerrit:1196914|CheckUser UserInfoCard: Enable XTools menu link on SUL wikis (T406012)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-21T12:42:57Z] <dreamyjazz@deploy2002> dreamyjazz: Backport for [[gerrit:1197278|Define CheckUser Suggested Investigations event stream (T404177)]], [[gerrit:1196914|CheckUser UserInfoCard: Enable XTools menu link on SUL wikis (T406012)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-21T12:50:17Z] <dreamyjazz@deploy2002> Finished scap sync-world: Backport for [[gerrit:1197278|Define CheckUser Suggested Investigations event stream (T404177)]], [[gerrit:1196914|CheckUser UserInfoCard: Enable XTools menu link on SUL wikis (T406012)]] (duration: 11m 42s)

Dreamy_Jazz set the point value for this task to 2. (Oct 21 2025, 8:25 PM)
Dreamy_Jazz changed the point value for this task from 2 to 1.

Change #1198021 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction

https://gerrit.wikimedia.org/r/1198021

Change #1198021 merged by jenkins-bot:

[operations/mediawiki-config@master] EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction

https://gerrit.wikimedia.org/r/1198021

Mentioned in SAL (#wikimedia-operations) [2025-10-22T11:16:12Z] <dreamyjazz@deploy2002> Started scap sync-world: Backport for [[gerrit:1198021|EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction (T404177)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-22T11:20:24Z] <dreamyjazz@deploy2002> kharlan, dreamyjazz: Backport for [[gerrit:1198021|EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction (T404177)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-22T11:25:00Z] <dreamyjazz@deploy2002> Finished scap sync-world: Backport for [[gerrit:1198021|EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction (T404177)]] (duration: 08m 48s)

Change #1197279 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Instrument the Suggested investigations feature

https://gerrit.wikimedia.org/r/1197279

Dreamy_Jazz removed a project: Patch-For-Review.
Dreamy_Jazz subscribed.

We need to wait for the instrumentation to be deployed, then we can check that it works. At that point it goes back over to Morten.

Change #1198131 had a related patch set uploaded (by Kosta Harlan; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@wmf/1.45.0-wmf.24] Instrument the Suggested investigations feature

https://gerrit.wikimedia.org/r/1198131

Change #1198131 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@wmf/1.45.0-wmf.24] Instrument the Suggested investigations feature

https://gerrit.wikimedia.org/r/1198131

Mentioned in SAL (#wikimedia-operations) [2025-10-23T08:19:06Z] <kharlan@deploy2002> Started scap sync-world: Backport for [[gerrit:1198131|Instrument the Suggested investigations feature (T404177)]]

Change #1198276 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/CheckUser@master] SuggestedInvestigationsInstrumentationClient: Use deferred update

https://gerrit.wikimedia.org/r/1198276

Mentioned in SAL (#wikimedia-operations) [2025-10-23T08:23:11Z] <kharlan@deploy2002> kharlan: Backport for [[gerrit:1198131|Instrument the Suggested investigations feature (T404177)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-23T08:31:41Z] <kharlan@deploy2002> Finished scap sync-world: Backport for [[gerrit:1198131|Instrument the Suggested investigations feature (T404177)]] (duration: 12m 35s)

Change #1198276 abandoned by Kosta Harlan:

[mediawiki/extensions/CheckUser@master] SuggestedInvestigationsInstrumentationClient: Use deferred update

https://gerrit.wikimedia.org/r/1198276

kostajh moved this task from In Progress to Done on the CheckUser-SuggestedInvestigations board.
kostajh subscribed.

@nettrom_WMF, assigning to you to verify that the instrumentation data appears as expected. The patch will land in group2 wikis later today, so there should be data for review on Friday October 24.

I did some investigation into the data today, and found three issues:

  1. I don't see any events for action = "case_status_change". Scratch that, you're already on it: T408546
  2. All events have mediawiki.database set to NULL. I checked the stream configuration and see that it's correctly set to capture mediawiki_database. Not sure if this is a bug in the PHP client library or something else?
  3. We're somehow capturing User-Agent data even though we're not supposed to be doing that. I again noticed that the stream configuration is set to opt out of that.
  4. performer_groups is missing from the set of contextual attributes in the stream configuration.

In short: there are a couple of odd things where I'm unsure who owns them or how to fix them, and one straightforward update to the stream configuration.
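
For illustration, the data-side checks in this comment could be sketched as follows in Python. The flattened dotted keys (`mediawiki.database`, `http.request_header.user-agent`), the function `qa_report`, and the sample events are all hypothetical; real event data may be nested differently and would normally be queried in the Data Lake rather than in memory.

```python
from collections import Counter

# Actions the spec expects to see in the stream.
EXPECTED_ACTIONS = {"page_load", "case_status_change", "case_open"}

# Hypothetical flattened key for the User-Agent field.
USER_AGENT_FIELD = "http.request_header.user-agent"


def qa_report(events: list[dict]) -> dict:
    """Summarize data-quality problems for a batch of events."""
    actions = Counter(e.get("action") for e in events)
    return {
        # Expected actions that never appear in the batch.
        "missing_actions": sorted(EXPECTED_ACTIONS - set(actions)),
        # Events where the wiki database was not filled in.
        "null_database": sum(1 for e in events if e.get("mediawiki.database") is None),
        # Events that carry a User-Agent despite the opt-out.
        "has_user_agent": sum(1 for e in events if e.get(USER_AGENT_FIELD)),
    }


sample_events = [
    {"action": "page_load", "mediawiki.database": None, USER_AGENT_FIELD: "Mozilla/5.0"},
    {"action": "case_open", "mediawiki.database": None},
]
print(qa_report(sample_events))
```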

Change #1199762 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] product_metrics/suggested_investigations_interaction: add performer_groups

https://gerrit.wikimedia.org/r/1199762

Change #1199762 merged by jenkins-bot:

[operations/mediawiki-config@master] product_metrics/suggested_investigations_interaction: add performer_groups

https://gerrit.wikimedia.org/r/1199762

Mentioned in SAL (#wikimedia-operations) [2025-10-29T13:17:01Z] <kharlan@deploy2002> Started scap sync-world: Backport for [[gerrit:1199762|product_metrics/suggested_investigations_interaction: add performer_groups (T404177)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-29T13:19:25Z] <kharlan@deploy2002> kharlan: Backport for [[gerrit:1199762|product_metrics/suggested_investigations_interaction: add performer_groups (T404177)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-29T13:31:49Z] <kharlan@deploy2002> Finished scap sync-world: Backport for [[gerrit:1199762|product_metrics/suggested_investigations_interaction: add performer_groups (T404177)]] (duration: 14m 48s)

All events have mediawiki.database set to NULL. I checked the stream configuration and see that it's correctly set to capture mediawiki_database. Not sure if this is a bug in the PHP client library or something else?

This one is a bug in the PHP client library. We are already working on it via T408717: [PHP client library] Fill mediawiki_database contextual attribute

We're somehow capturing User-Agent data even though we're not supposed to be doing that. I again noticed that the stream configuration is set to opt out of that.

Regarding this one, it has already been identified but is still pending. We have filed a ticket to look at how to fix it: T408719: Stop adding user-agent details to http.request_header.user-agent directly via EventLogging

The fix for T408717 is already merged and backported, so you should start seeing mediawiki_database as a populated attribute.

Can confirm that event data in the Data Lake now contains mediawiki_database as expected. Thanks for resolving this so quickly, @Sfaci!