
Instrumentation for Incident Reporting System
Closed, ResolvedPublic

Description

We want to set up instrumentation that covers the events in the emergency and non-emergency reporting workflows, to give us basic visibility into how users interact with the incident reporting system.

The KR: https://app.asana.com/0/1207718010906612/1207718375312157
Design doc: https://www.figma.com/design/ZfMGm4XFZy49GlYyINacaK/IRS-MVP-v1?node-id=2007-7869&t=M9Kwk9rpD4330H6S-0

Questions we want to answer

  • How many times did users click the report button?

Emergency Flow

  • How often do users select the "emergency flow"?
  • How often do users select from the dropdown menu? And what type?
  • How often do users click "continue" and proceed to the submit step?
  • How often do users add additional information?
  • How many successful reports were made?
  • What is the submission rate of reports?
  • How often do users click "cancel" or "close"?
  • At which step do users drop off most often?

Non-Emergency Flow

  • How often do users select the "non-emergency flow"?
  • How often do users select from the types of incidents? And what type?
  • How often do users proceed to the "Describe the incident" step?
  • How often do users proceed to the "Get Support" step?
  • Which links do users click in the support step?
  • How often do users click the "cancel" or "close" button?
  • At which step do users drop off most often?

Acceptance criteria

  • Connie w/ Madalina: Measurement plan defining metrics
  • Connie: Instrumentation specification
  • Connie: Submit L3SC request (per Data Collection Guidelines)
  • Engineer: Implementation of the specification
  • Engineer: Instrumentation QA (pre-deployment and post-deployment)
  • Engineer/EM: Document the instrument
  • Connie: post-deployment data QA

Document

L3SC request
Measurement Plan
Instrumentation Spec

Related Objects

Event Timeline

cchen triaged this task as Medium priority.Aug 19 2024, 7:45 PM
cchen updated the task description.
cchen renamed this task from Instrumentation for Incident Reporting System Emergency Flow to Instrumentation for Incident Reporting System.Nov 9 2024, 1:21 AM
cchen moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
mszabo changed the task status from Open to In Progress.Nov 20 2024, 12:45 PM
mszabo claimed this task.

Change #1093385 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] Wire up initial instrumentation logic

https://gerrit.wikimedia.org/r/1093385

Change #1093389 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[operations/mediawiki-config@master] Configure instrument for the Incident Reporting System

https://gerrit.wikimedia.org/r/1093389

Change #1093882 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] dialog: Disable instrumentation by default

https://gerrit.wikimedia.org/r/1093882

Change #1093385 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] Wire up initial instrumentation logic

https://gerrit.wikimedia.org/r/1093385

Change #1093882 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] dialog: Disable instrumentation by default

https://gerrit.wikimedia.org/r/1093882

Change #1093914 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] dialog: Support funnels in useInstrument

https://gerrit.wikimedia.org/r/1093914

Change #1093917 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] dialog: Limit interaction data context length

https://gerrit.wikimedia.org/r/1093917

Change #1093938 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] dialog: Instrument the emergency flow

https://gerrit.wikimedia.org/r/1093938

Change #1094425 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] dialog: Instrument shared parts of both flows

https://gerrit.wikimedia.org/r/1094425

Change #1093914 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] dialog: Support funnels in useInstrument

https://gerrit.wikimedia.org/r/1093914

Change #1093917 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] dialog: Limit interaction data context length

https://gerrit.wikimedia.org/r/1093917

Change #1093938 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] dialog: Instrument the emergency flow

https://gerrit.wikimedia.org/r/1093938

Change #1094425 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] dialog: Instrument shared parts of both flows

https://gerrit.wikimedia.org/r/1094425

Change #1097335 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] dialog: Update instrumentation stream name

https://gerrit.wikimedia.org/r/1097335

Change #1097335 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] dialog: Update instrumentation stream name

https://gerrit.wikimedia.org/r/1097335

Change #1098170 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ReportIncident@master] dialog: Instrument the non-emergency flow

https://gerrit.wikimedia.org/r/1098170

Change #1098170 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] dialog: Instrument the non-emergency flow

https://gerrit.wikimedia.org/r/1098170

Change #1093389 merged by jenkins-bot:

[operations/mediawiki-config@master] Configure instrument for the Incident Reporting System

https://gerrit.wikimedia.org/r/1093389

Mentioned in SAL (#wikimedia-operations) [2024-11-27T13:24:14Z] <mszabo@deploy2002> Started scap sync-world: Backport for [[gerrit:1098506|private: Add stub for wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480|Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389|Configure instrument for the Incident Reporting System (T372823)]]

Mentioned in SAL (#wikimedia-operations) [2024-11-27T13:30:06Z] <mszabo@deploy2002> mszabo: Backport for [[gerrit:1098506|private: Add stub for wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480|Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389|Configure instrument for the Incident Reporting System (T372823)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-11-27T13:38:08Z] <mszabo@deploy2002> Finished scap sync-world: Backport for [[gerrit:1098506|private: Add stub for wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480|Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389|Configure instrument for the Incident Reporting System (T372823)]] (duration: 13m 53s)

Change #1098913 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[operations/mediawiki-config@master] ReportIncident: Enable instrumentation on labs

https://gerrit.wikimedia.org/r/1098913

Change #1098913 merged by jenkins-bot:

[operations/mediawiki-config@master] ReportIncident: Enable instrumentation on labs

https://gerrit.wikimedia.org/r/1098913

Mentioned in SAL (#wikimedia-operations) [2024-11-28T14:23:29Z] <urbanecm@deploy2002> Started scap sync-world: Backport for [[gerrit:1098623|Use useformat query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913|ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509|Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622|Use useformat query param for device detection or mobile domain (m.) (

Mentioned in SAL (#wikimedia-operations) [2024-11-28T14:28:41Z] <urbanecm@deploy2002> urbanecm, tgr, abi, mszabo: Backport for [[gerrit:1098623|Use useformat query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913|ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509|Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622|Use useformat query param for device detection or mobile domain (m.

Instrumentation is now enabled on the Beta Cluster.

Mentioned in SAL (#wikimedia-operations) [2024-11-28T14:54:04Z] <urbanecm@deploy2002> Finished scap sync-world: Backport for [[gerrit:1098623|Use useformat query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913|ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509|Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622|Use useformat query param for device detection or mobile domain (m.)

@mszabo:
How do I access the instrumentation data on the Beta Cluster?
The task description links to Asana, which I assume is where the instrumentation data for this ticket lives. I have not been given access to Asana, so is there another way to verify the specifics of this ticket?
If not, this ticket will have to bypass QA.

Hey @Djackson-ctr, I believe Connie will be checking the instrumentation on Beta today, so this ticket might indeed skip the regular QA flow. Sorry for not leaving a heads-up! I've reached out to Kosta to confirm whether we're good to do that and I'll get back to you.

Please see my comments here about how to print the event logging data to the browser console or to a popup in the browser: https://phabricator.wikimedia.org/T381122#10374617

^ what Kosta said and also https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Beta

@mszabo: Can you please confirm that the instrumentation works and the specification is implemented correctly? This part is usually done by QTE or the engineer who did the instrumentation and the goal is to catch as many bugs as possible before deploying to production (including testwiki). Things to look for:

  • Are the events firing when they're supposed to as you're interacting with instrumented UI elements?
  • Do those events have the correct event data they're supposed to? (action, etc.)
  • Is the event data valid? e.g. if schema specifies a field is a string but the instrument is sending a number, we want to catch that

This kind of pre-deployment instrument QA will greatly simplify and speed up @cchen's post-deployment data QA which will boil down to "are we seeing all the events we're supposed to see in the appropriate tables in the data lake?"
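The third check in the list above (type mismatches between the schema and what the instrument sends) can be sketched in a few lines of Python. This is only an illustration: the field list is taken from the sample events posted later in this task, the set of required fields is an assumption, and the helper name `find_problems` is hypothetical. The authoritative definition is the stream's JSON schema, which EventGate validates against.

```python
from typing import Any, Dict, List

# Assumed, simplified field spec (field name -> expected Python type).
# The real source of truth is the stream's JSON schema.
EXPECTED_TYPES = {
    "action": str,
    "action_subtype": str,
    "action_source": str,
    "action_context": str,
    "funnel_name": str,
}

# Assumption: every interaction event carries at least these two fields.
REQUIRED_FIELDS = ("action", "action_source")


def find_problems(event: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in one event."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in event:
            problems.append(f"missing required field: {field}")
    for field, value in event.items():
        expected = EXPECTED_TYPES.get(field)
        if expected is None:
            problems.append(f"unexpected field: {field}")
        elif not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

An empty list means the event passed these basic checks; it does not replace validation against the real schema.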

Hey Mikhail, yes, this was part of my development process. While working on this task, I've been running MW configured to send instrumentation events to a local EventGate instance. I used the supplied dev config with the following attributes added, which enable destination stream config and per-stream JSON schema validation in addition to outputting captured interaction events:

stream_config_uri: "http://enwiki.wmf.home.arpa/w/api.php?format=json&action=streamconfigs&constraints=destination_event_service=eventgate-analytics-external"
stream_config_object_path: streams
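For readers unfamiliar with the stream_config_uri value quoted above, the following Python sketch simply takes the URI apart to show what it asks the wiki's API for. Only the URI itself comes from this task; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# URI copied verbatim from the dev config quoted in this task.
STREAM_CONFIG_URI = (
    "http://enwiki.wmf.home.arpa/w/api.php"
    "?format=json&action=streamconfigs"
    "&constraints=destination_event_service=eventgate-analytics-external"
)

parts = urlsplit(STREAM_CONFIG_URI)

# parse_qs splits each parameter on its first "=", so the constraint
# expression keeps its own "=" inside the value.
query = {name: values[0] for name, values in parse_qs(parts.query).items()}

# The constraint restricts the returned stream configs to one event service.
constraint_key, _, constraint_value = query["constraints"].partition("=")

print(parts.hostname)                     # local development wiki
print(query["action"])                    # API module being called
print(constraint_key, constraint_value)   # filter applied to stream configs
```

In other words, the local EventGate instance fetches its stream configuration from the development wiki's API, limited to streams destined for the named event service.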

Here are some sample events captured for each workflow, listed in the order of funnel_event_sequence and matched to the corresponding event in the instrumentation spec. The funnel_entry_token data has been omitted in this table for brevity.
The raw source files contain the exact events captured by EventGate.

Non-emergency flow
(Note: The first recorded event here is a server-side interaction event as specified by T380599)
| Event name | Interaction data |
| --- | --- |
| Users open the report form | "action": "view", "action_source": "form" |
| Number of clicks on the “emergency flow”/ "non-emergency flow" option | "action": "click", "action_source": "form", "action_context": "non-emergency", "funnel_name": "non-emergency" |
| Number of clicks on the “Continue” in "Form" step | "action": "click", "action_subtype": "continue", "action_source": "form", "action_context": "{\"harm_option\":\"na\"}", "funnel_name": "non-emergency" |
| Pageviews to the “Describe the incident” step | "action": "view", "action_source": "describe_unacceptable_behavior", "funnel_name": "non-emergency" |
| Number of clicks on the types of unacceptable behaviors | "action": "click", "action_source": "describe_unacceptable_behavior", "action_context": "hate-or-discrimination", "funnel_name": "non-emergency" |
| Number of clicks on the “Continue” in "Describe the incident" step | "action": "click", "action_subtype": "continue", "action_source": "describe_unacceptable_behavior", "action_context": "hate-or-discrimination" |
| Pageviews to the “Get Support” step | "action": "view", "action_source": "get_support", "funnel_name": "non-emergency" |
| Page names of clicks in the support step | "action": "click", "action_source": "get_support", "action_context": "/wiki/Foundation:Special:MyLanguage/Policy:Universal_Code_of_Con", "funnel_name": "non-emergency" |

Emergency flow
| Event name | Interaction data |
| --- | --- |
| Users open the report form | "action": "view", "action_source": "form" |
| Number of clicks on the “emergency flow”/ "non-emergency flow" option | "action": "click", "action_source": "form", "action_context": "emergency", "funnel_name": "emergency" |
| Number of clicks on the dropdown menu options | "action": "click", "action_source": "form", "action_context": "public", "funnel_name": "emergency" |
| Number of clicks on the “Continue” in "Form" step | "action": "click", "action_subtype": "continue", "action_source": "form", "action_context": "{\"harm_option\":\"public\"}", "funnel_name": "emergency" |
| Pageviews to the “Submit Report” step | "action": "view", "action_source": "submit_report", "funnel_name": "emergency" |
| Number of times users enter text in “Additional Information” | "action": "click", "action_subtype": "continue", "action_source": "submit_report", "action_context": "{\"addl_info\":true,\"reported_user\":\"~2024-19\"}", "funnel_name": "emergency" |
| Pageviews to the “Submitted” step | "action": "view", "action_source": "submitted", "funnel_name": "emergency" |
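Note that in the samples above action_context is always a string: sometimes a plain value such as "public", sometimes a JSON object serialized into a string. A minimal Python sketch of how an analyst might decode it is shown here. The helper name `decode_action_context` is hypothetical and not part of the instrument.

```python
import json
from typing import Any, Dict, Union


def decode_action_context(raw: str) -> Union[Dict[str, Any], str]:
    """Return a dict for JSON-object contexts, otherwise the raw string."""
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        # Plain values such as "emergency", or JSON cut short by a
        # length limit, are returned unchanged.
        return raw
    # Only treat JSON objects as structured context.
    return value if isinstance(value, dict) else raw
```

For example, the context of the "Continue" click on the submit step decodes to a dict with the keys addl_info and reported_user, while "emergency" is returned as-is.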

Finally, here are the events captured from an incomplete flow where the user switched back to the first screen and changed the report type before canceling the workflow altogether:

Incomplete flow
| Event name | Interaction data |
| --- | --- |
| Users open the report form | "action": "view", "action_source": "form" |
| Number of clicks on the “emergency flow”/ "non-emergency flow" option | "action": "click", "action_source": "form", "action_context": "non-emergency", "funnel_name": "non-emergency" |
| Number of clicks on the “Continue” in "Form" step | "action": "click", "action_subtype": "continue", "action_source": "form", "action_context": "{\"harm_option\":\"na\"}", "funnel_name": "non-emergency" |
| Pageviews to the “Describe the incident” step | "action": "view", "action_source": "describe_unacceptable_behavior", "funnel_name": "non-emergency" |
| Number of clicks on the “emergency flow”/ "non-emergency flow" option | "action": "click", "action_source": "form", "action_context": "emergency", "funnel_name": "emergency" |
| Number of clicks on “cancel” or “close” (also by steps) | "action": "click", "action_subtype": "close", "action_source": "form", "funnel_name": "emergency" |
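As a rough illustration of how the "which step do users drop from" question could be answered from a sequence like the one above, here is a small Python sketch. It assumes the events of one funnel are already ordered by funnel_event_sequence, and that a cancel click would use the subtype "cancel"; only "close" appears in the samples, so that second value is an assumption. The function name `drop_off_step` is hypothetical.

```python
from typing import Dict, List, Optional

# "close" appears in the captured events; "cancel" is an assumed value.
EXIT_SUBTYPES = ("cancel", "close")


def drop_off_step(events: List[Dict[str, str]]) -> Optional[str]:
    """Return the step (action_source) where the user left, if they did.

    `events` must be ordered by funnel_event_sequence.
    """
    for event in reversed(events):
        if (
            event.get("action") == "click"
            and event.get("action_subtype") in EXIT_SUBTYPES
        ):
            return event.get("action_source")
    return None  # no cancel/close click recorded
```

Applied to the incomplete flow above, this returns "form", since the close click was recorded with that action_source.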

Since the recorded events appear to be in line with our instrumentation spec and EventGate reported no schema or stream configuration validation issues, I think the instrumentation looks good overall (the same stream configuration was adapted for Beta and production). Let me know if you have any questions!