Implement instrumentation with statsv for Phonos
Closed, Resolved | Public | 2 Estimated Story Points

Description

CONTEXT

This ticket fleshes out the set of data-driven questions that can help us understand how to:

  • Measure the impact of generating pronunciation audio for readers
  • Optimize the tool for future use

DATA QUESTIONS

Product - BEFORE LAUNCH of PHONOS
  • How often are users trying to listen to pronunciations by clicking on IPA syntax?
Product - AFTER LAUNCH of PHONOS
  • How often are users trying to listen to rendered pronunciations?
  • How often are users trying to listen to rendered pronunciations but failing to hear anything?
  • What's the average load time for audio?
Engineering

(Currently no engineering events to log)

New Instrumentation

Proposed new event name: timing.MediaWiki.extension.Phonos.IPA.can_play_through
  • Action that will trigger new event: Triggered when an audio finishes playing successfully.
  • Schema where event will be logged: statsv
  • Properties being tracked: This action will track the time an audio takes to first load and finish playing, by_lang and by_wiki

Proposed new event name: counter.MediaWiki.extension.Phonos.IPA.click
  • Action that will trigger new event: Triggered when user clicks on Phonos generated content (speaker icon, IPA).
  • Schema where event will be logged: statsv
  • Properties being tracked: This action will track the count of clicks on Phonos audio files, by_lang and by_wiki

Proposed new event name: counter.MediaWiki.extension.Phonos.IPA.error
  • Action that will trigger new event: Triggered when Phonos fails to play an audio.
  • Schema where event will be logged: statsv
  • Properties being tracked: This event is going to count the times Phonos fails to play an audio, by_lang and by_wiki

Proposed new event name: counter.MediaWiki.extension.Phonos.IPA.replay
  • Action that will trigger new event: Triggered when Phonos is clicked for a replay.
  • Schema where event will be logged: statsv
  • Properties being tracked: This event will track the clicks that trigger a replay, by_lang and by_wiki

Note: session-id is suggested as a property since it may be useful

STRETCH GOAL

If possible, it would also be great to set up a Quick Survey to gauge reader/listener reactions.
Since the data-tracking events will only give us insight into whether and how people are clicking on the audio files to listen to them, it will be hard to infer whether they found the pronunciations useful.
The goal of the Quick Survey would be to collect data on how useful readers found the audio.
The Quick Survey should trigger after the entire audio file plays. The survey should only be one question:

  • How helpful was this audio in helping you understand how to pronounce this term?
  • Scale of 1-5 (1 Not Helpful, 5 Helpful)

Event Timeline

HMonroy removed HMonroy as the assignee of this task.
HMonroy subscribed.

Hi @mpopov! We need help with adding instrumentation to our Phonos project. This is a new extension that we are in the process of releasing. I believe we would need to do event logging in a Visual Editor feature, but not sure which feature name. Should we book a meeting with a data analytics team member?

Hello! o/ Very excited for Phonos and really glad y'all are talking about analytics for it now.

Unlike real-time previews I don't think it would make sense to plug this into VE Feature Use, since this is for the reading experience.

My recommendation would be to check in with @EChetty & @WDoranWMF about using the Metrics Platform for this. The pitch is that you write an instrument that logs just the custom data (e.g. which engine, load time for audio), and the Metrics Platform takes care of much of the contextual info, such as whether the user was logged in or which page they were on (see https://wikitech.wikimedia.org/wiki/Metrics_Platform/Event_Schema for more properties that the Metrics Platform can fill in for you). Which properties you request is configured separately and doesn't require any changes to the instrumentation or redeploying the extension.

But yes, if you would like to consult somebody in Product Analytics you're welcome to: https://www.mediawiki.org/wiki/Product_Analytics/Consultation_Hours

@HMonroy Just want to point out that WDoran just went on leave for two months, just so you are not worried if you don't hear back.

@phuedx @EChetty Would we need a new schema for this extension? Or is there a current schema that we would be logging the events to?

The following is a summary of what @HMonroy, @TheresNoTime, and I spoke about yesterday:

There are two ways that this could be instrumented:

  1. Using statsv
  2. Using the Metrics Platform

Using statsv

statsv allows you to stand up instruments very quickly. However, the data that you can capture is limited to a metric name coupled with a count or a timing. So, for example, you can answer questions like "How many times has this UX element been clicked?" or "How long did that request take?".

You can partition your data broadly by designing your metric names carefully. For example, if you wanted to answer "How many times has this UX element been clicked per wiki?", then you would need to include the wiki name in the metric name, i.e.

const dbName = mw.config.get( 'wgDBname' );
const metrics = [
  'MediaWiki.extension.Phonos.IPA.click',
  `MediaWiki.extension.Phonos.IPA.click_by_wiki.${dbName}`
];
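
A minimal sketch of how such names might be built and emitted. It assumes that a statsv handler is subscribed to `counter.`-prefixed topics sent through mw.track, as the event names in the task description suggest. `buildClickMetrics` and `emitCounters` are hypothetical helpers, and the `track` argument stands in for mw.track so that the sketch also runs outside MediaWiki.

```javascript
// Hypothetical helper: build the metric names for one click,
// one overall name and one partitioned by wiki.
function buildClickMetrics( dbName ) {
	return [
		'MediaWiki.extension.Phonos.IPA.click',
		`MediaWiki.extension.Phonos.IPA.click_by_wiki.${ dbName }`
	];
}

// Hypothetical helper: increment each counter by one.
// `track` is assumed to behave like mw.track( topic, value ).
function emitCounters( metrics, track ) {
	metrics.forEach( ( name ) => track( `counter.${ name }`, 1 ) );
}

// Usage outside MediaWiki, with an array-collecting stand-in for mw.track.
const sent = [];
emitCounters(
	buildClickMetrics( 'enwiki' ),
	( topic, value ) => sent.push( [ topic, value ] )
);
```

Inside the extension, the same call would presumably look like `emitCounters( buildClickMetrics( mw.config.get( 'wgDBname' ) ), ( t, v ) => mw.track( t, v ) )`.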

The data are stored in Prometheus and can power Grafana dashboards. As a frinstance, the Page Previews dashboard is powered by metrics collected via statsv.

Testing statsv

At the moment, production is the only environment where you can test your statsv-based instrument end-to-end. Previously, I have written QA instructions along the lines of

Observe that an HTTP request is made to /beacon/statsv?MediaWiki.extension.Phonos.IPA.click=1c
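
For QA, a captured request URL could also be checked programmatically. The sketch below assumes that each metric appears as a query parameter whose value is a number followed by a type suffix (`c` for a counter, as in the request above, and `ms` for a timing). `parseStatsvBeacon` is a hypothetical helper, and example.org is only a placeholder base for resolving the relative URL.

```javascript
// Hypothetical QA helper: extract metrics from a statsv beacon URL.
// Assumes each query parameter has the form <metric name>=<number><suffix>.
function parseStatsvBeacon( url ) {
	const params = new URL( url, 'https://example.org' ).searchParams;
	const metrics = [];
	for ( const [ name, raw ] of params ) {
		const match = /^(-?\d+(?:\.\d+)?)(c|ms)$/.exec( raw );
		if ( match ) {
			metrics.push( { name, value: Number( match[ 1 ] ), type: match[ 2 ] } );
		}
	}
	return metrics;
}

const parsed = parseStatsvBeacon(
	'/beacon/statsv?MediaWiki.extension.Phonos.IPA.click=1c'
);
```

Parameters that do not match the assumed format are skipped rather than reported, so an empty result means nothing recognisable was sent.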

Edit: https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/EventLogging is a thorough guide on how to get a production-like Event Platform/Legacy EventLogging testing environment set up locally. This environment includes the WikimediaEvents extension, which provides the statsv protocol handler.

Metrics Platform/Event Platform

The Event Platform allows you to stand up instruments to capture rich data to answer equally rich questions. For example, you can answer questions like "How frequently are pronunciations listened to completely?", "How frequently does a user listen to the pronunciation and then navigate to the corresponding File page?", and "How many unique users (devices) listen to a pronunciation?"

The Metrics Platform is an opinionated Event Platform client.

Firstly, the Metrics Platform owns and maintains the schema with which your events will be validated – we call it the monoschema – so you do not have to create a new schema for each new instrument. The monoschema has properties for the most common/instrument-agnostic data that teams might need to answer their questions, e.g. session ID, pageview ID, the namespace and title of the current page, and it also has a property that can hold instrument-specific data.

Secondly, the Metrics Platform works with event names and data rather than streams and events. That is, rather than writing an instrument that submits events to a specific stream, you write an instrument that dispatches events to zero or more interested streams.

Now, because the Metrics Platform is an opinionated Event Platform client, you still have to define a stream and configure it to be interested in the events that your instrument is dispatching. See https://wikitech.wikimedia.org/wiki/User:Phuedx/Metrics_Platform/Getting_Started/Creating_An_Instrument and https://wikitech.wikimedia.org/wiki/User:Phuedx/Metrics_Platform/Creating_a_Stream_Configuration for examples. Once you have created a stream, then you can start dispatching events:

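// Dispatch a click event to any streams interested in it.
mw.eventLog.dispatch( 'web.ui.ipa.click' );

// Record when loading started.
const startedAt = mw.now();

// Elsewhere…
// Elapsed time in milliseconds (a duration, not a timestamp).
const timeToLoad = mw.now() - startedAt;

mw.eventLog.dispatch( 'web.ui.ipa.play', { time_to_load: timeToLoad } );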

Testing Metrics Platform/Event Platform

We did not have time to talk about testing your Metrics Platform/Event Platform based instrumentation.

https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/EventLogging is a thorough guide on how to get a production-like Event Platform/Legacy EventLogging testing environment set up locally. After following the guide you will be able to submit events to a containerised EventGate instance that can load schemas from a local path.

HMonroy renamed this task from Add EventLogging to Phonos to Implement instrumentation with statsv for Phonos. (Oct 12 2022, 10:36 PM)

@NRodriguez do we want to answer the questions below per wiki? Per language?

How often are users trying to listen to rendered pronunciations?
How often are users trying to listen to rendered pronunciations but failing to hear anything?
What's the average load time for audio?

Can it be by both?
Aka, could I look at the difference at how it's performing in Spanish Wiktionary versus Spanish Wikipedia?

Change 844067 had a related patch set uploaded (by HMonroy; author: HMonroy):

[mediawiki/extensions/Phonos@master] Add instrumentation with statsv

https://gerrit.wikimedia.org/r/844067

Manually triggering some events on the Beta Cluster (which, for reference, copies production DBnames, so en.wikipedia.beta is also enwiki) and checking in the labs instance of graphite:

image.png (748×983 px, 96 KB)

Looks good! When in production, we'll be able to use the much nicer https://thanos.wikimedia.org to confirm metric logging

Change 844067 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Add instrumentation with statsv

https://gerrit.wikimedia.org/r/844067

Change 844520 had a related patch set uploaded (by HMonroy; author: HMonroy):

[mediawiki/extensions/Phonos@master] Clean statsv tracking

https://gerrit.wikimedia.org/r/844520

We set the code to track:

Timing - the time it took for a media file to play through to the end (only the first time it was played after the page was loaded)
Count - clicks, errors and replays
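
A rough sketch of that behaviour, not the code from the patch. `createPlayerTracker` is a hypothetical helper, `track` stands in for mw.track, and `now` stands in for a millisecond clock such as mw.now. The by_lang and by_wiki suffixes are left out for brevity. Whether a replay should also increment the click counter is not settled in this thread, so the sketch counts it only as a replay.

```javascript
// Hypothetical tracker for a single player on the page.
function createPlayerTracker( track, now ) {
	const prefix = 'MediaWiki.extension.Phonos.IPA';
	let startedAt = null;
	let hasPlayedThrough = false;

	return {
		// Call when the user clicks the player.
		onClick() {
			if ( hasPlayedThrough ) {
				track( `counter.${ prefix }.replay`, 1 );
				return;
			}
			track( `counter.${ prefix }.click`, 1 );
			if ( startedAt === null ) {
				startedAt = now();
			}
		},
		// Call when the audio has played through to the end.
		// Only the first complete play is timed.
		onPlayedThrough() {
			if ( !hasPlayedThrough && startedAt !== null ) {
				track( `timing.${ prefix }.can_play_through`, now() - startedAt );
			}
			hasPlayedThrough = true;
		},
		// Call when playback fails.
		onError() {
			track( `counter.${ prefix }.error`, 1 );
		}
	};
}

// Usage with a fake clock and an array-collecting stand-in for mw.track.
const events = [];
let clock = 0;
const tracker = createPlayerTracker(
	( topic, value ) => events.push( [ topic, value ] ),
	() => clock
);
tracker.onClick(); // first play starts
clock = 1500;
tracker.onPlayedThrough(); // first play is timed
tracker.onClick(); // counted as a replay
clock = 3000;
tracker.onPlayedThrough(); // replay is not timed
```

Passing the clock and the track function in as arguments keeps the logic testable without a browser or MediaWiki.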

@NRodriguez please let me know if we should also time replays. I was thinking that we would time only the first play and not replays.

Let's track replays, I think it would help us understand if people are using the feature to understand how to pronounce things repeatedly

@NRodriguez we are currently counting replay clicks, but we are not tracking the time a replay takes. We are tracking the time that it takes for an audio to load and play for the first time. Do you think we should also track the time of the replays? Thank you!

Change 844520 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Clean statsv tracking

https://gerrit.wikimedia.org/r/844520

HMonroy updated the task description.

@HMonroy I clicked on 5 IPA words 5 times each, plus one extra click before the audio had finished playing. In total, it recorded 5 statsv requests, as seen in the screenshots, since they are grouped. I also checked the Dashboard, and it registered all 26 of my clicks.

I also tested an error by using a made-up language code, which did record a statsv request, as seen in the error screenshot. Is that error also supposed to show under Phonos error by Wiki on the Dashboard? If so, it did not register there.

Test Site: https://en.wikipedia.beta.wmflabs.org/wiki/Phonos
Dashboard Site: https://grafana-labs.wikimedia.org/d/wiQMOQI4k/phonos-stats-beta?orgId=1&refresh=10s&from=now-3h&to=now
Browser: Chrome

Registered Clicks

T315091_Phonos_Statsv_Chrome.png (1×1 px, 388 KB)

T315091_Phonos_Statsv_Dashboard_Chrome.png (900×2 px, 139 KB)

Test: Error

T315091_Phonos_Statsv_TestError_Chrome.png (824×2 px, 254 KB)

T315091_Phonos_Statsv_TestError_Dashboard_Chrome.png (1×1 px, 110 KB)

@GMikesell-WMF We needed to modify the panels in grafana so that they correctly pull and display the data. https://grafana-labs.wikimedia.org/d/wiQMOQI4k/phonos-stats-beta?from=1668661491446&orgId=1&to=1669740325660 you should be able to see your errors now :)

@HMonroy Got it! I see the errors now. I will move this to Product Sign-off. Thanks!