
MinT for Readers instrumentation: Implement session length instrument
Closed, Resolved · Public · 4 Estimated Story Points

Description

As part of the experiment, it is important for us to measure the dwell time of users reading the automatic translation. This is needed to gauge user interest and to filter out potential noise.

The first step of the task is to decide on a viable approach for this instrumentation.

Approach 1: Using the session ticks instrumentation

Consider whether the existing session ticks instrumentation can be used.

Each active user session sends its current length at a pre-determined time interval called a 'tick'. When a user session becomes inactive, it stops sending length updates. If it remains inactive long enough, it will be reset.
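A minimal sketch of the tick model described above, to make the "active / inactive / reset" states concrete. It assumes a 60-second tick and a hypothetical 30-minute reset threshold; the real session ticks instrument's constants and internals may differ. The clock is passed in as `now` (milliseconds) so the logic can be exercised outside a browser.

```javascript
// Hypothetical constants; the actual instrument's values may differ.
const TICK_MS = 60 * 1000;
const RESET_AFTER_MS = 30 * 60 * 1000;

// Tracks one session and reports its length in ticks.
class TickSession {
	constructor( now ) {
		this.ticks = 0;
		this.lastActivity = now;
	}

	// Records user activity at `now`. A long enough gap resets the session.
	activity( now ) {
		if ( now - this.lastActivity >= RESET_AFTER_MS ) {
			this.ticks = 0;
		}
		this.lastActivity = now;
	}

	// Called once per tick. Returns the session length (in ticks) to send,
	// or null when there was no activity within the last tick.
	tick( now ) {
		if ( now - this.lastActivity > TICK_MS ) {
			return null;
		}
		this.ticks += 1;
		return this.ticks;
	}
}
```

An inactive session simply stops reporting (`tick` returns null) and resumes counting once activity returns, unless the gap exceeded the reset threshold.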

Based on the documentation, the tick duration is currently 1 minute, which is too coarse for our use case. We need to check whether it can be brought down to a few seconds. Additionally, the documentation notes:

It is likely that the instrumentation for this metric sends a large number of events. Sending events every N seconds might significantly drain the battery on mobile devices (the radio gets woken up if the phone is idle, and that can be costly). We should take this into account and try to reduce network usage, e.g. by queuing events and sending them in batches.
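A minimal sketch of the batching idea from the quoted note. It assumes a hypothetical `send( batch )` transport function and an arbitrary batch size; it is not how the session ticks instrument is implemented, only an illustration of trading immediacy for fewer network requests.

```javascript
// Queues events in memory and hands them to `send` in batches, so the
// network is used once per batch rather than once per event.
class EventBatcher {
	// `send` is a hypothetical transport function supplied by the caller.
	constructor( send, maxBatchSize = 10 ) {
		this.send = send;
		this.maxBatchSize = maxBatchSize;
		this.queue = [];
	}

	enqueue( event ) {
		this.queue.push( event );
		if ( this.queue.length >= this.maxBatchSize ) {
			this.flush();
		}
	}

	// Sends whatever is queued; also worth calling when the page is hidden,
	// so queued events are not lost.
	flush() {
		if ( this.queue.length === 0 ) {
			return;
		}
		const batch = this.queue;
		this.queue = [];
		this.send( batch );
	}
}
```

The trade-off is latency and a risk of losing the unsent tail of the queue if the page disappears before `flush()` runs.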

Approach 2: Page load and unload events

Similar to web_ui_reading_depth, we would have to log page load and unload events.

  • Page load: user clicks to view the automatic translation and the translation is loaded.
  • Page unload: user exits the automatic translation (in all possible cases).

The difference between the client-side timestamps can be used to estimate the session length. This might be easier to implement; however, we would need to make sure the data is reliable.
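A minimal sketch of estimating session length from the two events, assuming each event carries a hypothetical `type` field (`'load'` or `'unload'`) and an ISO 8601 timestamp in a hypothetical `dt` field. An unpaired load (no unload received) yields null rather than a guessed length, which is exactly where the reliability concern shows up.

```javascript
// Estimates session length in seconds from a load/unload pair.
// Returns null if either event is missing or the timestamps are
// invalid or out of order.
function sessionLengthSeconds( events ) {
	const load = events.find( ( e ) => e.type === 'load' );
	const unload = events.find( ( e ) => e.type === 'unload' );
	if ( !load || !unload ) {
		return null;
	}
	const ms = Date.parse( unload.dt ) - Date.parse( load.dt );
	// NaN (unparseable timestamps) and negative values (client clock
	// adjusted between events) both fail this check.
	return ms >= 0 ? ms / 1000 : null;
}
```

For example, a load at `2025-03-03T11:00:00Z` and an unload at `2025-03-03T11:02:30Z` give 150 seconds.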

Event Timeline

PWaigi-WMF raised the priority of this task from Medium to High. Mar 3 2025, 11:36 AM
Nikerabbit subscribed.

@KCVelaga_WMF Is this ready for development? Who needs to do what to decide which option to choose?

I think the first approach would provide more reliable data, because we want to measure how long the user is actually interacting with the content. We could implement something of our own, but the same event would be triggered multiple times:

Rough example:

// `duration` is the time the user has been active on this tab. When the user
// switches to another tab, pause it; resume it once our tab has regained focus
// for X seconds or so. (`logEvent` is a placeholder for the real logging call.)
logEvent( 'session_activity', { title: 'pagename', duration } );

The event could be logged every 30 seconds, for example. If the browser is closed and neither our beforeunload nor our visibilitychange handler fires, we would be off by at most 30 seconds.
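A minimal sketch of the rough example above, assuming the clock is passed in as `now` (milliseconds) and that `logEvent( name, data )` is a placeholder for the real logging helper. Time accumulates only while the tab is visible, and a heartbeat intended to run every 30 seconds reports the running total, so a lost final event costs at most one interval. It leaves out the "regained focus for X seconds" threshold.

```javascript
// Accumulates the time during which the tab was visible.
class ActiveTimer {
	constructor( now ) {
		this.activeMs = 0;
		this.visibleSince = now; // null while the tab is hidden
	}

	// Call on every visibilitychange with the new state.
	setVisible( visible, now ) {
		if ( visible && this.visibleSince === null ) {
			this.visibleSince = now;
		} else if ( !visible && this.visibleSince !== null ) {
			this.activeMs += now - this.visibleSince;
			this.visibleSince = null;
		}
	}

	// Total visible time up to `now`, in milliseconds.
	duration( now ) {
		const running = this.visibleSince === null ? 0 : now - this.visibleSince;
		return this.activeMs + running;
	}
}

// Heartbeat body, meant to be scheduled every 30 s (e.g. with setInterval).
function heartbeat( timer, logEvent, title, now ) {
	logEvent( 'session_activity', { title, duration: timer.duration( now ) } );
}
```

Because each heartbeat carries the cumulative duration, only the last event received for a session matters during analysis.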

The second approach is easier to implement, and we should use the visibilitychange event instead of beforeunload (see link for why). When using the visibilitychange event, there are two cases to be aware of:

If the user comes back to our tab, should that be a new session?

  1. If yes, it will inflate the session counts.
  2. If no, we might submit the session_activity event multiple times and only the last one should be considered.
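For case 2, the analysis would keep only the latest session_activity event per session. A minimal sketch, assuming each event carries a hypothetical `session_id` and a numeric `ts` (milliseconds since the epoch):

```javascript
// Keeps only the most recent event for each session.
function latestPerSession( events ) {
	const latest = new Map();
	for ( const event of events ) {
		const current = latest.get( event.session_id );
		if ( !current || event.ts > current.ts ) {
			latest.set( event.session_id, event );
		}
	}
	return [ ...latest.values() ];
}
```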

I think the second approach should be fairly accurate. We can start with that and change later based on our observation.

Also, what happens if the user changes the target language, etc.? Should that start a new session?

abi_ set the point value for this task to 4.

We've decided to use the visibilitychange event to determine whether the user is active. An event will be logged whenever the visibilitychange event fires. The view event will be used to determine when the user session started.
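A minimal sketch of that wiring, assuming `log( action, data )` is a hypothetical stand-in for the real instrumentation call. Logging `'view'` when tracking starts stands in for the existing view event. `doc` would normally be `document`; any `EventTarget` with a `visibilityState` property works, which keeps the function testable.

```javascript
// Logs a view when tracking starts, then one event per visibilitychange.
// Returns a function that removes the listener again.
function trackVisibility( doc, log ) {
	log( 'view' );
	const onChange = () => {
		log( 'visibility_change', { state: doc.visibilityState } );
	};
	doc.addEventListener( 'visibilitychange', onChange );
	return () => doc.removeEventListener( 'visibilitychange', onChange );
}
```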

Also, what happens if the user changes the target language, etc.? Should that start a new session?

This will trigger the view event, and we can decide whether to treat it as a new session.

@abi_ Here is my proposal for the events; let me know what you think.

Interaction data

action: visibility_change
action_subtype: one of
  • hidden_state
  • visible_state

Note:

  • If the browser tab is closed completely, that is a session end event rather than a visibility change.
  • If there is no activity for 30 minutes, we can consider the session ended.

Translation object (at the time the event is triggered)

  • target_title: title of the page
  • source_language: source language of translation
  • target_language: target language of translation
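A minimal sketch of an event payload following this proposal. It assumes the interaction data and the translation object are plain fields on a single object; the actual schema layout is not specified here. The visibility state is mapped from the browser's `document.visibilityState` values (`'hidden'`, `'visible'`).

```javascript
// Maps document.visibilityState values to the proposed action_subtype values.
const SUBTYPES = new Map( [
	[ 'hidden', 'hidden_state' ],
	[ 'visible', 'visible_state' ]
] );

// Builds the proposed visibility_change event. `translation` describes the
// translation being read when the event fires.
function buildVisibilityEvent( visibilityState, translation ) {
	const subtype = SUBTYPES.get( visibilityState );
	if ( !subtype ) {
		throw new Error( `Unexpected visibility state: ${ visibilityState }` );
	}
	return {
		action: 'visibility_change',
		action_subtype: subtype,
		translation: {
			target_title: translation.target_title,
			source_language: translation.source_language,
			target_language: translation.target_language
		}
	};
}
```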

We don't need to capture the duration, as I can easily calculate it from the timestamps.
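One possible way to derive the duration from timestamps is sketched below. It assumes each session's events are given as `{ subtype, ts }` with `ts` in milliseconds; that the session is visible from the view event onward; and that a hypothetical `'session_end'` subtype marks the end. It applies the 30-minute rule by capping any single visible interval at 30 minutes. How these edge cases are actually handled is an analysis decision, not something the proposal fixes.

```javascript
// Hypothetical cap: one reading of the 30-minute inactivity rule.
const INACTIVITY_MS = 30 * 60 * 1000;

// Sums visible time for one session. `subtype` is 'view', 'visible_state',
// 'hidden_state' or (hypothetically) 'session_end'.
function visibleDurationMs( events ) {
	const sorted = [ ...events ].sort( ( a, b ) => a.ts - b.ts );
	let total = 0;
	let visibleSince = null;
	for ( const { subtype, ts } of sorted ) {
		if ( subtype === 'view' || subtype === 'visible_state' ) {
			if ( visibleSince === null ) {
				visibleSince = ts;
			}
		} else if ( visibleSince !== null ) {
			// hidden_state or session_end closes the current visible interval.
			total += Math.min( ts - visibleSince, INACTIVITY_MS );
			visibleSince = null;
		}
	}
	// An interval left open (no closing event was received) is not counted,
	// since its end time is unknown.
	return total;
}
```

For example, view at 0, hidden at 60 s, visible at 120 s and hidden at 150 s give 90 s of visible time.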

Change #1126930 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] AX Instrumentation: Track session length

https://gerrit.wikimedia.org/r/1126930

Change #1126930 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] AX Instrumentation: Track session length

https://gerrit.wikimedia.org/r/1126930

abi_ changed the task status from Open to In Progress. Mar 18 2025, 8:46 AM


This functionality has been implemented and deployed on https://language-cx.wmcloud.org/index.php/Special:AutomaticTranslation

To test, enable display of event logs in the browser:

mw.loader.using('mediawiki.api')
    .then(
        () => new mw.Api().saveOption('eventlogging-display-web', '1')
    );

Visibility change events (hidden and visible) are being logged as expected.
