
Track two-day engagement with diffs in Review Changes
Closed, Resolved, Public

Description

For WE1.11, we have the following KR:

By the end of Q4, X% of visitors to the Personal Dashboard click to review an edit on at least two separate days.

We would like this statistic added to the Personal Dashboard dashboard as a value that we can track over time (perhaps weekly?) so that we can follow the impact of our changes.

Event Timeline

jsn.sherman subscribed.

Just noting for my fellow moderator tools engineers that this is for the Personal Dashboard Superset dashboard, not the Personal Dashboard Logstash dashboard. We have more than one dashboard dashboard. We may need to list these out in a dashboard dashboard dashboard.

@Samwalton9-WMF

Sharing some thoughts below on how to define the window of opportunity for a user to meet the at least two separate days engagement criteria. I'd recommend two approaches for the dashboard:

Cumulative Quarter to Date Progress Metric
This approach measures the overall success rate for the entire quarter (or another specifically defined timeframe). It answers: "From the start of Q4 to today, what percentage of dashboard visitors have clicked to review an edit on at least two separate days?"

Since we'd like to track over time, I'd recommend using a rolling window approach. This is similar to how we currently define our two-week retention metric and will allow us to see how the rate improves as the quarter progresses. In this definition, we'd look at "Of all unique visitors from Day 1 to [Current Date], what % have clicked an edit on 2+ separate days?"

One problem with this approach is that if we deploy a new improvement later in the quarter, this number will barely move because it's weighted by the earlier weeks.
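As a rough illustration of the cumulative definition above, the sketch below computes the share of unique visitors in a date range who clicked on two or more distinct days. The function name, the `(user, day)` tuple inputs, and the sample dates are all hypothetical; the dashboard's actual queries are not shown in this thread.

```python
from collections import defaultdict
from datetime import date


def cumulative_two_day_rate(visits, clicks, start, end):
    """Share of unique visitors in [start, end] who clicked on 2+ distinct days.

    visits and clicks are iterables of (user, day) tuples (hypothetical shape).
    """
    visitors = {user for user, day in visits if start <= day <= end}

    # Collect the distinct click days per user within the range.
    click_days = defaultdict(set)
    for user, day in clicks:
        if start <= day <= end:
            click_days[user].add(day)

    engaged = {u for u in visitors if len(click_days.get(u, ())) >= 2}
    return len(engaged) / len(visitors) if visitors else 0.0


# Hypothetical sample: four visitors, one of whom clicks on two separate days.
visits = [("a", date(2026, 4, 1)), ("b", date(2026, 4, 1)),
          ("c", date(2026, 4, 2)), ("d", date(2026, 4, 10))]
clicks = [("a", date(2026, 4, 1)), ("a", date(2026, 4, 3)),
          ("b", date(2026, 4, 1)), ("b", date(2026, 4, 1))]  # b: same day twice

rate = cumulative_two_day_rate(visits, clicks, date(2026, 4, 1), date(2026, 4, 10))
print(rate)  # 0.25
```

Because the denominator keeps every visitor since the start date, later cohorts make up a shrinking share of the total, which is why this figure reacts slowly to late-quarter changes.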

Rolling Window Metric (7-day or 28-day)
We could also add a metric to the dashboard that shows the rate over a specific window such as the last 28 days or 7 days. This allows us to see the impacts of product deployments more quickly.

This would be visualized as a line chart where each point represents the 2-day return rate for the window of the previous X days.

Timeframe Selection
The time window depends on how quickly we expect these types of users to return to view the dashboard and how quickly we need to monitor changes.

A 7-day window would be more sensitive to changes: if we deploy something tomorrow, we might see the metric change the next day. However, it would mostly capture power users and may underreport more casual contributors who only edit on weekends or a couple of times a month.

A window closer to a month (28 days) might be better at capturing more casual contributors and monthly or bi-weekly editing habits.

Thanks Megan, this is really helpful food for thought. I think the rolling window approach makes sense to me. Let's shoot for 7 days - I think we want to be ambitious here and make an experience that's compelling in that timeframe, and we can account for the baseline in our targets.

@Samwalton9-WMF

I've updated the Superset dashboard with charts to track this KR metric. These are currently located on a separate tab ("Two-day engagement with diffs [DRAFT]") to help avoid timeout issues.

Charts track the proportion of unique visitors to the Personal Dashboard who click to review an edit (element_friendly_name: Personal Dashboard diff link) on at least two distinct days within a 7-day window. Charts include:

  • Two-Day Engagement Rate (Rolling 7-day window): The percentage of unique visitors who visited the dashboard on a given day and successfully went on to click a review link on at least two distinct days within their 7-day window. Charts show the current rate and daily trends across all wikis.
  • Average Two-Day Engagement Rate (Rolling 7-day window): The overall, volume-weighted average of the engagement rate across the entire reviewed timeframe (as indicated by the time range filters). It takes the total number of daily dashboard visitors who successfully interacted with a review diff on at least two distinct days within their specific 7-day window and divides it by a running tally of all daily dashboard visitors.
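The two chart definitions above can be sketched roughly as follows. This is an illustration under stated assumptions, not the dashboard's actual query: the function names, the `(user, day)` input shape, and the sample data are hypothetical, and the 7-day window is assumed to cover the visit day plus the following six days.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_cohorts(visits, clicks, window_days=7):
    """Return {visit_day: (unique_visitors, engaged_visitors)}.

    A visitor is engaged for a given visit day if they clicked on at least
    two distinct days within the window starting on that visit day.
    Assumption: the window includes the visit day itself.
    """
    click_days = defaultdict(set)
    for user, day in clicks:
        click_days[user].add(day)

    visitors_by_day = defaultdict(set)
    for user, day in visits:
        visitors_by_day[day].add(user)

    cohorts = {}
    for day in sorted(visitors_by_day):
        window_end = day + timedelta(days=window_days - 1)
        users = visitors_by_day[day]
        engaged = sum(
            1
            for user in users
            if sum(1 for d in click_days.get(user, ()) if day <= d <= window_end) >= 2
        )
        cohorts[day] = (len(users), engaged)
    return cohorts


def weighted_average_rate(cohorts):
    """Volume-weighted average: total engaged visitors / total visitors."""
    total_visitors = sum(v for v, _ in cohorts.values())
    total_engaged = sum(e for _, e in cohorts.values())
    return total_engaged / total_visitors if total_visitors else 0.0


# Hypothetical sample data.
visits = [("a", date(2026, 5, 1)), ("b", date(2026, 5, 1)),
          ("a", date(2026, 5, 3)), ("c", date(2026, 5, 3))]
clicks = [("a", date(2026, 5, 1)), ("a", date(2026, 5, 3)),
          ("b", date(2026, 5, 1)), ("b", date(2026, 5, 9)),  # day 9 is outside b's window
          ("c", date(2026, 5, 3)), ("c", date(2026, 5, 4))]

cohorts = daily_cohorts(visits, clicks)
average = weighted_average_rate(cohorts)
print(cohorts)
print(average)  # 0.5
```

In this sample, visitor "a" counts as engaged for the 1 May cohort but not for the 3 May cohort, since only one of their click days falls in the later window.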

Based on the 7-day rolling window approach, engagement rates currently range from about 4 to 7% across all pilot wikis following an initial peak at the beginning of deployment. Trwiki currently has the highest average rate (15.6%) while thwiki has the lowest (3.1%).

Let me know if you have any questions or suggested changes.

Thanks @MNeisler!

Just to make sure I understand - in the Average graph is each data point effectively "From the start date up to this date, what is the average engagement rate"?

> Trwiki currently has the highest average rate (15.6%) while thwiki has the lowest (3.1%).

This difference is really interesting to me, it's quite significant! Limiting to the recent window after Test Kitchen tracking was turned on again (from 28th April), the difference is even stronger: 16.6% to 3.0%.

Do you have any ideas about how we could interrogate that further? I'm curious if it's perhaps being driven by a few power users, or something else.

> Just to make sure I understand - in the Average graph is each data point effectively "From the start date up to this date, what is the average engagement rate"?

Both "Average two-day engagement rate" charts reflect the average engagement rate across all daily 7-day rolling windows up to that point. They do not change the 7-day rolling window rule. In other words, we're not looking across the entire reviewed time range to see the proportion of visitors who clicked to review an edit on two distinct days. Under the current definition, a visitor to the dashboard always has exactly 7 days to click on two separate days to be counted as engaged.

In the "Daily two-day engagement rate" chart, each data point shows "Of the unique visitors who arrived on this specific day, what percentage successfully clicked to review an edit on at least two separate days over the next 7 days?".

The Daily chart is useful for monitoring immediate impacts from changes to the feature, while the "Average" charts are useful for tracking overall health trends since they smooth out a lot of noise caused by daily fluctuations.
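To make the smoothing effect concrete, here is a small sketch that derives both series from per-day cohort counts. It assumes the "Average" series is the volume-weighted running average described earlier in the thread; the function name and the sample counts are hypothetical.

```python
from itertools import accumulate


def running_weighted_average(daily_counts):
    """Running volume-weighted rate from (visitors, engaged) pairs in date order."""
    cum_visitors = accumulate(v for v, _ in daily_counts)
    cum_engaged = accumulate(e for _, e in daily_counts)
    return [e / v if v else 0.0 for v, e in zip(cum_visitors, cum_engaged)]


# Hypothetical cohort counts for three consecutive days: (visitors, engaged).
daily_counts = [(100, 10), (100, 2), (200, 12)]

daily_rates = [e / v for v, e in daily_counts]
average_rates = running_weighted_average(daily_counts)

print(daily_rates)    # [0.1, 0.02, 0.06]
print(average_rates)  # [0.1, 0.06, 0.06]
```

The daily series swings from 10% to 2%, while the running average moves only from 10% to 6%, which is the trade-off between sensitivity and stability described above.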

> This difference is really interesting to me, it's quite significant!

Yeah, I agree. Trwiki is definitely an outlier compared to the other wikis (it also currently has the highest retention rate). This engagement metric looks at distinct visitors to the dashboard, so a single power user would still count as just one engaged user. However, the higher rate could be due to a much higher proportion of highly experienced and active editors at trwiki using this dashboard compared to other wikis.

One idea to investigate is to review the engagement rate by user experience. Feel free to file a ticket if it would be worthwhile to investigate.

Thanks, that's helpful clarification!

Another thing I'm curious about is that the rate decreases over time - from 6% at the start of the month to 2% now. That seems like quite a significant decrease, but we haven't really changed anything about the experience over that time. Just to double check, there's no reason that the more recent data would be lower due to how we're calculating this figure? i.e. this should be considered a real decrease?

> Another thing I'm curious about is that the rate decreases over time

@Samwalton9-WMF Thanks for flagging this! I investigated and added some fixes to account for any potential data truncation issues; however, I'm still seeing a decrease starting around 5 May to essentially 0% on 13 May (the charts are set to only show up to 7 days ago to ensure that each daily cohort has a full 7 days to complete their activity window).

Looking more closely at the raw data, it appears this decrease is not real but due to a temporary instrumentation issue. We stopped logging clicks to the Personal Dashboard diff link from 8 May through 20 May. From 21 May, these clicks were logged at a normal rate again. See data below.

Interestingly, this instrumentation gap seems to only be impacting clicks to the Personal Dashboard diff link (pageviews and other clicks are not impacted). You mentioned you all hadn't changed anything recently but maybe there was some type of instrumentation change that caused this?

If you filter the missing days out, the engagement rate is around 6.5%. Now that we're logging clicks again, engagement rates should slowly start to return to normal.

Here's a sample of data and a query that can be run in Superset

Date        Daily Visitors  Clicks to Diff Link
2026-05-05  375             184
2026-05-06  348             159
2026-05-07  391             125
2026-05-08  358             0
2026-05-09  385             0
2026-05-10  378             0
2026-05-11  431             0
2026-05-12  444             0
2026-05-13  380             0
2026-05-14  332             0
2026-05-15  309             0
2026-05-16  292             0
2026-05-17  272             0
2026-05-18  310             0
2026-05-19  267             0
2026-05-20  337             0
2026-05-21  335             130
2026-05-22  294             197
2026-05-23  274             168

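-- Daily pageview and diff-link click counts for the Personal Dashboard instrument
SELECT
  CAST(FROM_ISO8601_TIMESTAMP(meta.dt) AS DATE) AS event_date,
  -- All pageview events logged on the day
  SUM(CASE WHEN action = 'pageview' THEN 1 ELSE 0 END) AS raw_pageviews,
  -- Clicks on the diff link only
  SUM(CASE WHEN action = 'click' AND element_friendly_name = 'Personal Dashboard diff link' THEN 1 ELSE 0 END) AS raw_diff_clicks
FROM event.product_metrics_web_base
WHERE instrument_name = 'personal-dashboard-health-metrics'
  AND YEAR = 2026
  -- meta.dt is an ISO 8601 string, so these bounds compare lexically;
  -- the upper bound therefore excludes events on 2026-05-24 itself
  AND meta.dt >= '2026-05-01'
  AND meta.dt <= '2026-05-24'
GROUP BY 1
ORDER BY 1 ASC;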
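A gap like the one in the sample data could also be flagged automatically. The sketch below is one possible check, not something the dashboard currently does: it reports runs of days that have visitors but zero logged diff-link clicks. The function name is hypothetical, the rows are copied from the sample above, and the check assumes rows are consecutive days in date order.

```python
# (date, daily visitors, clicks to diff link), copied from the sample data above.
SAMPLE = [
    ("2026-05-05", 375, 184), ("2026-05-06", 348, 159), ("2026-05-07", 391, 125),
    ("2026-05-08", 358, 0), ("2026-05-09", 385, 0), ("2026-05-10", 378, 0),
    ("2026-05-11", 431, 0), ("2026-05-12", 444, 0), ("2026-05-13", 380, 0),
    ("2026-05-14", 332, 0), ("2026-05-15", 309, 0), ("2026-05-16", 292, 0),
    ("2026-05-17", 272, 0), ("2026-05-18", 310, 0), ("2026-05-19", 267, 0),
    ("2026-05-20", 337, 0), ("2026-05-21", 335, 130), ("2026-05-22", 294, 197),
    ("2026-05-23", 274, 168),
]


def find_logging_gaps(rows):
    """Return (first_day, last_day) runs with visitors > 0 but clicks == 0."""
    gaps = []
    run_start = None
    run_end = None
    for day, visitors, clicks in rows:
        if visitors > 0 and clicks == 0:
            if run_start is None:
                run_start = day
            run_end = day
        elif run_start is not None:
            gaps.append((run_start, run_end))
            run_start = None
    if run_start is not None:  # gap still open at the end of the data
        gaps.append((run_start, run_end))
    return gaps


gaps = find_logging_gaps(SAMPLE)
print(gaps)  # [('2026-05-08', '2026-05-20')]
```

A zero-click day is only suspicious when traffic is otherwise normal, which is why the check requires a non-zero visitor count.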

Hmm, I'm not sure what happened here. Possibly we lost some instrumentation code for a while.

Anyway, I think this all looks good - we can just update the dashboard to start from 21st May when we look at this tab, and the 6.5% baseline is very helpful!