WE1.4.3: Instrument watchlist
Open, Needs Triage, Public

Description

Per WE1.4.3,

If we instrument [the] watchlist, then we can define a baseline for how often people click to pages.


Scoped / Updated Request:

A dashboard showing the CTR for diff links and the CTR for page links, using the previous experiment's definition of CTR (see the measurement plan for specifics). Additionally, ensure that events are instrumented so that the previous default contextual attributes carry over and the following additional attributes are also included:

  • performer_edit_count
  • performer_edit_count_bucket
  • performer_registration_dt

Acceptance Criteria

Product Analytics:

  • Discuss needs with the team
  • Define events/fields, metric logic, data retention
  • Create Measurement Plan & Instrumentation Spec
  • Discuss plans, questions, concerns with the team and proceed when docs receive sign-off
  • Sign off on table structure used by Superset, acceptance criteria
  • QA/validation
  • Build Interim Superset datasets/charts/dashboard
  • Finalize Superset dashboard
  • Document Instrument

Engineering:

Initial Request:

Both the Watchlist and Recent Changes should be instrumented to track the click-through rate (CTR) of:

  • CTR Diff link clicks
  • CTR History link clicks
  • CTR Article/Page link clicks
  • CTR Username link (user page) clicks
  • CTR Username link (user talk page) clicks
  • CTR Username link (user contributions page) clicks
  • CTR Tag link clicks
  • CTR Action link (rollback) clicks
  • CTR Action link (thank) clicks

The following info about the user which clicked should be tracked:

  • Edit count (bucket)
  • Account age (bucket)
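For the edit count and account age buckets above, here is a minimal Python sketch of how raw values could be bucketed during analysis. The bucket boundaries and labels are hypothetical placeholders, not the Metrics Platform's actual definitions. performer_registration_dt is assumed to be an ISO 8601 string.

```python
from datetime import datetime, timezone

# Hypothetical edit-count buckets: (inclusive lower bound, label), highest first.
EDIT_COUNT_BUCKETS = [
    (1000, "1000+ edits"),
    (100, "100-999 edits"),
    (5, "5-99 edits"),
    (1, "1-4 edits"),
    (0, "0 edits"),
]

# Hypothetical account-age buckets in days, same layout.
ACCOUNT_AGE_BUCKETS = [
    (365, "365+ days"),
    (30, "30-364 days"),
    (0, "0-29 days"),
]


def bucket(value: int, buckets: list) -> str:
    """Return the label of the first bucket whose lower bound `value` reaches."""
    for lower_bound, label in buckets:
        if value >= lower_bound:
            return label
    raise ValueError(f"value {value} is below every bucket")


def account_age_days(registration_dt: str, now: datetime = None) -> int:
    """Whole days between an ISO 8601 registration timestamp and `now` (UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalise it.
    registered = datetime.fromisoformat(registration_dt.replace("Z", "+00:00"))
    return (now - registered).days
```

For example, `bucket(account_age_days("2025-10-01T00:00:00Z", now), ACCOUNT_AGE_BUCKETS)` with `now` set to 2025-10-16 UTC gives "0-29 days".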

Output:

  • Article/Page link clicks: When a user clicks the target article/page of a watchlist/recent change line, track the click along with information about the user's edit count and account age
  • Diff link clicks: When a user clicks the edit diff link of a watchlist/recent change line, track the click along with information about the user's edit count and account age

Event Timeline

@TheresNoTime this looks good. Ideally, in addition to the target article/page link, we would also be able to track any of the other links on the row. That way, we can identify not only the overall CTR but also the CTR for specific elements of the page.

Is this doable? I think the spec as defined only tracks the diff or article links.

Yep, doable. See below for clarity:

[Screenshot summary: every red-boxed link on any Watchlist/Recent Changes row that is not a Wikidata change row]

Change #1190230 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/EventLogging@master] ext.eventLogging: Add 'immediate' event to submitEvent

https://gerrit.wikimedia.org/r/1190230

Change #1190233 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/WikimediaEvents@master] ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics

https://gerrit.wikimedia.org/r/1190233

TheresNoTime renamed this task from Instrument watchlist and recent changes to WE1.4.3: Instrument watchlist. (Sep 22 2025, 2:38 PM)
TheresNoTime updated the task description.

Change #1190230 abandoned by Samtar:

[mediawiki/extensions/EventLogging@master] ext.eventLogging: Add 'immediate' event to submitEvent

https://gerrit.wikimedia.org/r/1190230

Change #1192861 had a related patch set uploaded (by Samtar; author: Samtar):

[operations/mediawiki-config@master] EventStreamConfig and stream registration for watchlist click tracking

https://gerrit.wikimedia.org/r/1192861

Change #1192861 merged by jenkins-bot:

[operations/mediawiki-config@master] EventStreamConfig and stream registration for watchlist click tracking

https://gerrit.wikimedia.org/r/1192861

Mentioned in SAL (#wikimedia-operations) [2025-10-02T19:38:24Z] <samtar@deploy2002> Started scap sync-world: Backport for [[gerrit:1192861|EventStreamConfig and stream registration for watchlist click tracking (T401575)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-02T19:44:28Z] <samtar@deploy2002> samtar: Backport for [[gerrit:1192861|EventStreamConfig and stream registration for watchlist click tracking (T401575)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-02T19:49:10Z] <samtar@deploy2002> Finished scap sync-world: Backport for [[gerrit:1192861|EventStreamConfig and stream registration for watchlist click tracking (T401575)]] (duration: 10m 46s)

Change #1190233 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics

https://gerrit.wikimedia.org/r/1190233

Change #1193213 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.21] ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics

https://gerrit.wikimedia.org/r/1193213

Change #1193213 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.21] ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics

https://gerrit.wikimedia.org/r/1193213

Mentioned in SAL (#wikimedia-operations) [2025-10-02T21:04:31Z] <samtar@deploy2002> Started scap sync-world: Backport for [[gerrit:1193213|ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-02T21:08:28Z] <samtar@deploy2002> samtar: Backport for [[gerrit:1193213|ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-02T21:17:06Z] <samtar@deploy2002> Finished scap sync-world: Backport for [[gerrit:1193213|ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575)]] (duration: 12m 35s)

Just moving a couple of comments from r1190233 to here, so they're not forgotten for possible follow-up:

  • Link click handlers are not attached to new watchlist rows that are added when 'live updates' is enabled or when the 'view new changes' button is clicked.
  • Some link types are not being captured, such as a.mw-changeslist-diff-cur and a.mw-thanks-thank-link (and possibly others).
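One way to catch missed link types like these is to keep an explicit table mapping CSS classes to link names and check every clicked anchor against it. The sketch below assumes MediaWiki core's changes-list class names. It also uses hypothetical link names (historyLink, userLink) alongside the ones seen in this task. Verify both against the actual markup and the spec. For the first issue, a common client-side approach is a single delegated click handler on a container that persists across live updates, rather than per-row bindings. This assumes the container itself is not replaced.

```python
from typing import Iterable, Optional

# Hypothetical table mapping changes-list CSS classes to action_context link names.
# Order matters: an anchor carrying several mapped classes gets the first match.
LINK_CLASSES = [
    ("mw-changeslist-diff-cur", "diffLink"),  # assumption: "cur" diffs count as diffLink
    ("mw-changeslist-diff", "diffLink"),
    ("mw-changeslist-history", "historyLink"),
    ("mw-changeslist-title", "articleLink"),
    ("mw-usertoollinks-talk", "userTalkLink"),
    ("mw-usertoollinks-contribs", "userContribsLink"),
    ("mw-userlink", "userLink"),
    ("mw-thanks-thank-link", "actionLink"),  # assumption: thank is an action link
]


def classify_link(classes: Iterable[str]) -> Optional[str]:
    """Return the link name for an anchor's classes, or None if it is untracked."""
    class_set = set(classes)
    for css_class, link_name in LINK_CLASSES:
        if css_class in class_set:
            return link_name
    return None


def untracked(anchors: Iterable[Iterable[str]]) -> list:
    """Return the class lists of anchors that classify_link() does not recognise."""
    result = []
    for anchor in anchors:
        classes = list(anchor)
        if classify_link(classes) is None:
            result.append(classes)
    return result
```

Running `untracked()` over the class lists of every link on a sample watchlist page gives a quick list of link types the instrument would still miss.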

Change #1193815 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/WikimediaEvents@master] ext.wikimediaEvents.WatchlistBaseline: Add page-visited

https://gerrit.wikimedia.org/r/1193815

Change #1193815 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] ext.wikimediaEvents.WatchlistBaseline: Add page-visited

https://gerrit.wikimedia.org/r/1193815

Change #1193923 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.21] ext.wikimediaEvents.WatchlistBaseline: Add page-visited

https://gerrit.wikimedia.org/r/1193923

Change #1193923 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.21] ext.wikimediaEvents.WatchlistBaseline: Add page-visited

https://gerrit.wikimedia.org/r/1193923

Mentioned in SAL (#wikimedia-operations) [2025-10-06T19:56:26Z] <samtar@deploy2002> Started scap sync-world: Backport for [[gerrit:1193923|ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-06T20:01:02Z] <samtar@deploy2002> samtar: Backport for [[gerrit:1193923|ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-06T20:10:14Z] <samtar@deploy2002> Finished scap sync-world: Backport for [[gerrit:1193923|ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575)]] (duration: 14m 13s)

@TheresNoTime We need action_source and instrument_name info for both page-visited and click events.

Right now data is inconsistently generated due to missing parameters.
Click events are sent with action_source and instrument_name info, but that's not the case for page-visited events.

We can see page-visited events when action_source and instrument_name are dropped from the query:

SELECT
  CAST(FROM_ISO8601_TIMESTAMP(meta.dt) AS DATE) AS dt,
  action,
  COUNT(*) AS event
FROM mediawiki_product_metrics_watchlistclicktracker
WHERE FROM_ISO8601_TIMESTAMP(meta.dt) >= (current_timestamp - INTERVAL '1' DAY)
GROUP BY 1, 2
ORDER BY dt DESC

Change #1194651 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/WikimediaEvents@master] ext.wikimediaEvents.WatchlistBaseline: Send source/instrument

https://gerrit.wikimedia.org/r/1194651

Change #1194651 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] ext.wikimediaEvents.WatchlistBaseline: Send source/instrument

https://gerrit.wikimedia.org/r/1194651

Change #1196061 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.22] ext.wikimediaEvents.WatchlistBaseline: Send source/instrument

https://gerrit.wikimedia.org/r/1196061

Change #1196061 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.22] ext.wikimediaEvents.WatchlistBaseline: Send source/instrument

https://gerrit.wikimedia.org/r/1196061

Mentioned in SAL (#wikimedia-operations) [2025-10-14T14:02:20Z] <samtar@deploy2002> Started scap sync-world: Backport for [[gerrit:1196061|ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-14T14:06:34Z] <samtar@deploy2002> samtar: Backport for [[gerrit:1196061|ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-14T14:11:46Z] <samtar@deploy2002> Finished scap sync-world: Backport for [[gerrit:1196061|ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575)]] (duration: 09m 25s)

A "Post-merge build succeeded" update was posted on Oct 10 at 4:21 AM; that's the latest update on the patch.
The patch went out on the train yesterday.
I'm not yet seeing page-visited events when I query.

Looking at the Checks tab for the change, I don't see any errors or warnings, nor anything pending or failed.
Is this working as expected, and will we be able to see data this week?

cc @Samwilson, recent reviewer, thank you!

That patch is included in wmf/1.45.0-wmf.22 and wmf/1.45.0-wmf.23 which are what's current on all sites, so it doesn't look like there's an issue there.

When I visit my watchlist on MediaWiki.org, I'm seeing the following be sent to https://intake-analytics.wikimedia.org/v1/events?hasty=true (without clicking any links):

{
	"$schema": "/analytics/product_metrics/web/base/1.4.3",
	"action": "page-visited",
	"action_source": "Watchlist",
	"agent": {
		"client_platform": "mediawiki_js",
		"client_platform_family": "desktop_browser"
	},
	"dt": "2025-10-15T23:29:36.865Z",
	"funnel_event_sequence_position": 1,
	"instrument_name": "WatchlistClickTracker",
	"mediawiki": {
		"database": "mediawikiwiki"
	},
	"meta": {
		"domain": "www.mediawiki.org",
		"stream": "mediawiki.product_metrics.WatchlistClickTracker"
	},
	"performer": {
		"name": "Samwilson",
		"pageview_id": "…"
	},
	"sample": {
		"rate": 1,
		"unit": "pageview"
	}
}
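A small QA check over captured payloads like the one above could flag events that are missing the fields the dashboard queries rely on. This is a hypothetical helper, and the field list reflects this task's discussion. It also requires action_context on page-visited events, so for the sample above it reports action_context as missing.

```python
# Fields every event needs for the dashboard queries (per this task's discussion).
REQUIRED_FIELDS = ("action", "action_source", "instrument_name")


def missing_fields(event: dict) -> list:
    """Return required fields that are absent or empty in a captured event."""
    missing = [name for name in REQUIRED_FIELDS if not event.get(name)]
    # page-visited also needs action_context so visits without changes can be excluded.
    if event.get("action") == "page-visited" and not event.get("action_context"):
        missing.append("action_context")
    return missing
```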

a) Thank you @Samwilson for confirming the process had completed.

b) Review:
page-visited events are coming through with action_source = NULL and action_context = NULL,
so the first condition of the CTR query's WHERE clause (action_source = 'Watchlist') filters them out:

WHERE action_source = 'Watchlist'
    AND (
        (action = 'page-visited' AND json_extract_scalar(action_context, '$.hc') = 'y') -- hc is for has changes (not all visits will have changes)
        OR
        (action = 'click' AND json_extract_scalar(action_context, '$.link') = 'articleLink') -- we'll also track 'diffLink'
    )
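Here is a minimal sketch of the CTR calculation this WHERE clause feeds. It assumes CTR is the share of Watchlist visits with changes (hc = "y") whose pageview had at least one click on the given link. It also assumes events are flattened so that performer.pageview_id sits at the top level as pageview_id. The measurement plan's actual definition may differ. Click contexts are accepted either as JSON ({"link": ...}) or as bare strings, since both shapes appear in this task.

```python
import json
from typing import Iterable, Optional


def parse_context(raw: Optional[str]) -> dict:
    """Parse action_context as JSON; treat a bare string like "diffLink" as a link name."""
    if not raw:
        return {}
    try:
        value = json.loads(raw)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return {"link": raw}
    return value if isinstance(value, dict) else {"link": str(value)}


def click_through_rate(events: Iterable[dict], link: str = "articleLink") -> Optional[float]:
    """Share of Watchlist visits with changes whose pageview had a click on `link`."""
    visits_with_changes = set()
    pageviews_with_click = set()
    for event in events:
        if event.get("action_source") != "Watchlist":
            continue  # mirrors action_source = 'Watchlist'
        context = parse_context(event.get("action_context"))
        if event.get("action") == "page-visited" and context.get("hc") == "y":
            visits_with_changes.add(event["pageview_id"])
        elif event.get("action") == "click" and context.get("link") == link:
            pageviews_with_click.add(event["pageview_id"])
    if not visits_with_changes:
        return None  # nothing to click on, so CTR is undefined
    return len(pageviews_with_click & visits_with_changes) / len(visits_with_changes)
```

Counting distinct pageviews rather than raw clicks keeps a user who clicks several articles from the same visit from pushing the rate above 1.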

Right now data is inconsistently generated.
Click events are sent with action_source, instrument_name and action_context info, but page-visited events only show instrument_name info. Data QA query:

SELECT
  CAST(FROM_ISO8601_TIMESTAMP(meta.dt) AS DATE) AS dt,
  action, action_context, action_source, instrument_name,
  COUNT(*) AS event
FROM mediawiki_product_metrics_watchlistclicktracker
WHERE FROM_ISO8601_TIMESTAMP(meta.dt) >= (current_timestamp - INTERVAL '1' DAY)
  AND instrument_name = 'WatchlistClickTracker'
  --AND action_context IN ('articleLink','diffLink')
  --AND action_source = 'Watchlist'
GROUP BY 1,2,3,4,5
ORDER BY dt DESC

The most recent patch was intended to add action_source.

I assume page-visited doesn't need a context, but it's odd that action_source is null. It looks like it's not always null, though:

SELECT dt, action, action_context, action_source from mediawiki_product_metrics_watchlistclicktracker order by dt desc limit 15
dt                        action        action_context    action_source
2082-04-05T22:35:24.924Z  page-visited
2025-10-16T16:32:40.515Z  page-visited                    Watchlist
2025-10-16T16:21:14.277Z  page-visited                    Watchlist
2025-10-16T15:09:11.649Z  page-visited                    Watchlist
2025-10-16T06:02:40.753Z  click         userContribsLink  Watchlist
2025-10-16T05:38:07.487Z  page-visited                    Watchlist
2025-10-16T03:03:35.999Z  click         diffLink          Watchlist
2025-10-16T03:01:17.521Z  page-visited
2025-10-16T02:55:42.751Z  page-visited
2025-10-16T02:39:39.969Z  click         userTalkLink      Watchlist
2025-10-16T02:39:36.131Z  page-visited
2025-10-16T02:37:12.633Z  page-visited
2025-10-16T02:33:48.030Z  page-visited
2025-10-16T02:12:27.992Z  page-visited
2025-10-16T02:10:54.784Z  page-visited

(What's up with those dates and times in the future? I guess these come directly from the client.)

@Samwilson: yeah, the client-side timestamp (dt) is effectively useless; use the server-side timestamp (meta.dt) instead.

SELECT action_source, COUNT(1) AS event_count
FROM event.mediawiki_product_metrics_watchlistclicktracker 
WHERE year = 2025 AND month = 10 AND day >= 16
  AND action = 'page-visited'
GROUP BY action_source
action_source  event_count
null           <25
Watchlist      23484

The page-visited events without action_source arrive at very low rates (1-3 per hour), so maybe they come from outdated ResourceLoader and/or browser caches of the previous code. I would expect the counts to keep decreasing until there are none.
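The client-side dt can't be trusted, so a QA pass could flag events whose client timestamp is far from the server-side meta.dt. This is a hypothetical helper, and its one-hour tolerance is an arbitrary choice.

```python
from datetime import datetime, timedelta


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' on older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def suspicious_client_dt(client_dt: str, server_dt: str,
                         tolerance: timedelta = timedelta(hours=1)) -> bool:
    """True if the client-side dt differs from meta.dt by more than `tolerance`."""
    return abs(parse_iso(client_dt) - parse_iso(server_dt)) > tolerance
```

A row like the 2082-04-05 one above would be flagged against any realistic meta.dt.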

I assume page-visited doesn't need a context

Looks like this was missed when implementing the instrumentation spec: page-visited events do need action_context, because it's how we determine whether there was actually something for the user to click on.

Without this information we can't filter out those visits, so any visit to an empty Watchlist deflates the diff link CTR (for example).

Update: Until that's fixed, I recommend modifying the query and noting in the dashboard that the metric may be slightly deflated, because we are not excluding Watchlist visits where the user was shown no changes and thus had nothing to click on. In practice that will likely be a very small number of users, hence "slightly", but it's still worth calling out as a caveat.
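Here is a rough way to quantify the "slightly deflated" caveat. Counting empty-Watchlist visits in the denominator scales the CTR down by the share of visits that actually had changes. The counts below are made up for illustration.

```python
def ctr_with_and_without_empty(clicked_visits: int, visits_with_changes: int,
                               empty_visits: int) -> tuple:
    """Return (CTR excluding empty Watchlists, CTR including them) for one link type."""
    filtered = clicked_visits / visits_with_changes
    unfiltered = clicked_visits / (visits_with_changes + empty_visits)
    return filtered, unfiltered


# Made-up counts: 300 visits with a diff click, 1000 visits with changes, 50 empty visits.
filtered, unfiltered = ctr_with_and_without_empty(300, 1000, 50)
# The unfiltered CTR equals the filtered CTR times the share of visits that had changes.
assert abs(unfiltered - filtered * 1000 / 1050) < 1e-12
```

With these counts the filtered CTR is 0.3 and the unfiltered one is about 0.286. The relative deflation depends only on the share of empty visits, not on the click volume.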

I don't have access to that doc.

What should the value of action_context be?

(Sammy is out of office at the moment, so I'm sticking my oar in here to try to help. I might be lacking too much background info though!)

Change #1196676 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/WikimediaEvents@master] WatchlistBaseline: Add action_context for page-visited event

https://gerrit.wikimedia.org/r/1196676

@Samwilson: you should be able to see it now

action_context should be a JSON blob like this (for example):

{
  "hc": "y",
  "sip": "n",
  "b": "n",
  "rb": "n"
}
NOTE: The action_context example is pretty-printed here for easier reference, but the actual JSON string would be condensed (no spaces/newlines).

We want to be able to exclude page visits with no changes (hc = "n"), because if there's nothing for the user to click on, then we can't calculate click-through.

Due to action_context's 64-character limit, we need to condense the information we are encoding:

  • hc is for has changes (not all visits will have changes)
  • sip is for show IP (not all users see "show IP" links)
  • b is for block (not all users see "block" links)
  • rb is for rollback (not all users see "rollback" links)

Each of these can be "y" (on/yes) or "n" (off/no).
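Here is a minimal sketch of building the condensed action_context string for page-visited events and checking it against the 64-character limit. The function names and boolean arguments are hypothetical. The keys and the "y"/"n" values follow the scheme above.

```python
import json

MAX_CONTEXT_LENGTH = 64  # action_context limit mentioned above


def encode_visit_context(has_changes: bool, show_ip: bool,
                         block: bool, rollback: bool) -> str:
    """Build the condensed page-visited action_context JSON string."""
    flags = {"hc": has_changes, "sip": show_ip, "b": block, "rb": rollback}
    # separators=(",", ":") drops the spaces json.dumps adds by default.
    context = json.dumps({key: "y" if on else "n" for key, on in flags.items()},
                         separators=(",", ":"))
    if len(context) > MAX_CONTEXT_LENGTH:
        raise ValueError(f"action_context is {len(context)} chars; limit is {MAX_CONTEXT_LENGTH}")
    return context


def decode_visit_context(context: str) -> dict:
    """Turn the condensed string back into booleans keyed by the short names."""
    return {key: value == "y" for key, value in json.loads(context).items()}
```

`encode_visit_context(True, False, False, False)` returns `{"hc":"y","sip":"n","b":"n","rb":"n"}`, which is 37 characters. That leaves some headroom under the limit for another short flag or two.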

See the interim dashboard here. I'll be sure to update this next week per the above.

Given the existing instrumentation, right now I'm using a regex to extract action_context info and get CTR data for only diffLink (and actionLink). Example for diffLink:
REGEXP_LIKE(action_context, '(?i)(^|[^A-Za-z0-9_])diffLink($|[^A-Za-z0-9_])')
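To sanity-check the regex outside Presto, here is the same pattern with Python's re module. Like REGEXP_LIKE, re.search succeeds if the pattern matches anywhere in the string. It therefore matches both a bare diffLink context and a JSON one, but not a longer token such as diffLinkCur (a hypothetical value used only to test the boundary). Once click contexts are consistently JSON, json_extract_scalar(action_context, '$.link') = 'diffLink' would be an exact-match alternative.

```python
import re
from typing import Optional

# Same pattern as the REGEXP_LIKE call above: "diffLink" not glued to other word characters.
DIFF_LINK = re.compile(r"(?i)(^|[^A-Za-z0-9_])diffLink($|[^A-Za-z0-9_])")


def mentions_diff_link(action_context: Optional[str]) -> bool:
    """Mirror REGEXP_LIKE: True if the pattern matches anywhere in action_context."""
    return action_context is not None and DIFF_LINK.search(action_context) is not None
```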