Investigate unexpectedly low CTR totals in Logged-Out Warning Message experiment
Closed, Resolved (Public)

Description:

CTR totals (~10%) are significantly lower than expected, raising concerns about data reliability. While totals below 100% are expected, current values suggest potential issues with instrumentation, experiment setup, or missing data.

NOTE: This is early data, shown only to illustrate the low CTRs. These numbers should NOT be used as a final interpretation of this feature's CTR!
[Image: image.png]
Scope:
  • Validate CTR calculation logic
  • Review instrumentation and event logging
  • Check experiment configuration and cohort assignment
  • Identify any data gaps or drop-offs in the funnel
Goal:

Determine whether the data is trustworthy and identify any root causes for the low CTR totals.

Event Timeline

After an audit discussion with @MNeisler about possible problems with the current Account creation rate metric, we identified a potential pitfall in how the query is currently written. As it stands, there is no guarantee that the subjects whose experiment_exposure we counted match the subjects whose successful registration we counted via the account_created event. That is because every user who tries to register an account within the experiment via any link on our site, not just the anon warning, will have an enrollment attempt and may become part of the experiment even if they were never exposed to the anon warning.

The data engineering proposal was to limit the cohort of users for which we count the account registrations to the ones that have indeed been exposed to the warning. Using this approach (sanity checks and feedback welcome),

WITH t_cohort AS (
  SELECT DISTINCT
    experiment.enrolled,
    experiment.assigned,
    experiment.subject_id
  FROM event.mediawiki_product_metrics_contributors_experiments
  WHERE experiment.enrolled = 'growthexperiments-editattempt-anonwarning'
    AND action = 'experiment_exposure' -- limit cohort to users exposed to a variation
    AND NOT performer.is_bot -- exclude known bots
), t_account_registrations AS (
  SELECT
    IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
    experiment.subject_id
  FROM event.mediawiki_product_metrics_contributors_experiments
  WHERE experiment.enrolled = 'growthexperiments-editattempt-anonwarning'
    AND action = 'account_created'
), t_funnel_account_registrations AS (
  SELECT
    variation,
    t_account_registrations.subject_id
  FROM t_cohort
  INNER JOIN t_account_registrations
    ON t_cohort.subject_id = t_account_registrations.subject_id
)
-- Per-subject outcomes:
SELECT variation, COUNT(*) AS outcome
FROM t_funnel_account_registrations
GROUP BY variation

the results seem to still be showing that the control group is creating more accounts:

variation  outcome
control    1626
treatment  1398

same as we observed for all users that created accounts within the experiment:

variation  outcome
control    4121
treatment  3795

While the exposures look balanced (control: 87660, treatment: 87166), both the exposed and the non-exposed users seem to show the same pattern. T421152 is still open, with unclear impact. Another explanation is that our accountCreated query param signal is flawed for some reason, in favor of the control group. We could, and probably should, run an A/A test for this metric to see if we're missing something, but I'm afraid it will only confirm the issue rather than give additional information. Ideas welcome, cc @KStoller-WMF @Michael @mpopov

Just chatted with Megan about this. Here's what the query should be:

WITH t_exposed AS (
  SELECT DISTINCT
    experiment.subject_id
  FROM {table}
  WHERE {where_boilerplate}
    AND action = 'experiment_exposure'
)
SELECT
  IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
  t_exposed.subject_id,
  CAST(SUM(IF(action = 'account_created', 1, 0)) > 0 AS INT) AS outcome
FROM t_exposed
LEFT JOIN {table} t_src ON t_exposed.subject_id = t_src.experiment.subject_id
WHERE {where_boilerplate}
GROUP BY 1, 2
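
For readers who want to sanity-check the logic outside the data lake, here is a minimal Python sketch of what the query above appears to compute: one 0/1 outcome per exposed subject, averaged per variation. The toy events, the `variant-a` group name, and the function name `account_creation_rate` are assumptions made up for illustration; this is not the catalog's implementation.

```python
from collections import defaultdict

# Toy event log of (subject_id, assigned_group, action); all values are invented.
EVENTS = [
    ("s1", "control", "experiment_exposure"),
    ("s1", "control", "account_created"),
    ("s1", "control", "account_created"),    # duplicate event, still one success
    ("s2", "control", "experiment_exposure"),
    ("s3", "variant-a", "experiment_exposure"),
    ("s4", "variant-a", "account_created"),  # never exposed, so excluded
]


def account_creation_rate(events):
    """Return {variation: (sample_size, sample_mean)} over exposed subjects only."""
    exposed = {sid for sid, _, action in events if action == "experiment_exposure"}

    outcome = {}    # subject_id -> 0/1, mirrors CAST(SUM(...) > 0 AS INT)
    variation = {}  # subject_id -> 'control' or 'treatment'
    for sid, assigned, action in events:
        if sid not in exposed:
            continue  # subjects without an exposure never enter the cohort
        variation[sid] = "control" if assigned == "control" else "treatment"
        outcome[sid] = max(outcome.get(sid, 0), int(action == "account_created"))

    per_group = defaultdict(list)
    for sid, value in outcome.items():
        per_group[variation[sid]].append(value)
    return {group: (len(v), sum(v) / len(v)) for group, v in per_group.items()}


RATES = account_creation_rate(EVENTS)
print(RATES)
```

The point of the per-subject outcome is that exposed subjects who never register still count in the denominator, and repeated account_created events from one subject count only once.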

The results from currently collected data are:

variation  sample_size  sample_mean
control    92297        0.0165227
treatment  91913        0.0140459

(No SRM)
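
As a rough cross-check of the "no SRM" statement, the sketch below runs a chi-square goodness-of-fit test on the two sample sizes. It assumes the experiment targets an even 50/50 split, which this thread does not state explicitly, and the function name `srm_check` is hypothetical.

```python
import math


def srm_check(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for sample ratio mismatch.

    expected_control_share=0.5 is an assumption about the intended allocation.
    """
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


CHI2, P_VALUE = srm_check(92297, 91913)
print(f"chi2={CHI2:.3f}, p={P_VALUE:.3f}")
```

A very small p-value (0.001 is a commonly used alarm threshold) would suggest a mismatch; a large one is consistent with a balanced assignment.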

I guess the two possible explanations are:

  • Either there is a bug in the instrumentation where account_created events aren't fired as often as they should specifically for the treatment group
  • Or… treatment (1.4%) really is performing 15% worse than control (1.6%), which seems unexpected but it's certainly possible to encounter counter-intuitive results.

estimate_bayes  chance_to_win  cred_lower  cred_upper  estimate_freq  p_value  conf_lower  conf_upper
-0.148          0              -0.21       -0.086      -0.15          0        -0.212      -0.087
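
The relative difference can be recomputed from the sample sizes and means reported above. The sketch below uses a delta-method approximation for the standard error of the ratio of two proportions and a normal 95% interval; that choice is an assumption for illustration and not necessarily how the dashboard derives its numbers. The function name `relative_lift_ci` is hypothetical.

```python
import math


def relative_lift_ci(p_control, n_control, p_treatment, n_treatment, z=1.96):
    """Relative lift (treatment / control - 1) with a delta-method interval.

    Assumes independent binomial samples; z=1.96 gives an approximate 95% interval.
    """
    ratio = p_treatment / p_control
    lift = ratio - 1
    # Squared relative standard errors of each proportion, added together.
    rel_var = ((1 - p_treatment) / (n_treatment * p_treatment)
               + (1 - p_control) / (n_control * p_control))
    se = ratio * math.sqrt(rel_var)
    return lift, lift - z * se, lift + z * se


LIFT, LOWER, UPPER = relative_lift_ci(0.0165227, 92297, 0.0140459, 91913)
print(f"lift={LIFT:.3f}, 95% CI=({LOWER:.3f}, {UPPER:.3f})")
```

Under these assumptions the result should land close to the reported estimate_freq, conf_lower and conf_upper values, which makes this a quick way to confirm that the "15% worse" figure is a relative, not absolute, difference.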

So I guess it comes down to: how much trust do you have in the instrumentation?

Change #1269004 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@master] loggedOutWarning: instrument browser navigation and tab close

https://gerrit.wikimedia.org/r/1269004

I updated the Account creation rate query in the catalog and reloaded the definition; the new results are up in the dashboard (the same as what I shared in the earlier comment).

Change #1269004 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] loggedOutWarning: instrument browser navigation and tab close

https://gerrit.wikimedia.org/r/1269004

Change #1280226 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@wmf/1.46.0-wmf.26] loggedOutWarning: instrument browser navigation and tab close

https://gerrit.wikimedia.org/r/1280226

Change #1280226 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.46.0-wmf.26] loggedOutWarning: instrument browser navigation and tab close

https://gerrit.wikimedia.org/r/1280226

Mentioned in SAL (#wikimedia-operations) [2026-05-05T12:40:47Z] <sgimeno@deploy1003> Started scap sync-world: Backport for [[gerrit:1280226|loggedOutWarning: instrument browser navigation and tab close (T421518)]]

Mentioned in SAL (#wikimedia-operations) [2026-05-05T12:41:37Z] <sgimeno@deploy1003> sgimeno: Backport for [[gerrit:1280226|loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-05-05T12:44:43Z] <sgimeno@deploy1003> Finished scap sync-world: Backport for [[gerrit:1280226|loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s)

With all the validation issues fixed, I was able to run some manual analysis with the reworked CTR instrumentation. The newly added metrics and queries can be reviewed in [[ https://gitlab.wikimedia.org/sgimeno/experiment-analytics-configs/-/commit/447df61105300655954712a972183bfdcf0b903b | [DNM] Logged-out anon warning: add re-worked CTRs ]]. I used a query checker template with the following metrics and looked at a one-day snapshot (start 2026-05-14, end 2026-05-15); these were the results:

Metric                                           variation  sample_size  sample_mean
Create Account link clickthrough                 control    5852         0.010595
Create Account link clickthrough                 treatment  5756         0.019632
Log in link clickthrough                         control    5852         0.024265
Log in link clickthrough                         treatment  5757         0.024145
Edit without logging in clickthrough             control    5852         0.078776
Edit without logging in clickthrough             treatment  5756         0.052467
Temporary Account Learn More link clickthrough   control    5747         0.002436
Temporary Account Learn More link clickthrough   treatment  5736         0.002266
VE close button clickthrough                     control    5751         0.208312
VE close button clickthrough                     treatment  5740         0.204878
Device close tab clickthrough                    control    5754         0.135036
Device close tab clickthrough                    treatment  5743         0.134773
Device back button clickthrough                  control    6013         0.577083
Device back button clickthrough                  treatment  5980         0.584783
SUM CTRs                                         control    40821        1.036503
SUM CTRs                                         treatment  40468        1.022944

As we can see, the summed CTR is now slightly higher than 1. That may indicate some double counting or imprecision, but overall the numbers make a lot of sense:

  • ~57% of users click on back
  • ~13% of users close the tab
  • ~20% of users close the editor using the editor button

Together these account for about 90% of the page interactions; the remaining clickthrough goes to the page links, which we had already estimated at around 10-12% of the page interactions. With these numbers I believe we can conclude the investigation, but @KStoller-WMF has the last call.
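
The arithmetic behind this summary can be reproduced from the per-metric sample means in the snapshot. In the sketch below the dictionary keys and the split into "exit" and "link" interactions are labels chosen for illustration; only the numeric values come from the results above.

```python
# Per-metric sample means from the one-day snapshot: (control, treatment).
CTR = {
    "create_account_link": (0.010595, 0.019632),
    "log_in_link": (0.024265, 0.024145),
    "edit_without_logging_in": (0.078776, 0.052467),
    "temp_account_learn_more": (0.002436, 0.002266),
    "ve_close_button": (0.208312, 0.204878),
    "device_close_tab": (0.135036, 0.134773),
    "device_back_button": (0.577083, 0.584783),
}

# Interactions that leave the editor without following a link in the warning.
EXIT_METRICS = {"ve_close_button", "device_close_tab", "device_back_button"}


def summed_ctr(metrics, group_index):
    """Sum the sample means of the given metrics for one group (0=control, 1=treatment)."""
    return sum(CTR[name][group_index] for name in metrics)


for index, group in enumerate(("control", "treatment")):
    total = summed_ctr(CTR, index)
    exits = summed_ctr(EXIT_METRICS, index)
    print(f"{group}: total={total:.6f} exits={exits:.6f} links={total - exits:.6f}")
```

Because each metric has its own sample size and a subject can plausibly trigger more than one interaction, the summed value is not a true probability, which is one way a total slightly above 1 can arise.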

Note: I made a mistake with the mode_switch interaction: I didn't create artificial impressions for it. I'm trying to work around this by using the impressions of another interface, but I don't think the mode switch CTR will change much of what we already know. In the snapshot analysis there are around 40K subjects per group and I observed only around 40 mode switches, so this CTR will probably be very low.

Thank you so much for getting to the bottom of this, @Sgs!

I'm surprised that abandonment is so high, but this is good to know!
I've added it to my to-do list to make sure we add a summary of these learnings to https://www.mediawiki.org/wiki/Contributors/Account_Creation_Experiments

There is somewhat related research underway that might help us better interpret why so many people click into edit and abandon: T424108: Analyze results of "Exit the editor" survey (Phase 1)

From my perspective we can consider this task resolved. But @Etonkovidova or engineers can reopen if further testing is needed.