
Logged-Out Warning Message: Instrumentation and Experiment Setup for first iteration A/B Test
Open, High, Public

Description

User Story:

As the Growth team, we want to run a controlled A/B experiment with clear instrumentation so that we can understand how logged-out users interact with key entry points for account creation and editing, and use this data to inform future product decisions.

Overview

Set up instrumentation and experiment configuration for an A/B test using the Test Kitchen platform. This work will enable measurement of user exposure and interaction with key entry points related to account creation and editing.

Experiment Configuration

Experiment framework: Test Kitchen A/B test
Pilot wikis:

  • English Wikipedia (enwiki) (struck out): enwiki is the only wiki that does not provide VE as the default to mobile users, so we will exclude it from this experiment.
  • Arabic Wikipedia (arwiki)
  • French Wikipedia (frwiki)
  • Spanish Wikipedia (eswiki)
  • German Wikipedia (dewiki)
  • Russian Wikipedia (ruwiki)
  • Chinese Wikipedia (zhwiki)
  • Italian Wikipedia (itwiki)
  • Portuguese Wikipedia (ptwiki)
  • Persian Wikipedia (fawiki)
  • Polish Wikipedia (plwiki)

Experiment split:
50 percent treatment
50 percent control

Audience:

Roll out to the largest audience percentage permitted by the Experimentation Platform
Must respect current limitations for logged-out traffic: https://wikitech.wikimedia.org/wiki/Test_Kitchen/Conduct_an_experiment#Experiment_design:_user_traffic_per_wiki

Release date: March 26, 2026

Instrumentation Requirements

Ideally, both the OLD and NEW Warning Message should have similar instrumentation, so that we can compare the CTAs.

Impressions

  • Track impressions for the experiment entry point

The impression should fire when the experiment UI or experience is rendered and visible to the user, and only once per eligible view.
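
A minimal sketch of that fire-once behavior, assuming an IntersectionObserver-based visibility check; the Experiment handle is a stand-in for the Test Kitchen JS SDK, and the wiring is illustrative rather than the shipped implementation:

// Hedged sketch: emit the impression exactly once, and only when the warning
// UI actually enters the viewport.
let impressionSent = false;

function watchForImpression( warningElement ) {
    const observer = new IntersectionObserver( ( entries ) => {
        if ( !impressionSent && entries.some( ( entry ) => entry.isIntersecting ) ) {
            impressionSent = true;
            // experiment_exposure is the guardrail impression event listed in
            // the metrics table below.
            Experiment.send( 'experiment_exposure' );
            observer.disconnect();
        }
    } );
    observer.observe( warningElement );
}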

Click Events
Track clicks for all interactive elements within the experiment experience, including (a sketch follows the list):

  • Sign up
  • Log in
  • Edit without logging in
  • Temporary accounts / learn more
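
A sketch of what that could look like, using the instrument-source names from the metrics table further down; the data-cta marker attribute, the selectors, and the Experiment handle are illustrative assumptions:

// Hedged sketch: map each CTA to the instrument-source name it should report.
// warningElement and Experiment are as in the impression sketch above.
const ctaEvents = {
    signup: 'Sign up',
    login: 'Log in',
    'anon-edit': 'Anon editing',
    'temp-account-info': 'Temp account info'
};

warningElement.addEventListener( 'click', ( event ) => {
    // data-cta is a hypothetical marker attribute on each link/button.
    const cta = event.target.closest( '[data-cta]' );
    if ( cta && ctaEvents[ cta.dataset.cta ] ) {
        Experiment.send( ctaEvents[ cta.dataset.cta ] );
    }
} );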

KPIs

Account Creation
TBD: Constructive Activation
TBD: Constructive Edit Rate

Acceptance Criteria
  • Experiment is successfully configured in Test Kitchen for all specified pilot wikis
  • Treatment and control allocation is verified at a 50/50 split
  • Rollout respects Experimentation Platform constraints for logged-out traffic
  • Impression events fire reliably and only once per eligible view
  • Click events are logged for all specified interactions
  • Event data is available (and ideally a clear dashboard is available to compare CTRs for each element along with downstream KPIs like Account Creation, Constructive Activation, and Constructive Edit Rate)
Metrics added to the experiment

Metric | Instrument source
Experiment exposures | experiment_exposure
Create Account link clickthrough | 'Sign up'
Log in link clickthrough | 'Log in'
Edit without publishing link clickthrough | 'Anon editing'
Temporary Account Learn More link clickthrough | 'Temp account info'
VE close button clickthrough | 'Close button'
Constructive edit rate (mobile web) | edit_saved
Constructive edit rate of newer editors (mobile web) | edit_saved

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Use new metric and rename another metric | repos/product-analytics/test-kitchen/experiment-analytics-configs!51 | bearloga | T416100 | main
Add Account creation rate metric | repos/product-analytics/test-kitchen/experiment-analytics-configs!50 | sgimeno | account-creation-rate | main
Register Growth logged-out warning AB test | repos/product-analytics/test-kitchen/experiment-analytics-configs!46 | sgimeno | loggedout-experiment | main

Event Timeline


Change #1237278 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] loggedOutWarning.js: add instrumentation for anon edit attempt intervention

https://gerrit.wikimedia.org/r/1237278

I'm trying to implement the instrumentation reusing as many existing instruments and metrics as we can, and I have some trade-off questions:

  1. We seem to want to capture 4 CTRs, one for each of the Sign up, Log in, Edit without logging in, and Temporary accounts / learn more links. I was thinking of using the generic Clickthrough per user, but the documentation for CTPU states that "it is generic because it does not differentiate between impressions/clicks from multiple sources (instruments) of impression/click events that might be present in the data collection for an experiment", so I guess for 4 different CTRs we'd need custom queries similar to the Getting started notification clickthrough, which had one CTR for each notification type and primary/secondary link.
    • Does option (a) ensure a greater reach, given the 0.1% traffic limitation is per-experiment?
    • What's the recommendation for this recurring setup that has less overhead, in order to streamline automated analysis: (a) 4 experiments with a generic CTR (CTPU), which would result in 4 Superset automated analysis dashboards, or (b) a single experiment with custom queries per CTR and a condensed dashboard, same as we did for the Getting started notification setup?
    • If this setup (or another) becomes a pattern, would it be worth working on some template queries?
  2. We don't have a baseline overall CTR for any of these, or baseline impressions (I'm currently investigating this). What's the best approach to get a baseline estimation? I'm thinking of running an overall clickthrough (all CTRs together) in parallel, in an AA or AB form, but maybe I'm missing some existing instrument we could look at, e.g. editattempt.

cc @mpopov

  1. The screen we're analyzing also has a 5th available CTR that is not part of the task description: the close button (F71624390). Should we also add a CTR for it to this experiment?
  2. The task description states we want to look into the Account creation metric and points to this Account creation instrument; this is a long-lived instrument that's not part of any experiment that I know of, nor is it used by any metric from the catalog for automated analysis. Are we leaving this out of the automated analysis and doing it manually?
  3. Do we need to track anything after users click on "Edit without logging in" or "Log in"?

cc @KStoller-WMF

I've been trying to collect some numbers that would be close to anon warning impressions with the following query:

SELECT *
FROM event.editattemptstep
WHERE meta.domain IN ({wiki_list})
AND event.action = "loaded"
AND event.editor_interface = "visualeditor"

However, I'm stuck when I add the clause AND event.is_anon = true to filter out temp accounts and registered users: if I inspect some random rows, the value for is_anon is always None. Is that expected? I'm also getting a Hive WARN HiveExternalCatalog that may be telling me something is off in this table/schema. Any clues? cc @Urbanecm_WMF @mpopov ty!

I'm trying to implement the instrumentation reusing as many existing instruments and metrics as we can, and I have some trade-off questions:

  1. We seem to want to capture 4 CTRs, one for each of the Sign up, Log in, Edit without logging in, and Temporary accounts / learn more links. I was thinking of using the generic Clickthrough per user, but the documentation for CTPU states that "it is generic because it does not differentiate between impressions/clicks from multiple sources (instruments) of impression/click events that might be present in the data collection for an experiment", so I guess for 4 different CTRs we'd need custom queries similar to the Getting started notification clickthrough, which had one CTR for each notification type and primary/secondary link.
    • Does option (a) ensure a greater reach, given the 0.1% traffic limitation is per-experiment?
    • What's the recommendation for this recurring setup that has less overhead, in order to streamline automated analysis: (a) 4 experiments with a generic CTR (CTPU), which would result in 4 Superset automated analysis dashboards, or (b) a single experiment with custom queries per CTR and a condensed dashboard, same as we did for the Getting started notification setup?
    • If this setup (or another) becomes a pattern, would it be worth working on some template queries?

These should be 4 different metrics. The idea is that you've identified these as metrics you're interested in experimenting with, and hopefully there will be more experiments that target these metrics, either all together or in subsets. Yes, it's extra work to define 4 specific CTRs, one for each element you're interested in, but you only have to do that once.

  1. We don't have a baseline overall CTR for any of these, or baseline impressions (I'm currently investigating this). What's the best approach to get a baseline estimation? I'm thinking of running an overall clickthrough (all CTRs together) in parallel, in an AA or AB form, but maybe I'm missing some existing instrument we could look at, e.g. editattempt.

Best approach would be to run an AA where you define & instrument these 4 metrics. The AA gets analyzed, you get your baselines, then you can adapt the instrumentation to the AB test. You don't even need to run it for long: you could run it for a week while you design & implement the treatment. You could also just YOLO it, skip the AA, and dive right into the AB. The control group is the baseline.

  1. The task description states we want to look into the Account creation metric and points to this Account creation instrument; this is a long-lived instrument that's not part of any experiment that I know of, nor is it used by any metric from the catalog for automated analysis. Are we leaving this out of the automated analysis and doing it manually?

Where is the user taken after creating their account? Is it possible to have a client-side instrument do Experiment#send( 'account_created' ) after the form succeeds?

If not, I guess you could attach a call to Experiment#send( 'create_account_submit' ) to the submit button click and hope that the vast majority of account creation attempts are valid. Hm… Could we do Experiment#send( 'create_account_error' ) if there's a problem with submitting the form? Then, to determine whether a subject succeeded in creating an account, we just check SUM(IF(action = 'create_account_submit', 1, 0)) - SUM(IF(action = 'create_account_error', 1, 0)) > 0
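
For illustration, a hedged sketch of that submit + error pairing; the form selector and the error hook name are assumptions rather than confirmed APIs, and Experiment again stands in for the Test Kitchen JS SDK handle:

// Hypothetical wiring for the create_account_submit / create_account_error pair.
const form = document.querySelector( 'form[name="userlogin2"]' ); // illustrative selector
if ( form ) {
    form.addEventListener( 'submit', () => {
        Experiment.send( 'create_account_submit' );
    } );
}
// Assumed to fire whenever the signup form surfaces a validation/API error;
// 'createaccount.error' is a hypothetical hook name.
mw.hook( 'createaccount.error' ).add( () => {
    Experiment.send( 'create_account_error' );
} );

With both events collected, the SUM(IF(…)) check above then gives a per-subject success indicator.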

I've been trying to collect some numbers that would be close to anon warning impressions with the following query:

SELECT *
FROM event.editattemptstep
WHERE meta.domain IN ({wiki_list})
AND event.action = "loaded"
AND event.editor_interface = "visualeditor"

However, I'm stuck when I add the clause AND event.is_anon = true to filter out temp accounts and registered users: if I inspect some random rows, the value for is_anon is always None. Is that expected?

is_anon was deprecated in preparation for Temp Accounts. Instead, EditAttemptStep sets:

user_is_temp: mw.user.isTemp(),
user_class: mw.user.isAnon() ? 'IP' : undefined,

So the clause you should add is AND event.user_class = 'IP'

I'm also getting a Hive WARN HiveExternalCatalog that may be telling me something is off in this table/schema.

Safe to ignore. That happens on any table that has been ‘evolved’ by Refine (the event processing/refinement pipeline).

These should be 4 different metrics. The idea is that you've identified these as metrics you're interested in experimenting with, and hopefully there will be more experiments that target these metrics, either all together or in subsets. Yes, it's extra work to define 4 specific CTRs, one for each element you're interested in, but you only have to do that once.

Alright, I created Register Growth logged-out warning AB test, where I hopefully got your recommendation right. Feedback welcome. I added an additional experiment_exposure event, which is an impression of the logged-out warning/screen interface, as a guardrail.

Best approach would be to run an AA where you define & instrument these 4 metrics. The AA gets analyzed, you get your baselines, then you can adapt the instrumentation to the AB test. You don't even need to run it for long: you could run it for a week while you design & implement the treatment. You could also just YOLO it, skip the AA, and dive right into the AB. The control group is the baseline.

I think we'll go YOLO in this experiment and try to anticipate the lack of established baselines for the next experiments. But @KStoller-WMF has the final call.

Where is the user taken after creating their account? Is it possible to have a client-side instrument do Experiment#send( 'account_created' ) after the form succeeds?

It could be taken almost anywhere, depending on where/how they initiated the flow. Wouldn't client side be unreliable for capturing account_created events, as they are prone to being cut off by ad blockers, private browsing settings and the like, while server-side account creation is reliable, at least?

If not, I guess you could attach a call to Experiment#send( 'create_account_submit' ) to the submit button click and hope that the vast majority of account creation attempts are valid. Hm… Could we do Experiment#send( 'create_account_error' ) if there's a problem with submitting the form? Then, to determine whether a subject succeeded in creating an account, we just check SUM(IF(action = 'create_account_submit', 1, 0)) - SUM(IF(action = 'create_account_error', 1, 0)) > 0

I guess we could do something like this. I'd like to understand first why most account creation instrumentation we have seems to be server-side, e.g. Instrument_list#Account_creation, and try to build on top of something existing rather than from scratch.

is_anon was deprecated in preparation for Temp Accounts. Instead, EditAttemptStep sets:

user_is_temp: mw.user.isTemp(),
user_class: mw.user.isAnon() ? 'IP' : undefined,

So the clause you should add is AND event.user_class = 'IP'

Thank you, I was able to pull some numbers for the last month from the editattemptstep instrumentation with the following query:

WITH all_users AS (
    SELECT 
        webhost,
        event.user_id
    FROM event.editattemptstep
    WHERE meta.domain IN ({wiki_list})
    AND event.page_ns = 0
    AND event.action = "loaded"
    AND event.editor_interface = "visualeditor"
    AND event.user_class = 'IP'
    AND event.skin = 'minerva'
    AND year = 2026 AND month = 1
    AND event.user_is_temp = False
)
SELECT
    webhost,
    count(user_id) AS num_anon_edit_attempts
FROM all_users
GROUP BY webhost

Surprisingly or not, the numbers are quite mixed between wikis, which may be something relevant for us to take into account; maybe we want to target more wikis, or a different group. Results:

webhost | num_anon_edit_attempts
ar.wikipedia.org | 142600
fr.wikipedia.org | 611918
en.wikipedia.org | 23327
ch.wikipedia.org | 8
de.wikipedia.org | 833281
es.wikipedia.org | 554608
ru.wikipedia.org | 675318
ca.wikipedia.org | 10988

This needs some sanity checking, as I'm not proficient at analyzing results, @mpopov. Something that could be flawed is the count(user_id), which is counting all of the 0s, since 0 is the user_id anon users get. So these numbers are not guaranteed to be per unique user, but they give us a rough absolute. The most surprising are the small figures from enwiki and chwiki, which could be showing some kind of wiki configuration heavily influencing the traffic to this interface. Or not.

Safe to ignore. That happens on any table that has been ‘evolved’ by Refine (the event processing/refinement pipeline).

Ack!

The low number for enwiki is probably due to AND event.editor_interface = "visualeditor", because, as I understand it, enwiki is the last wiki that still defaults to Source Editor on mobile.

Where is the user taken after creating their account? Is it possible to have a client-side instrument do Experiment#send( 'account_created' ) after the form succeeds?

It could be taken almost anywhere, depending on where/how they initiated the flow. Wouldn't client side be unreliable for capturing account_created events, as they are prone to being cut off by ad blockers, private browsing settings and the like, while server-side account creation is reliable, at least?

If not, I guess you could attach a call to Experiment#send( 'create_account_submit' ) to the submit button click and hope that the vast majority of account creation attempts are valid. Hm… Could we do Experiment#send( 'create_account_error' ) if there's a problem with submitting the form? Then, to determine whether a subject succeeded in creating an account, we just check SUM(IF(action = 'create_account_submit', 1, 0)) - SUM(IF(action = 'create_account_error', 1, 0)) > 0

I guess we could do something like this. I'd like to understand first why most account creation instrumentation we have seems to be server-side, e.g. Instrument_list#Account_creation, and try to build on top of something existing rather than from scratch.

This is also relevant for other experiments targeting account creation that we plan to run in the remainder of this fiscal year. So far, we usually had the related instrumentation in onLocalUserCreated hooks, after removing automatic creations and so on. Shouldn't this work, and be much more reliable than anything client-side?

The low number for enwiki is probably due to AND event.editor_interface = "visualeditor", because, as I understand it, enwiki is the last wiki that still defaults to Source Editor on mobile.

That's a good observation, but I think event.editor_interface = "visualeditor" stands for the editor itself, and we're targeting both the source and visual modes of "visualeditor"; on mobile the warning is shown by MobileFrontend, which applies it to both. I will review this by checking the instrumentation and reviewing/refining the query, and get back.

Change #1240257 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@master] loggedOutWarning.js: add CTR instrumentation for the close button

https://gerrit.wikimedia.org/r/1240257

I can confirm event.editor_interface = "visualeditor" stands for VisualEditor rather than its mode; see editattemptstep/current.yaml#L318. I pulled the numbers for the target wikis. I guess these figures are not meaningful enough, as they are just impressions heavily influenced by mobile traffic, but still an interesting mix of values:

webhost | num_anon_edit_attempts
de.wikipedia.org | 833281
ru.wikipedia.org | 675318
fr.wikipedia.org | 611918
it.wikipedia.org | 546587
pl.wikipedia.org | 218512
zh.wikipedia.org | 177880
ar.wikipedia.org | 142600
fa.wikipedia.org | 76875
pt.wikipedia.org | 29033

Ok, but if it is not the issue of the default editor, then how do we explain that the numbers for enwiki are two orders of magnitude lower than we would expect them to be?

Change #1240257 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] loggedOutWarning.js: add CTR instrumentation for the close button

https://gerrit.wikimedia.org/r/1240257

Change #1242300 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@master] Constructive edit rate and activation rate

https://gerrit.wikimedia.org/r/1242300

Where is the user taken after creating their account? Is it possible to have client-side instrument do Experiment#send( 'account_created' ) after the form succeeds?

It could be taken almost anywhere depending on where/how the initiated the flow. Wouldn't client side be unreliable for capturing account_created events as they are prone to be cut-off by add-blockers, private browsing settings and others while the server side account creation is reliable, at least

If not, I guess you could attach a call to Experiment#send( 'create_account_submit' ) to the submit button click and hope that vast majority of account creation attempts are valid. Hm… Could we do Experiment#send( 'create_account_error' ) if there's a problem with submitting the form? Then to determine whether subject had succeeded in creating account we just check SUM(IF(action = 'create_account_submit', 1, 0)) - SUM(IF(action = 'create_account_error', 1, 0)) > 0

I guess we could do something like this. I'd like to understand first why most account creation instrumentation we have seems to be server side, eg: Instrument_list#Account_creation and try to build on top of something existing rather than from scratch

This is also relevant for other experiments targeting account creation that we plan to run in the remainder of this fiscal year. So far, we usually had related instrumentation in the onLocalUserCreated hooks after removing automatic creations and so on. Should this not work and be much more reliable than anything client-side?

Thinking about this more, I had an idea, and I'm curious to hear your thoughts @Sgs and @mpopov:
What we could do is, server-side, on account creation, add a tiny JS module to the payload that then, in the user's client, records the "account creation". That should be almost as comprehensive as tracking on the server directly (we'd only lose the users that close their browser before the tab finishes loading directly after submitting the account creation request, so a non-zero number of users, but hopefully small and unbiased), and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies.

What do you think?
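
A sketch of the client half of this idea, under the assumption that the server attaches the module only to the page view immediately following a successful account creation; the module name and SDK handle are illustrative:

// ext.wme.accountCreated.js - hypothetical module delivered by the server only
// after a successful account creation, so merely running means an account was
// just created on this client.
( function () {
    // 'account_created' is the event name floated earlier in this thread;
    // Experiment stands in for the Test Kitchen JS SDK handle.
    Experiment.send( 'account_created' );
}() );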

Wouldn't client side be unreliable for capturing account_created events, as they are prone to being cut off by ad blockers, private browsing settings and the like, while server-side account creation is reliable

Experiments that use edge uniques for enrollment don't send their events to intake-analytics.wikimedia.org (which is on many blocklists) but instead go through a special same-domain beacon, so they're not affected. (We are working on enabling that same behavior for experiments that use central user ID for enrollment and for instruments.)

What we could do is, server-side, on account creation, add a tiny JS module to the payload that then, in the user's client, records the "account creation". That should be almost as comprehensive as tracking on the server directly (we'd only lose the users that close their browser before the tab finishes loading directly after submitting the account creation request, so a non-zero number of users, but hopefully small and unbiased), and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies.

That might work – I'd love to hear from @phuedx.

What we could do is, server-side, on account creation, add a tiny JS module to the payload that then, in the user's client, records the "account creation". That should be almost as comprehensive as tracking on the server directly (we'd only lose the users that close their browser before the tab finishes loading directly after submitting the account creation request, so a non-zero number of users, but hopefully small and unbiased), and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies.

This will work but delivery simply won't be as reliable, as @Sgs noted in T416100#11620735. We are working hard to understand the limits here, which we will report out on as and when we know more. You can follow along in T417068: Synthetic experiment to test new event path (round 2) and T417143: Synthetic experiment to test new event path (round 3).

and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies

No SDK has (nor will they ever have) access to the Edge Unique cookie. Varnish does not (and will not) pass the raw value of the Edge Unique cookie to a downstream service. See also https://wikitech.wikimedia.org/wiki/Edge_uniques

When a device is enrolled in an everyone experiment, a subject ID (the term for the ID that identifies a device or user in an experiment) is derived from the Edge Unique cookie and the machine-readable experiment name. The JS SDK sends analytics events relating to everyone experiments to a distinct event intake URL. This URL is the only URL for which Varnish will pass the subject ID to the downstream service.

This will work but delivery simply won't be as reliable, as @Sgs noted in T416100#11620735.

Hm… Well, at least the reliability of delivery is unbiased and in theory does not interact with treatment, so if we end up with an undercount of account creations (via client-side instrumentation) then it will be the same undercount in both variations.

Given the constraints at play here, Michael's proposal is probably the best bet at the moment.

Change #1243204 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@master] [DNM] PoC: track account creation from server using TestKitchen edge-unique ids

https://gerrit.wikimedia.org/r/1243204

From https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1242300/comments/9335d643_1309cf56:

I'm not opposed to Michael's proposal but want to give it a thought. In the past, GrowthExperiments added hidden fields to the account creation form, including information such as the assigned variant. I think the TestKitchen JS SDK should be able to add the subjectId information along with the variant, all sent within the authentication request and received in the server-side PHP hook that ensures the account has been created. I will give it a thought and get back asap.

@Sgs: you should read Sam's comment again, which I'll include here for convenience:

and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies

No SDK has (nor will they ever have) access to the Edge Unique cookie. Varnish does not (and will not) pass the raw value of the Edge Unique cookie to a downstream service. See also https://wikitech.wikimedia.org/wiki/Edge_uniques

When a device is enrolled in an everyone experiment, a subject ID (the term for the ID that identifies a device or user in an experiment) is derived from the Edge Unique cookie and the machine-readable experiment name. The JS SDK sends analytics events relating to everyone experiments to a distinct event intake URL. This URL is the only URL for which Varnish will pass the subject ID to the downstream service.

Your PoC https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1243204 is not going to work. You're just going to get "awaiting" as the subject ID, because the JS SDK doesn't know what the subject ID is, and will never know what the subject ID is.

I understand your frustration here with not being able to collect data with the PHP SDK, but this was a deliberate system design decision. It's not something we just haven't gotten around to doing ourselves.


Also from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1242300/comments/9335d643_1309cf56:

If I understand that code correctly, it is logging edit_saved events regardless of the user being anon, temporary or logged-in(?). Should we do the same here? My understanding was that for this experiment we only want to record edit saves from users who have created an account within the experiment. It's unclear to me from the Constructive edit rate query in the metrics_catalog.yml whether that is ensured because it is filtering for performer_is_logged_in, or whether the instrumentation needs to ensure that with some safeguard.

Constructive edit rate includes all subjects who have an experiment_exposure event (see the t_cohort CTE).

You'll be able to narrow the results down to logged-in only using the user auth status filter in the Superset dashboard. For "everyone" experiments (using edge uniques for enrollment), automated analytics performs 3 analyses:

  • all subjects
  • logged-out only (AND NOT performer_is_logged_in)
  • logged-in only (AND performer_is_logged_in)

Change #1243204 abandoned by Sergio Gimeno:

[mediawiki/extensions/WikimediaEvents@master] [DNM] PoC: track account creation from server using TestKitchen edge-unique ids

Reason:

Won't work

https://gerrit.wikimedia.org/r/1243204

From https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1242300/comments/9335d643_1309cf56:

I'm not opposed to Michael's proposal but want to give it a thought. In the past, GrowthExperiments added hidden fields to the account creation form, including information such as the assigned variant. I think the TestKitchen JS SDK should be able to add the subjectId information along with the variant, all sent within the authentication request and received in the server-side PHP hook that ensures the account has been created. I will give it a thought and get back asap.

@Sgs: you should read Sam's comment again, which I'll include here for convenience:

and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies

No SDK has (nor will they ever have) access to the Edge Unique cookie. Varnish does not (and will not) pass the raw value of the Edge Unique cookie to a downstream service. See also https://wikitech.wikimedia.org/wiki/Edge_uniques

When a device is enrolled in an everyone experiment, a subject ID (the term for the ID that identifies a device or user in an experiment) is derived from the Edge Unique cookie and the machine-readable experiment name. The JS SDK sends analytics events relating to everyone experiments to a distinct event intake URL. This URL is the only URL for which Varnish will pass the subject ID to the downstream service.

Your PoC https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1243204 is not going to work. You're just going to get "awaiting" as the subject ID, because the JS SDK doesn't know what the subject ID is, and will never know what the subject ID is.

I understand your frustration here with not being able to collect data with the PHP SDK, but this was a deliberate system design decision. It's not something we just haven't gotten around to doing ourselves.


Also from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1242300/comments/9335d643_1309cf56:

If I understand that code correctly, it is logging edit_saved events regardless of the user being anon, temporary or logged-in(?). Should we do the same here? My understanding was that for this experiment we only want to record edit saves from users who have created an account within the experiment. It's unclear to me from the Constructive edit rate query in the metrics_catalog.yml whether that is ensured because it is filtering for performer_is_logged_in, or whether the instrumentation needs to ensure that with some safeguard.

Constructive edit rate includes all subjects who have an experiment_exposure event (see the t_cohort CTE).

You'll be able to narrow the results down to logged-in only using the user auth status filter in the Superset dashboard. For "everyone" experiments (using edge uniques for enrollment), automated analytics performs 3 analyses:

  • all subjects
  • logged-out only (AND NOT performer_is_logged_in)
  • logged-in only (AND performer_is_logged_in)

Ack, thank you for explaining. I've updated the patch for capturing edit saves on the client.

Thinking about this more, I had an idea, and I'm curious to hear your thoughts @Sgs and @mpopov:
What we could do is, server-side, on account creation, add a tiny JS module to the payload that then, in the user's client, records the "account creation". That should be almost as comprehensive as tracking on the server directly (we'd only lose the users that close their browser before the tab finishes loading directly after submitting the account creation request, so a non-zero number of users, but hopefully small and unbiased), and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies.

What do you think?

In theory it should work, but I'm not sure what a reliable "server-side, on account creation, we add a tiny js module to the payload" mechanism looks like. Neither LocalUserCreated nor PostLoginRedirectHook seems perfectly suitable for this, but I will give it a try and get back with findings. I think LocalUserCreated will lose context on redirects, and PostLoginRedirectHook sounds like it would run on both signups and logins, but I need to double-check this. Any other relevant hook for this I should explore?

Thinking about this more, I had an idea, and I'm curious to hear your thoughts @Sgs and @mpopov:
What we could do is, server-side, on account creation, add a tiny JS module to the payload that then, in the user's client, records the "account creation". That should be almost as comprehensive as tracking on the server directly (we'd only lose the users that close their browser before the tab finishes loading directly after submitting the account creation request, so a non-zero number of users, but hopefully small and unbiased), and we would still natively use the client-side testkitchen SDK that has native access to edge-unique cookies.

What do you think?

In theory it should work, but I'm not sure what a reliable "server-side, on account creation, we add a tiny js module to the payload" mechanism looks like. Neither LocalUserCreated nor PostLoginRedirectHook seems perfectly suitable for this, but I will give it a try and get back with findings. I think LocalUserCreated will lose context on redirects, and PostLoginRedirectHook sounds like it would run on both signups and logins, but I need to double-check this. Any other relevant hook for this I should explore?

Good point: if, in the request where LocalUserCreated is fired, we merely send a redirect, then any JavaScript module we might add to the output page is probably never shown.
I looked a bit and also saw a CentralAuth hook that might be related: https://www.mediawiki.org/wiki/Extension:CentralAuth/Hooks/CentralAuthPostLoginRedirect

Change #1247120 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/MobileFrontend@master] editor overlay: defer editor-loaded until anonwarning is rendered

https://gerrit.wikimedia.org/r/1247120

Change #1247970 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@master] AccountCreation: track account registrations using TestKitchen JS sdk

https://gerrit.wikimedia.org/r/1247970

Good point: if, in the request where LocalUserCreated is fired, we merely send a redirect, then any JavaScript module we might add to the output page is probably never shown.
I looked a bit and also saw a CentralAuth hook that might be related: https://www.mediawiki.org/wiki/Extension:CentralAuth/Hooks/CentralAuthPostLoginRedirect

My shallow understanding is that trying to add RL modules on a redirect response won't work by design, because there isn't a load.php request to work with, but I recognize I haven't investigated the full handling of auth redirects in MW/CA/SUL3. Just by inspecting the network requests during an account creation, there are 3 redirects involved. The standard approach would be to use returnToQuery, or to add a dedicated query parameter that holds the information that the account creation needs to be tracked. However, there are hooks manipulating returnToQuery (e.g. the welcome survey), and implementing it requires more effort. I'm not opposed to it, but it seems not worth doing in the context of a single experiment. It would be more valuable if we could implement it in an enduring and reusable way for account creation experiments in general.

My alternative proposal is to follow the "user option setting on LocalUserCreated" pattern, which has proven reliable for a long time in GrowthExperiments. This has the downside of polluting user_properties and will require a user option cleanup at the end of the experiment, but it seems to me a reliable way to scope this instrumentation implementation to this experiment. Let me know what you think, @mpopov @Michael

I think the approach that you've chosen is a good solution for this experiment, and I'm reviewing the related patches. For now, my note would be (I'll write it in the change too) that we have multiple experiments planned at various wikis that will need account creation rate. So we need to make this a bit more generic.

But stepping back for a moment, we also need a robust longer-term solution on how to experiment with account creation rate. Number of new accounts is an essential metric for our movement, and we will have to run experiments aimed at it in the years to come from various teams. So we eventually need a more robust solution that does not leave user_properties rows lying around. Though that is beyond the scope of this task.

I think the approach that you've chosen is a good solution for this experiment, and I'm reviewing the related patches. For now, my note would be (I'll write it in the change too) that we have multiple experiments planned at various wikis that will need account creation rate. So we need to make this a bit more generic.

But stepping back for a moment, we also need a robust longer-term solution on how to experiment with account creation rate. Number of new accounts is an essential metric for our movement, and we will have to run experiments aimed at it in the years to come from various teams. So we eventually need a more robust solution that does not leave user_properties rows lying around. Though that is beyond the scope of this task.

Agreed. We're discussing a solution with @Milimetric that wouldn't involve any user option: instead we'd pass a JsConfigVar from the server and store the flag on the client with mw.storage, setting an expiry.
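
A minimal sketch of the client half of that approach, assuming a hypothetical wgWMEAccountJustCreated config var set by the server on the post-creation page view; the storage key and expiry are also illustrative:

// Hedged sketch of the JsConfigVar + mw.storage idea; none of these names are
// existing config vars or storage keys.
mw.loader.using( 'mediawiki.storage' ).then( () => {
    if ( mw.config.get( 'wgWMEAccountJustCreated' ) ) {
        // Persist the flag with an expiry (in seconds) so the experiment
        // instrument can still pick it up after subsequent redirects.
        mw.storage.set( 'wme-account-created', '1', 24 * 60 * 60 );
    }
} );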

Change #1248476 had a related patch set uploaded (by Milimetric; author: Milimetric):

[mediawiki/extensions/WikimediaEvents@master] [WIP] Idea for general mechanism to tell client an account registration just happened.

https://gerrit.wikimedia.org/r/1248476

Change #1248533 had a related patch set uploaded (by Milimetric; author: Milimetric):

[mediawiki/extensions/GrowthExperiments@master] Signal onBeforePageLoad handlers on account create

https://gerrit.wikimedia.org/r/1248533

Change #1248533 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Signal onBeforePageLoad handlers on account create

https://gerrit.wikimedia.org/r/1248533

Change #1248476 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Send signal to clients that an account was created

https://gerrit.wikimedia.org/r/1248476

Change #1249350 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/MobileFrontend@master] EditorBaseOverlay: add mode switch information to close event

https://gerrit.wikimedia.org/r/1249350

Change #1242300 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@master] loggedOutWarning.js: track edits within experiment

https://gerrit.wikimedia.org/r/1242300

Change #1247120 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] editor overlay: defer editor-loaded until anonwarning is rendered

https://gerrit.wikimedia.org/r/1247120

KPIs

Account Creation
TBD: Constructive Activation
TBD: Constructive Edit Rate

@Sgs In the task description, I listed Account Creation as the main KPI. This is the primary metric we will use to measure success.

I listed Constructive Activation and Constructive Edit Rate as TBD because I was not sure whether they can be measured with the current Test Kitchen setup. I see these as secondary metrics that we would ideally track. However, if tracking them would significantly complicate the experiment release, I am open to adjusting the acceptance criteria.

If we need to prioritize, Constructive Edit Rate is the lowest priority, especially if we are able to track Constructive Activation.

KPIs

Account Creation
TBD: Constructive Activation
TBD: Constructive Edit Rate

@Sgs In the task description, I listed Account Creation as the main KPI. This is the primary metric we will use to measure success.

I listed Constructive Activation and Constructive Edit Rate as TBD because I was not sure whether they can be measured with the current Test Kitchen setup. I see these as secondary metrics that we would ideally track. However, if tracking them would significantly complicate the experiment release, I am open to adjusting the acceptance criteria.

If we need to prioritize, Constructive Edit Rate is the lowest priority, especially if we are able to track Constructive Activation.

This is how I understood it. Constructive Edit Rate and Constructive Activation were easy to achieve because we have already used them in prior experiments; that means the metric definitions and queries exist in the metrics catalogs, and we have examples of how to build instrumentation for them. Account Creation as the main KPI is a green field, given the restriction noted by @mpopov that TestKitchen imposes: using client-side instrumentation for everyone experiments. Also because it seems we don't yet have an established baseline for account creation; see T402533: Establish baselines for account creation behaviour.

I'm currently working on the Account Creation metric. I would name it Account Creation Rate, as opposed to (absolute) Account Creation, which is what we see in the Contributors dashboard. I hope this clarifies the current status of the task.

Change #1249350 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] EditorBaseOverlay: add mode switch information to close event

https://gerrit.wikimedia.org/r/1249350

Change #1242300 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] loggedOutWarning.js: track edits within experiment

https://gerrit.wikimedia.org/r/1242300

Double-checking: are we deploying this experiment to English Wikipedia? It's crossed out in the task description, but I see English Wikipedia: 0.1% in https://test-kitchen.wikimedia.org/experiment/growthexperiments-editattempt-anonwarning. Am I missing a piece?

Good catch, Lauren.
Enwiki should not be included in this experiment, as enwiki will instead be part of T419916: [V1 experiment release] Redesign mobile web account creation form following Codex guidelines.

@Sgs is that a typo in the test kitchen documentation, or does that need to be updated?

Change #1253450 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/WikimediaEvents@wmf/1.46.0-wmf.19] AccountCreation: track account registrations for WE1.8 experiments

https://gerrit.wikimedia.org/r/1253450

Change #1247970 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] AccountCreation: track account registrations for WE1.8 experiments

https://gerrit.wikimedia.org/r/1247970

Change #1253450 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.46.0-wmf.19] AccountCreation: track account registrations for WE1.8 experiments

https://gerrit.wikimedia.org/r/1253450

Mentioned in SAL (#wikimedia-operations) [2026-03-16T14:09:17Z] <sgimeno@deploy2002> Started scap sync-world: Backport for [[gerrit:1253461|fix(anon warning): remove wrong type=signup param (T415160)]], [[gerrit:1253450|AccountCreation: track account registrations for WE1.8 experiments (T416100)]]

Mentioned in SAL (#wikimedia-operations) [2026-03-16T14:11:06Z] <sgimeno@deploy2002> sgimeno: Backport for [[gerrit:1253461|fix(anon warning): remove wrong type=signup param (T415160)]], [[gerrit:1253450|AccountCreation: track account registrations for WE1.8 experiments (T416100)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-03-16T14:18:33Z] <sgimeno@deploy2002> Finished scap sync-world: Backport for [[gerrit:1253461|fix(anon warning): remove wrong type=signup param (T415160)]], [[gerrit:1253450|AccountCreation: track account registrations for WE1.8 experiments (T416100)]] (duration: 09m 16s)

When is the end date for this A/B test?

@Trizek-WMF as per the Test Kitchen UI, the end date is currently scheduled for 2026-06-29.