
Analysis of the deployment plan for the Empty Search A/B test
Closed, ResolvedPublic

Description

The Web team plans to run an A/B test for Empty Search on anonymous mobile users. To support the deployment, Product Analytics will analyze the deployment plan and estimate the parameters for setting up the A/B test.

To Do
  • Power analysis: estimate the sample size and propose
    • bucketing sampling rate,
    • session length sampling rate,
    • click tracking sampling rate.
  • Check the incoming data on the pilot wikis (euwiki, cawiki), and suggest any changes to the bucketing rate and sampling rates.

Event Timeline

jwang renamed this task from Estimate sample size for Empty Search A/B test to Analysis of the deployment plan for the Empty Search A/B test. Feb 11 2025, 9:36 PM
jwang updated the task description.
Quiddity subscribed.

(Removing User-notice tag, because it looks like it was just replicated from the parent-task. If this needs a separate announcement, please re-add it!)

Napkin math time! [Please help me correct this napkin math if anything is wrong]

@bwang @jwang To confirm the patch we put live:

  • Looking at ca.wikipedia.org, we get 954,364 mobile-web page views per day.

https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/ca.wikipedia.org/mobile-web/all-agents/daily/20250209/20250210

  • Based on the config patch we deployed, 1% of those will be in the experiment, so around 9,544 page views will be bucketed in the experiment per day.
  • Of those bucketed, we are sending events at a sample rate of 50%, so 4,772 page views per day will be eligible for events in the experiment.
  • About 0.8% of mobile web pageview actors are autocomplete searchers [ source ].
  • So my best guesstimate, if this is all working, is that per day for Catalan Wikipedia we should be seeing approximately:
    • 4,772 events to mediawiki.web_AB_test_enrollment
    • 4,772 events with action_subtype: 'init_search_box' to product_metrics.web_base.search_ab_test_clicks
    • 4,772+ tick events to product_metrics.web_base.search_ab_test_session_ticks (likely more, since more events are fired the longer the session lasts)
    • 38 events with action_source: 'search_box' to product_metrics.web_base.search_ab_test_clicks (0.8% of 4,772), 50% of which will have a different value for experiments.assigned

If anything doesn't match those numbers, let's discuss further!
Note that we also enabled the test on Basque Wikipedia (I haven't included it above, so filter accordingly).
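
For anyone who wants to re-run the napkin math with different rates, here is a minimal Python sketch of the same arithmetic. The inputs are the figures quoted in this comment; the function and variable names are made up for illustration and do not come from the config patch.

```python
# Napkin math for expected daily event volume on Catalan Wikipedia.
# Inputs are the figures quoted in the comment; names are illustrative only.
DAILY_PAGEVIEWS = 954_364   # mobile-web page views per day on ca.wikipedia.org
BUCKETING_RATE = 0.01       # share of page views bucketed into the experiment
EVENT_SAMPLE_RATE = 0.50    # share of bucketed page views that send events
SEARCHER_SHARE = 0.008      # share of pageview actors who use autocomplete search


def expected_daily_volumes(pageviews, bucketing_rate, sample_rate, searcher_share):
    """Return (bucketed, sampled, search clicks) per day, rounded to whole events."""
    bucketed = pageviews * bucketing_rate
    sampled = bucketed * sample_rate
    search_clicks = sampled * searcher_share
    return round(bucketed), round(sampled), round(search_clicks)


bucketed, sampled, search_clicks = expected_daily_volumes(
    DAILY_PAGEVIEWS, BUCKETING_RATE, EVENT_SAMPLE_RATE, SEARCHER_SHARE
)
print(bucketed, sampled, search_clicks)  # 9544 4772 38
```

Changing BUCKETING_RATE or EVENT_SAMPLE_RATE shows how the expected volumes scale if the rates are adjusted later.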

This analysis is based on data collected for an incomplete day, 2025-02-14, as of 17:15:55 UTC.
Findings:

  • On cawiki, 2.44% of sessions clicked the empty search box.
  • On euwiki, we have not captured any clicks yet because the volume of incoming events is low.
  • Enrollment is lower than we estimated. For example, mobile web pageviews on cawiki totaled 152,664 on 2025-02-14 up to 17:00 UTC. If all of them qualified for bucketing and we bucketed 1% of them, we would expect approximately 1,500 enrollments. However, we actually bucketed 653 pageviews, possibly because not all mobile web users are on the Minerva skin, or because EventLogging lags by a few hours.
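
The size of the enrollment gap can be checked with a few lines of Python. This is only a sketch of the comparison described above, using the two figures from this comment; the variable names are illustrative.

```python
# Compare expected vs. observed enrollments on cawiki (2025-02-14, up to 17:00 UTC).
pageviews_so_far = 152_664      # mobile web pageviews from wmf.pageview_hourly
bucketing_rate = 0.01           # 1% of page loads are bucketed
observed_enrollments = 653      # enrollment events actually received

expected_enrollments = pageviews_so_far * bucketing_rate
observed_share = observed_enrollments / expected_enrollments

print(f"expected ~ {expected_enrollments:.0f}, observed = {observed_enrollments}, "
      f"observed/expected = {observed_share:.1%}")
# expected ~ 1527, observed = 653, observed/expected = 42.8%
```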

Recommendations:
We'd like to increase our bucketing rate and sample rates, given that both the enrollment and the focus rate are lower than our previous estimates (5% of sessions).
Because we bucket on page load by token and only start logging session ticks once search is focused, we can use the following settings:

  • Bucketing: 100% (50%:50%) for all wikis
  • Analytics sample rate of tick stream: 100% for all wikis
  • Analytics sample rate of click stream: 100% for all wikis
  • Deploy in tiers.

With these settings, we expect around 200 to 350 events/sec. We can detect a 1% effect size with 80% power in 2 weeks if the focus rate is 2%, and in 3 weeks if the focus rate is 1%.
The detailed estimation is in the Google Sheet.
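
For readers without access to the sheet, the sketch below shows the general shape of a sample-size calculation for comparing two proportions. It is not the calculation from the Google Sheet: the task does not state which metric, which definition of "effect size", or which significance level was used, so the 5% two-sided alpha and the 2% and 3% rates in the example are placeholder assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    Uses the standard unpooled-variance approximation:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)


# Placeholder inputs (assumptions, not values from the sheet):
# a 2% baseline rate and a 1 percentage point absolute lift.
n_per_group = sample_size_per_group(0.02, 0.03)
print(n_per_group)
```

Dividing the required sample size by the expected number of qualifying sessions per day gives a rough test duration, which is how estimates like "2 weeks" or "3 weeks" are usually derived.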

Data
The number of unique sessions for each type of action.

| wiki | Language | day | enrollment | experimentDisabled | experimentEnabled | Search session initiations | Actions of clicking the empty search box | Actions of clicking the empty-state recommendations (only experiment group) | Focus rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | | sessions * | sessions * | sessions * | sessions ** | sessions ** | sessions ** | sessions that clicked the empty search box / (enrolled sessions * sample rate) |
| cawiki | ca | 14 | 409 | 196 | 213 | 218 | 5 | 0 | 2.44% |
| euwiki | eu | 14 | 22 | 17 | 5 | 11 | 0 | 0 | 0 |

The number of pageviews/events for each type of action.

| wiki | Language | day | enrollment | experimentDisabled | experimentEnabled | Search session initiations | Actions of clicking the empty search box | Actions of clicking the empty-state recommendations (only experiment group) | Focus rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | | events * | events * | events * | events ** | events ** | events ** | pageviews that clicked the empty search box / (enrolled pageviews * sample rate) |
| cawiki | ca | 14 | 653 | 341 | 312 | 338 | 18 | 0 | 5.51% |
| euwiki | eu | 14 | 26 | 20 | 6 | 14 | 0 | 0 | 0 |

Note:
* We bucketed 1% of page loads by token.
** We sampled 50% of events by session id.
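
The focus-rate columns follow from the definition in the table header. As a rough check, the Python sketch below recomputes both cawiki rates, reading the cawiki rows as 409 enrolled sessions with 5 clicking sessions, and 653 enrollment events with 18 click events. The function name is illustrative.

```python
def focus_rate(clicks, enrolled, sample_rate=0.5):
    """Clicks on the empty search box divided by the sampled share of enrollments."""
    return clicks / (enrolled * sample_rate)


# cawiki, 2025-02-14 (incomplete day)
session_focus_rate = focus_rate(clicks=5, enrolled=409)    # unique sessions
event_focus_rate = focus_rate(clicks=18, enrolled=653)     # pageviews/events

print(f"{session_focus_rate:.2%}")  # 2.44%
print(f"{event_focus_rate:.2%}")    # 5.51%
```

The denominator is scaled by the 50% sample rate because enrollments are logged for every bucketed page load, while click events are only sent for half of the sessions.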

Query for pageviews on mobile web:

-- Mobile web pageviews on cawiki for 2025-02-14, hours 0-16 UTC, excluding spiders
SELECT SUM(view_count) AS pageviews
FROM wmf.pageview_hourly
WHERE year = 2025 AND month = 2 AND day = 14 AND hour < 17
  AND access_method = 'mobile web'
  AND agent_type != 'spider'
  AND project = 'ca.wikipedia'

While exploring the data, I noticed that some data records are missing or not as expected. I document them here for discussion.

Schema product_metrics_web_base_search_ab_test_session_ticks

Still no data

How to verify

SELECT *
FROM event.product_metrics_web_base_search_ab_test_session_ticks
WHERE year = 2025 AND month = 2
LIMIT 5

Schema product_metrics_web_base_search_ab_test_clicks
  1. wiki database code is NULL

Field name: mediawiki.database

image.png (216×612 px, 25 KB)

  2. No events for the action of clicking the empty-state recommendations (only experiment group)

This may simply be due to a low click rate, so we haven't captured any yet. The filter is action='click' AND action_source='empty_search_suggestion'.
How to verify

SELECT mediawiki.`database`, normalized_host.project, day, COUNT(DISTINCT performer.session_id) AS sessions, COUNT(1) AS events
FROM event.product_metrics_web_base_search_ab_test_clicks
WHERE year = 2025 AND month = 2
  AND action = 'click'
  AND action_source = 'empty_search_suggestion'
GROUP BY mediawiki.`database`, normalized_host.project, day

  3. The experiment name includes the language code.

It would be difficult for the query to retrieve the group assignment value if the key differs per wiki.

image.png (484×1 px, 77 KB)

  4. performer.is_temp is NULL

We are in the transition period of deploying temp accounts. On wikis where temp accounts have already been deployed, we need to use this field, instead of performer.is_logged_in, to identify anonymous users.
See Slack or documents [1, 2] for details.
[1] https://phabricator.wikimedia.org/T337103
[2] https://www.mediawiki.org/wiki/Talk:User_account_types#What_does_%22anonymous%22_mean

Schema mediawiki_web_ab_test_enrollment

  1. The experiment name includes the language code. Same as above.

image.png (314×492 px, 26 KB)

Talked more with @jwang. One other issue: we saw "show_empty_state_recommendations" on page load with an empty funnel_entry_token. On initial analysis, this might be because the ext.MobileFrontend.searchOverlay.empty hook fires on page load.

Query to verify

SELECT dt, action, action_context, action_source, action_subtype, funnel_entry_token
FROM event.product_metrics_web_base_search_ab_test_clicks
WHERE year = 2025 AND month = 2
  AND performer.session_id = '0503d55281c5066deefa'
ORDER BY dt
LIMIT 100

Most importantly, we are still not seeing session tick events, which means something is still wrong with the events.

This is because the fix for T386229 hasn't been backported. Jan will do that today.

Can you resolve this once you've represented all the follow-up work needed? Thanks!

bwang updated the task description.