
Data QA for instrumentation of empty search recommendation AB test
Closed, ResolvedPublic

Assigned To
Authored By
jwang
Feb 24 2025, 6:23 PM
Referenced Files
F58513716: image.png
Feb 27 2025, 7:53 PM
F58513713: image.png
Feb 27 2025, 7:53 PM
F58481646: image.png
Feb 24 2025, 11:55 PM
F58481575: image.png
Feb 24 2025, 11:55 PM

Description

Instrumentation was added to track the clicks on search recommendations and the session ticks for sessions that initiated a search. (T378094)

We need to QA the aggregate data once it starts arriving, to confirm that events are logged as expected in production. Metrics events will be stored in the product_metrics_web_base_search_ab_test_clicks and product_metrics_web_base_search_ab_test_session_ticks tables.

Related Tickets
QA of search recommendations event data: T386735
QA of session tick: T386229
Initial data QA: T386142#10553847

Instrumentation Documents

Event Timeline

QA summary of product_metrics_web_base_search_ab_test_clicks table
Events | Field value in table | Status | Note
Actions of clicking the empty search box | action=click, action_source=search_box | ✅ PASS |
Impressions of the empty-state recommendations (only experiment group) | action=show, action_subtype=show_empty_state_recommendation | Issue 1 | Have data from control group too
Actions of entering the first letter | action=type, action_source=search_box | ✅ PASS |
Impressions of the type-ahead search recommendations | action=show, action_subtype=show_autocomplete_recommendation | ✅ PASS |
Action of clicking the empty-state recommendations (only experiment group) | action=click, action_source=empty_search_suggestion | ✅ PASS |
Action of clicking the type-ahead search recommendations | action=click, action_source=autocomplete_search_suggestion | ✅ PASS |
wiki | mediawiki.database | ✅ PASS |
platform | client_platform_family | ✅ PASS |
skin | mediawiki.skin | ✅ PASS |
search_activity_id | funnel_entry_token, funnel_name=Search related articles | Issue 2 |
session_id | performer.session_id | ✅ PASS |
experiment group | experiments.assigned | ✅ PASS |
is_logged_in | performer.is_logged_in, performer.is_temp | ✅ PASS |
timestamp | dt, meta.dt | ✅ PASS |

Issue 1
Events from the control group are present for the impressions of the empty-state recommendations. These impressions are supposed to be shown only to the treatment group.

image.png (248×436 px, 23 KB)

query <- "
SELECT element_at(experiments.assigned, 'RelatedArticles test experiment') AS assigned_group, COUNT(1) AS events
FROM event.product_metrics_web_base_search_ab_test_clicks
WHERE year=2025 AND month=2 AND day=23
AND action='show'
AND action_subtype='show_empty_state_recommendation'
GROUP BY element_at(experiments.assigned, 'RelatedArticles test experiment')
"

Issue 2

The funnel_entry_token field is sometimes NULL for the "Impressions of the empty-state recommendations" event. When this happens, the impression event precedes the "Actions of clicking the empty search box" event by timestamp.

image.png (934×2 px, 194 KB)

SELECT dt, action, action_context, action_source, action_subtype, funnel_entry_token, experiments
FROM event.product_metrics_web_base_search_ab_test_clicks
WHERE year=2025 and month=2
AND performer.session_id='131ea9cc3ac38d665043'
ORDER BY dt
limit 100

QA summary of product_metrics_web_base_search_ab_test_session_ticks table
Events | Field value in table | Status | Note
tick event | action=tick, action_context=<count in integer> | Issue 3 | The number of events is high
wiki | mediawiki.database | ✅ PASS |
platform | client_platform_family | ✅ PASS |
skin | mediawiki.skin | ✅ PASS |
experiment group | experiments.assigned | ✅ PASS |
is_logged_in | performer.is_logged_in & performer.is_temp | ✅ PASS |
session_id | performer.session_id | ✅ PASS |
timestamp | dt & meta.dt | ✅ PASS |

Issue 3: The number of tick events is high

Here is the number of unique sessions for each type of event, collected on Feb 23, 2025. As discussed, we planned to create tick events only for sessions that focused (clicked) on the search box, rather than for all page loads. However, the recorded number of unique sessions with tick events is too high if that change has been implemented. For example, cawiki has 85415 sessions with tick events, while only 901 sessions focused on the search box.

wiki | day | Ticks (sessions) | Actions of clicking the empty search box (sessions) | ExperimentDisabled (sessions) | experimentEnabled (sessions) | Action of clicking the empty-state recommendations, only experiment group (sessions) | empty search recommendation CTR (sessions that clicked on the empty recommendations / sessions that clicked the empty search box)
cawiki | 23 | 85415 | 901 | 465 | 436 | 19 | 4.36%
euwiki | 23 | 3329 | 112 | 56 | 56 | 3 | 5.36%
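
As a sanity check on the table above, the following Python sketch recomputes the CTR column from the session counts. It assumes the denominator is the experimentEnabled sessions that clicked the search box, which is the reading that reproduces the reported percentages; the dictionary keys and function names are illustrative and do not come from the instrumentation.

```python
# Session counts copied from the Feb 23 table above.
# Key names are illustrative, not field names from the event tables.
ROWS = {
    "cawiki": {"ticks": 85415, "search_box": 901, "disabled": 465,
               "enabled": 436, "rec_clicks": 19},
    "euwiki": {"ticks": 3329, "search_box": 112, "disabled": 56,
               "enabled": 56, "rec_clicks": 3},
}


def ctr_percent(rec_click_sessions, enabled_sessions):
    """CTR in percent, assuming the treatment group is the denominator."""
    return round(100 * rec_click_sessions / enabled_sessions, 2)


def summarize(rows):
    """Recompute the CTR and the tick-to-click session ratio for each wiki."""
    summary = {}
    for wiki, r in rows.items():
        summary[wiki] = {
            "ctr_percent": ctr_percent(r["rec_clicks"], r["enabled"]),
            # How many tick sessions exist per session that clicked the search box.
            "tick_sessions_per_click_session": r["ticks"] / r["search_box"],
            # The two experiment groups should add up to all search box sessions.
            "groups_add_up": r["disabled"] + r["enabled"] == r["search_box"],
        }
    return summary


summary = summarize(ROWS)
for wiki, stats in summary.items():
    print(wiki, stats)
```

If tick events were limited to sessions that focused on the search box, the tick-to-click session ratio would be expected to sit near 1 rather than in the tens.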

Note:

  • We bucketed 100% of page loads by token.
  • We sampled 100% of events by session id.

Query

-- Count unique sessions that have tick events
SELECT mediawiki."database",
       COUNT(DISTINCT performer.session_id) AS sessions,
       COUNT(1) AS events
FROM event.product_metrics_web_base_search_ab_test_session_ticks
WHERE year=2025 AND month=2 AND day=23
  AND NOT performer.is_logged_in
GROUP BY mediawiki."database";

-- Count unique sessions that have clicked on the search bar
SELECT mediawiki."database",
       element_at(experiments.assigned, 'RelatedArticles test experiment') AS assigned_group,
       day,
       COUNT(DISTINCT performer.session_id) AS sessions,
       COUNT(1) AS events
FROM event.product_metrics_web_base_search_ab_test_clicks
WHERE year=2025 AND month=2 AND day=23
  AND action='click'
  AND action_source='search_box'
GROUP BY mediawiki."database", element_at(experiments.assigned, 'RelatedArticles test experiment'), day;

Code repo

jwang renamed this task from Data QA of empty search recommendation AB test to Data QA for instrumentation of empty search recommendation AB test. Feb 24 2025, 11:58 PM
Investigation on issue 3

Summary

  • We deployed on another pilot wiki, frwiki, on February 25 after some fixes. However, we still observed Issue 3: The number of tick events is way too high.
  • The ratio is about 1:95 (16k clicked unique sessions : 1504k ticked unique sessions) on frwiki
  • We have confirmed that tick events were logged for some sessions without any recorded clicks. In theory, tick events should be logged only after users click on the search box.
  • Based on the investigation by user agent, the pattern of about 100x more unique sessions from 'tick' than 'click' is consistent across os_family, device_family, browser_family, and os_major.

Number of unique sessions that have clicked on search box *

image.png (404×1 px, 50 KB)

Number of unique sessions that have tick events *

image.png (244×782 px, 24 KB)

*Note: The data above was collected for the incomplete days of Feb 25 and Feb 26.

Cross check with other data source
From the session_length_daily table, on frwiki, there are an average of 9,370,967 unique sessions daily across all platforms and skins, with an estimated 7,514,578 daily unique sessions from mobile web. On 2025-02-25, there were 8,927,310 unique sessions across all platforms and skins, with an estimated 7,158,810 unique sessions from mobile web.
From the pageview_hourly table, on frwiki, there are 13,405,061 pageviews from mobile web on 2025-02-25.

The Web team has fixed issue 2 and issue 3 and deployed the fixes on frwiki yesterday (March 3).

Here are the data QA results based on the data collected on March 4 (an incomplete day). I have confirmed that

  1. the number of unique sessions from tick events is close to the number of unique sessions from click events,
  2. tick events can be joined with click events by session_id,
  3. for one active session that I picked, funnel_entry_token was recorded correctly,
  4. around 0.5% of sessions have focused on the search box on frwiki so far on March 4,
  5. around 0.029% of sessions in the treatment group have clicked on empty recommendations on frwiki so far on March 4.
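
Checks 1 and 2 above amount to comparing the set of session ids seen in tick events with the set seen in click events. A minimal Python sketch of that comparison follows, assuming the session ids have already been pulled from the two tables into plain lists; the function name and the sample ids are made up for illustration.

```python
def compare_sessions(tick_session_ids, click_session_ids):
    """Compare unique session ids from tick events and click events."""
    ticks = set(tick_session_ids)
    clicks = set(click_session_ids)
    return {
        "tick_sessions": len(ticks),
        "click_sessions": len(clicks),
        # Sessions present in both streams, i.e. joinable by session_id.
        "joined": len(ticks & clicks),
        # Sessions that ticked without any recorded click (the Issue 3 symptom).
        "ticks_without_click": len(ticks - clicks),
        "clicks_without_tick": len(clicks - ticks),
    }


# Made-up session ids, only to show the shape of the result.
example = compare_sessions(["a", "b", "c", "c"], ["a", "b", "d"])
print(example)
```

A large ticks_without_click count relative to click_sessions would indicate that tick events are again being logged for sessions that never focused on the search box.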

Note for future analysis.
Issue 1 still exists. Engineering has decided to deprioritize fixing it. For analysis, we need to ignore "Impressions of the empty-state recommendations" events from the control group. To identify these events, filter on action=show, action_subtype=show_empty_state_recommendation.
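
A minimal Python sketch of that exclusion is shown below, assuming events have been loaded as dictionaries that mirror the table fields. The control group label "control" is a placeholder, since the actual values stored in experiments.assigned are not listed in this task, and the function names are illustrative.

```python
EXPERIMENT = "RelatedArticles test experiment"
CONTROL_GROUP = "control"  # placeholder: the real label in experiments.assigned may differ


def is_control_empty_state_impression(event):
    """True for empty-state recommendation impressions logged for the control group."""
    assigned = (event.get("experiments") or {}).get("assigned") or {}
    return (
        event.get("action") == "show"
        and event.get("action_subtype") == "show_empty_state_recommendation"
        and assigned.get(EXPERIMENT) == CONTROL_GROUP
    )


def drop_control_impressions(events):
    """Return the events with control-group empty-state impressions removed."""
    return [e for e in events if not is_control_empty_state_impression(e)]


# Made-up events, only to show which rows are dropped.
sample_events = [
    {"action": "show", "action_subtype": "show_empty_state_recommendation",
     "experiments": {"assigned": {EXPERIMENT: CONTROL_GROUP}}},
    {"action": "show", "action_subtype": "show_empty_state_recommendation",
     "experiments": {"assigned": {EXPERIMENT: "treatment"}}},
    {"action": "click", "action_source": "search_box",
     "experiments": {"assigned": {EXPERIMENT: CONTROL_GROUP}}},
]
kept = drop_control_impressions(sample_events)
print(len(kept))
```

Only the first sample event is removed; control-group clicks on the search box and treatment-group impressions are kept.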

Three Instrumentation Documents have been updated to reflect the deployed version of the instrumentation. 1) Measurement plan, 2) Instrumentation spec, 3) Instrumentation with a visualized workflow.

The team deployed on all wikis except enwiki on Mar 4.

I verified the volume of events from the 'RelatedArticles test experiment default' version by March 5. It looks normal, firing around 120 events per second from tick stream and click stream.

Resolving this, thanks @jwang!