
[SPIKE] Identify cause for duplicate A/B test bucketing
Closed, Resolved · Public



During the analysis of the sticky header A/B test (T298873: Analyze results of sticky header A/B test), we identified that 809 sessions were assigned to both the control and test groups. We would like to look into why this happened, in order to avoid it in future tests.

Acceptance criteria

  • Determine reasons for duplicate A/B test bucketing

Event Timeline

ovasileva triaged this task as High priority.
ovasileva created this task.
ovasileva added a subscriber: jwang.

809 sessions were assigned to both the control and test groups. We excluded them from the analysis.

For the sticky header experiment we bucketed users by their user ID: mw.user.getId().toString().
The web_session_id relates to their session and is randomly generated at the start of a browsing session.

So if we're seeing the same web_session_id in both groups, it suggests that the user ID was different for the same browsing session, resulting in two different bucket assignments.

This situation can happen when a user has two accounts, for example a sock puppet, or a staff member switching to their volunteer account. Both accounts are used in the same browser, so they share a session ID; however, since we're bucketing by user ID, the two accounts can be shown different treatments.
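A minimal sketch of how this plays out, assuming buckets are assigned deterministically from the user ID (`bucketFor` is a hypothetical stand-in, using numeric parity in place of whatever hashing the real test did):

```javascript
// Hypothetical stand-in for the real bucketing: the sticky header test
// bucketed on mw.user.getId().toString(); here numeric parity plays the
// role of the deterministic hash.
function bucketFor(userId) {
  return userId % 2 === 0 ? 'stickyHeaderEnabled' : 'stickyHeaderDisabled';
}

// One browsing session, two accounts: the session ID never changes,
// but each account's user ID can map to a different bucket.
const sessionId = 'A';
const enrollments = [7, 8].map((userId) => ({
  web_session_id: sessionId,
  user_id: userId,
  group: bucketFor(userId),
}));

console.log(enrollments);
// Both rows share web_session_id 'A' but land in different groups.
```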

To support this hypothesis, I ran the query from the analysis:

SELECT web_session_id, wiki, COUNT(DISTINCT `group`) AS groups, MIN(meta.dt) AS session_dt
FROM event.mediawiki_web_ab_test_enrollment
WHERE wiki NOT IN ('testwiki', 'test2wiki')
  AND year = 2022 AND month = 1 AND day BETWEEN 6 AND 30
GROUP BY web_session_id, wiki
HAVING groups > 1
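The detection logic in that query (count distinct groups per session, keep sessions with more than one) can be mirrored in a few lines of JavaScript; the sample rows below are made up for illustration:

```javascript
// Find session IDs that were enrolled in more than one group,
// mirroring the GROUP BY ... HAVING groups > 1 query above.
function sessionsWithMultipleGroups(events) {
  const groupsBySession = new Map();
  for (const { web_session_id, group } of events) {
    if (!groupsBySession.has(web_session_id)) {
      groupsBySession.set(web_session_id, new Set());
    }
    groupsBySession.get(web_session_id).add(group);
  }
  return [...groupsBySession]
    .filter(([, groups]) => groups.size > 1)
    .map(([sessionId]) => sessionId);
}

// Made-up sample data: session A was bucketed twice, session B once.
const sample = [
  { web_session_id: 'A', group: 'stickyHeaderDisabled' },
  { web_session_id: 'A', group: 'stickyHeaderEnabled' },
  { web_session_id: 'B', group: 'stickyHeaderEnabled' },
];
console.log(sessionsWithMultipleGroups(sample)); // [ 'A' ]
```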

I took some of those session IDs and ran this query:

SELECT *
FROM event.mediawiki_web_ab_test_enrollment
WHERE web_session_id = (redacted)
  AND year = 2022 AND month = 1 AND day = 16
LIMIT 16;

I looked at those duplicate rows and verified that they have similar timestamps. There was a pattern, though: viewed in chronological order, the group changes are not random, which suggests someone switching accounts in the same browser.

This hypothesis is strengthened by looking at the editCountBucket field for events that occurred around the same timestamps and seeing that it is different:

SELECT *
FROM event.desktopwebuiactionstracking
WHERE event.token = <redacted>
  AND year = 2022 AND month = 1 AND day = 16
LIMIT 8;

It works as follows:

  • Jon opens English Wikipedia in his browser. He's assigned session ID A, and he has user ID Z. User ID Z is used to bucket Jon, and he is given the bucket "stickyHeaderDisabled".
  • Jon logs out. As an anonymous user he is no longer bucketed. His session ID remains A, as he hasn't closed the browser.
  • Jon logs in as Jon (WMF) within the same browser window. He's assigned user ID Y. His session ID still remains A, as he hasn't closed the browser. User ID Y, however, results in a different bucket: "stickyHeaderEnabled".

So what this data is telling us is that the 809 sessions came from users with multiple accounts.

cc @ovasileva @jwang

That's _very_ interesting. Thanks for doing this! Would it make sense to just test this theory a bit more?

  1. If I understand correctly, 809 is the number of sessions we examined in the context of the sticky header experiment that were used by users with multiple accounts. What percent of the total sessions in the experiment is that?
  2. Could we verify that this ratio is roughly the same for users outside of the experiment, as a kind of sanity check?
  3. Assuming points 1 and 2 line up, should we bucket based on session ID + user ID, so we have a consistent experience per user?
  1. From the analysis, my understanding is that it's 809 out of 435756 user sessions, so roughly 0.19%.

All sessions would be the sum of these fields:

Screen Shot 2022-03-28 at 8.56.01 AM.png (288×756 px, 24 KB)

  2. I don't think so. We usually avoid logging user IDs in events because they identify users. The only reason we know here that the user ID is different is that the A/B test leaked that information by using the user ID for bucketing. To verify this we'd likely need to run another A/B test using the user ID.
  3. For logged-in users, yes, as long as we do it consistently. However, there may be a privacy cost to doing this if the session ID is derived from the user ID. One thing we could do that may be far simpler is invalidating the session ID on logout. What do you think?
Jdlrobson assigned this task to ovasileva.
Jdlrobson added a subscriber: Jdlrobson.

Seems like we've identified the cause, and the ratio is small enough that we can exclude these sessions from the analysis without larger concerns. Resolving.