
Add field in mediawiki_web_ab_test_enrollment schema to track whether user is logged in
Closed, Resolved | Public | 3 Estimated Story Points

Description

Background

Currently, when looking at mediawiki_web_ab_test_enrollment in isolation, it's not possible to tell whether an event is for a logged-in user or an anonymous user. The schema must be joined with another schema to work that out. Since some schemas have different sampling rates for anonymous and logged-in users, this information might also help explain why events are missing.

Having this field in the schema would make the analysis of the A/B test more straightforward.

Acceptance criteria

  • Add field in mediawiki_web_ab_test_enrollment that tracks whether user is anon
  • Add field in mediawiki_web_ab_test_enrollment that tracks whether user is bot
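
As a rough illustration of the two criteria above, the enrollment event could carry two booleans derived from the user's state. This is only a sketch: `describeUser` and the shape of its `user` argument are hypothetical stand-ins rather than anything from the patches, and only the `is_anon` and `is_bot` field names come from this task.

```javascript
/**
 * Hypothetical helper: derive the two new fields from a plain description
 * of the current user. In a real instrument these values would come from
 * MediaWiki's client-side user state instead of a hand-built object.
 *
 * @param {{ isAnon: boolean, groups: string[] }} user Stand-in for the user state.
 * @return {{ is_anon: boolean, is_bot: boolean }}
 */
function describeUser( user ) {
	return {
		is_anon: user.isAnon,
		is_bot: user.groups.indexOf( 'bot' ) !== -1
	};
}

// Example inputs (made up): an anonymous visitor and a logged-in bot account.
const anonFields = describeUser( { isAnon: true, groups: [ '*' ] } );
const botFields = describeUser( { isAnon: false, groups: [ '*', 'user', 'bot' ] } );
```

Keeping the derivation in one small function would make it easy to check both fields against a known user state during QA.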

QA steps

  • To test with local event logging, make sure an A/B test is enabled, i.e. the TOC experiment config is set in local settings:
$wgVectorWebABTestEnrollment = [
	'name' => 'skin-vector-toc-experiment',
	'enabled' => true,
	'buckets' => [
		'unsampled' => [
			'samplingRate' => 0,
		],
		'control' => [
			'samplingRate' => 0.5,
		],
		'treatment' => [
			'samplingRate' => 0.5,
		]
	]
];
  • Navigate to an article page as an anonymous user, then as a logged-in user
  • Note in the local event stream that the new properties is_anon and is_bot are populated and correspond to your logged-in state

Screen Shot 2022-05-18 at 3.44.57 PM.png (814×1 px, 157 KB)

  • I don't have a good way of testing is_bot other than injecting code into the following function locally and reloading the page:
function isUserBot() {
	// Local testing only: force the current user into the 'bot' group.
	mw.config.set( 'wgUserGroups', [ '*', 'bot' ] );
	var userGroups = mw.config.get( 'wgUserGroups', [] );
	return userGroups.indexOf( 'bot' ) !== -1;
}

QA Results - Prod

AC | Status | Details
1 | | T307381#7981675
2 | | T307381#7981675

Event Timeline

ovasileva triaged this task as Medium priority. May 2 2022, 5:28 PM
ovasileva raised the priority of this task from Medium to High.

@jwang - do you know if is_bot is something we usually add to schemas or is that filtering that is done by the platform itself?

Change 791422 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[schemas/event/secondary@master] Add is-anon property to schema

https://gerrit.wikimedia.org/r/791422

@ovasileva @jwang is the 2nd AC still relevant? I have a patch up for adding is-anon property -- lmk if I need to follow up with 2nd is-bot property.

FWIW, I feel like we've discussed this before (can't seem to find relevant ticket) and my recollection is that this is being collected upstream.

Change 791425 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[mediawiki/extensions/WikimediaEvents@master] Populate is-anon property during web a/b test enrollment

https://gerrit.wikimedia.org/r/791425

> @jwang - do you know if is_bot is something we usually add to schemas or is that filtering that is done by the platform itself?

> @ovasileva @jwang is the 2nd AC still relevant? I have a patch up for adding is-anon property -- lmk if I need to follow up with 2nd is-bot property.

In the legacy event logging platform, is_bot is captured by default under useragent.is_bot.
In the modern event logging platform, as far as I know, is_bot is not recorded.

@ovasileva @cjming

Here is the previous discussion about is_bot: T294246#7505173.

Under the current instrumentation, user_agent_map['device_family'] == 'Spider' can identify some well-known, self-identified bots. For example, across all of April we identified 31 sessions from Spider on hewiki and 0 on euwiki.

domain | group | sessions | events
he.wikipedia.org | control | 15 | 28
he.wikipedia.org | treatment | 16 | 35

Meanwhile, FYI: bot info is stored in the user_groups table in MariaDB. A query to get it:
select * from user_groups where ug_group ='bot' limit 5;
I don't know whether it's possible or worthwhile to retrieve user_groups info from there. We don't record user_id in the ab_test schema, so we cannot get the info by querying.

If we cannot add is_bot, we can at least exclude those well-known Spiders with the current instrumentation.
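
The Spider exclusion described above could look roughly like the following. This is a sketch under assumptions: it supposes the events have been exported as plain objects with an optional `user_agent_map` object, `excludeSpiders` is a hypothetical name, and the sample rows are made up for illustration (they are not data from this test).

```javascript
/**
 * Hypothetical filter: drop events whose user agent self-identifies as a
 * spider. Events without a user_agent_map are kept, since nothing marks
 * them as bots.
 *
 * @param {Object[]} events Exported enrollment events (assumed shape).
 * @return {Object[]} Events that are not flagged as 'Spider'.
 */
function excludeSpiders( events ) {
	return events.filter( function ( event ) {
		const uaMap = event.user_agent_map || {};
		return uaMap.device_family !== 'Spider';
	} );
}

// Made-up sample rows, for illustration only.
const sampleEvents = [
	{ group: 'control', user_agent_map: { device_family: 'Spider' } },
	{ group: 'treatment', user_agent_map: { device_family: 'iPhone' } },
	{ group: 'control' }
];
const keptEvents = excludeSpiders( sampleEvents );
```

This only removes bots that announce themselves in the user agent; bot accounts that do not would still need something like the is_bot field.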

SCherukuwada subscribed.
SCherukuwada unsubscribed.

@phuedx Was the exclusion of is_bot intentional?

> @phuedx Was the exclusion of is_bot intentional?

No. I was transitioning to Anti-Harassment Tools at the time though so take that with a healthy pinch of salt.


Tangentially related: Determining whether the user is a bot should be done by checking whether they have the bot right, i.e.

async function isUserBot() {
    const rights = await mw.user.getRights();

    return rights.indexOf( 'bot' ) !== -1;
}

IIRC the majority of instruments are synchronous. If you'd prefer not to rewrite your synchronous instrument as an asynchronous one, then you can leverage the fact that checking group membership is a synchronous operation and that the bot group is a built-in group that grants the bot right to its users:

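function isUserBot() {
    // Read the groups from config, which is synchronous;
    // mw.user.getGroups() returns a promise rather than an array.
    const groups = mw.config.get( 'wgUserGroups', [] );

    return groups.indexOf( 'bot' ) !== -1;
}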
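
For local checks that avoid overwriting `wgUserGroups`, the same membership test can be written as a pure function that takes the group list as an argument. This is a sketch only: `isBotGroupMember` is a hypothetical name, not part of MediaWiki, and in an instrument the argument would come from wherever the user's groups are read.

```javascript
/**
 * Hypothetical pure variant of the group check: no dependency on the mw
 * global, so it can be exercised with any list of group names.
 *
 * @param {string[]} groups Group names for the current user.
 * @return {boolean} True if the list contains the 'bot' group.
 */
function isBotGroupMember( groups ) {
	return Array.isArray( groups ) && groups.indexOf( 'bot' ) !== -1;
}

// Example calls with made-up group lists.
const botResult = isBotGroupMember( [ '*', 'bot' ] );
const userResult = isBotGroupMember( [ '*', 'user' ] );
```

Passing the groups in explicitly keeps the test hook out of the production code path.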
cjming moved this task from Doing to Code Review on the Web-Team-Backlog (Kanbanana-FY-2021-22) board.
cjming subscribed.

Change 791422 merged by jenkins-bot:

[schemas/event/secondary@master] Add is-anon, is-bot properties to schema

https://gerrit.wikimedia.org/r/791422

Change 791425 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Populate is-anon, is-bot properties during web a/b test enrollment

https://gerrit.wikimedia.org/r/791425

cjming updated the task description.
cjming updated the task description.
Jdlrobson subscribed.

This can be QAed directly in production.

@Jdlrobson, I'm unable to get this event. Can this be done in the network tab?

hi @Edtadros - not sure how to test this thoroughly in production. I had a workaround for how to test locally with

mw.config.set( 'wgUserGroups', [ '*', 'bot' ] );

injected into the isUserBot() function.

Edtadros subscribed.

Test Result - Prod

Status: ✅ PASS
Environment: frwiki
OS: macOS Monterey
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

✅ AC1: Add field in mediawiki_web_ab_test_enrollment that tracks whether user is anon
✅ AC2: Add field in mediawiki_web_ab_test_enrollment that tracks whether user is bot

Screen Shot 2022-06-02 at 3.13.58 PM.png (271×789 px, 50 KB)

Screen Shot 2022-06-02 at 3.22.25 PM.png (229×769 px, 44 KB)

Screen Shot 2022-06-02 at 3.15.54 PM.png (274×830 px, 51 KB)

Thanks @cjming for your help testing this!