Analyze table of contents A/B test
Closed, Resolved (Public)

Assigned To
Authored By
ovasileva
Jun 1 2022, 7:36 AM
Referenced Files
F35309918: image.png
Jul 6 2022, 6:17 PM
F35309916: image.png
Jul 6 2022, 6:17 PM
F35295030: image.png
Jul 1 2022, 11:41 PM
F35295025: image.png
Jul 1 2022, 11:41 PM
F35294971: image.png
Jul 1 2022, 11:41 PM
F35294954: image.png
Jul 1 2022, 11:41 PM

Description

Background

The goal of the new table of contents is to make ToC navigation, usually found only at the top of the page, easily accessible from anywhere within the page. It should also provide context on the article throughout the reading experience, from the reader's current location to the larger context of the topic and its individual sections.

We would like to measure the effects of introducing the change and the success of the goals stated above quantitatively.

Questions

Is the new table of contents used more frequently than the previous table of contents?
Does the new table of contents reduce the need to scroll back to the top of the page?
Does the new table of contents decrease the time people spend scrolling/scrolling quickly (if possible)?
How does the new table of contents affect the time spent on a page?

Event Timeline

ovasileva renamed this task from Analyze table of contents A/B testt to Analyze table of contents A/B test. Jun 1 2022, 7:36 AM
ovasileva triaged this task as High priority.
ovasileva moved this task from Incoming to Analyst Consultation on the Web-Team-Backlog board.
Q1: Is the new table of contents used more frequently than the previous table of contents?

Summary
The hypothesis is that the new table of contents is used more frequently than the previous table of contents. We measured total clicks and click rate. The data does NOT support the hypothesis. Need to discuss with PM.

Clicks on the new ToC are far fewer than on the old ToC for both logged-in and anonymous users.

  1. Total number of clicks on ToC from logged-in users

image.png (986×1 px, 117 KB)

  2. Total number of clicks on ToC from anonymous users

image.png (972×1 px, 121 KB)

The click rate on the new ToC is much lower than on the old ToC for both logged-in and anonymous users. The click rate is defined as the total number of clicks divided by the total number of pageviews.
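As a minimal sketch of the click rate definition above (clicks divided by pageviews), the calculation per test group could look like the following. The function name and the counts are hypothetical and purely illustrative; they are not results from this test.

```python
def click_rate(total_clicks: int, total_pageviews: int) -> float:
    """Return total clicks divided by total pageviews (0.0 if there are no pageviews)."""
    if total_pageviews == 0:
        return 0.0
    return total_clicks / total_pageviews


# Hypothetical counts for illustration only (not data from this A/B test).
example_counts = {
    "control": {"clicks": 120, "pageviews": 4000},
    "treatment": {"clicks": 90, "pageviews": 4000},
}

# One click rate per test group.
rates = {
    group: click_rate(counts["clicks"], counts["pageviews"])
    for group, counts in example_counts.items()
}
```

Because the rate is normalized by pageviews, it is only comparable across groups if clicks and pageviews are counted over the same sessions and the same date range.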

  1. The click rate on ToC from logged-in users

image.png (962×1 px, 134 KB)

  2. The click rate on ToC from anonymous users

image.png (970×1 px, 153 KB)

@jwang could you please share the queries you are using to generate these graphs? Thanks in advance!

@ovasileva @Jdlrobson

Here is the draft of the analysis report: http://nbviewer.org/github/jenniferwang-wmf/WEB_table_of_contents/blob/master/analysis_1_AB_test.ipynb

If you click the toggle button on the top, you will see the raw code. Please review and let me know if you have any questions.

There is an issue with the query being used here for measuring clicks.

  • Clicks to old table of contents are tracked by the ui.toc event.
  • Clicks to new table of contents are tracked by the ui.sidebar-toc event.

So anywhere you have AND event.name = 'ui.toc' that should be AND (event.name = 'ui.toc' OR event.name = 'ui.sidebar-toc'). Hope that explains the click discrepancy.

It's great that we found a potential reason. I will rerun the analysis for clicks on ToC. I did not see event.name = 'ui.sidebar-toc' mentioned in the functional spec. Can you point me to any document or communication about it?

We had excluded the sessions which were assigned to both the control group and the treatment group from the analysis. Why do we still see events with event.name = 'ui.toc' in the treatment group? For example, there are 50k events with event.name = 'ui.toc' in the treatment group on frwiki, and 9k on ptwiki. Does it mean these sessions, which are assigned only to the treatment group in the mediawiki_web_ab_test_enrollment schema, still see the old ToC?

Why do we still see events with event.name = 'ui.toc' in the treatment group?

Sorry I forgot to mention one caveat... we show the old table of contents in the treatment bucket if the window is small (<1000px). Usage at smaller windows should be low, as it requires resizing the window or loading the site on a mobile device.

Because of this, we should be considering the different viewportSizeBucket buckets when modeling behavior for each bucket. I would expect to see no difference for example in the below 320px bucket in the treatment and control buckets as these users are getting identical experiences.

Thank you for the info. In that case, windows smaller than 1000px should be excluded from both the control and treatment groups in the A/B test analysis. I will add viewportSize as a dimension in the analysis for observation.
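A rough sketch of the exclusion described above: keep only events whose viewportSizeBucket starts at 1000px or wider. Only the label '1200px-2000px' appears in this thread; the other label formats handled here ('<320px', '720px-999px', '>2000px') and the helper names are assumptions for illustration.

```python
import re

MIN_WIDTH_PX = 1000  # below this width the treatment bucket still shows the old ToC


def bucket_lower_bound(bucket: str) -> int:
    """Return the lower bound in px of a bucket label (label formats are assumed)."""
    numbers = [int(n) for n in re.findall(r"\d+", bucket)]
    if not numbers:
        raise ValueError(f"unrecognized viewport bucket: {bucket!r}")
    # A label such as '<320px' has no lower bound, so treat it as 0.
    if bucket.strip().startswith("<"):
        return 0
    return numbers[0]


def is_wide_enough(bucket: str) -> bool:
    """True if every width in the bucket is at least MIN_WIDTH_PX."""
    return bucket_lower_bound(bucket) >= MIN_WIDTH_PX


def filter_wide_events(events):
    """Keep events (dicts with a 'viewportSizeBucket' key) from wide viewports only."""
    return [e for e in events if is_wide_enough(e["viewportSizeBucket"])]
```

The same filter would need to be applied to both control and treatment so that the two groups stay comparable.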

Could you also review the queries for scrolls and reading depth? @Jdlrobson

I think this is the major reason that the treatment group has so few clicks on the ToC. The data shows that in the treatment group some sessions saw the old ToC and some saw the new ToC, regardless of viewport size. Here are examples from frwiki for logged-in and anonymous users. In the treatment group, more than half of the sessions with a viewport larger than 1000px are assigned to the old ToC.

  1. Number of unique sessions with clicks on ToC from logged-in users on frwiki

image.png (822×908 px, 141 KB)

  2. Number of unique sessions with clicks on ToC from anonymous users on frwiki

image.png (1×896 px, 176 KB)

Query to get the data:

-- Sessions enrolled in exactly one test group.
WITH t_ab_no_dupli AS (
  SELECT web_session_id, wiki, meta.domain AS domain,
         COUNT(DISTINCT `group`) AS groups, MIN(meta.dt) AS session_dt
  FROM event.mediawiki_web_ab_test_enrollment
  WHERE wiki NOT IN ('testwiki', 'test2wiki')
    AND year = 2022 AND month IN (5, 6)
    AND experiment_name = 'skin-vector-toc-experiment'
  GROUP BY web_session_id, wiki, meta.domain
  -- exclude session IDs that are in both the control and treatment groups
  HAVING groups < 2
),
-- Test group and first enrollment timestamp for each remaining session.
t_ab AS (
  SELECT t1.web_session_id, t1.wiki, t1.meta.domain AS domain,
         t1.`group` AS test_group, MIN(t1.meta.dt) AS session_dt
  FROM event.mediawiki_web_ab_test_enrollment AS t1
  INNER JOIN t_ab_no_dupli AS t2
    ON t1.wiki = t2.wiki AND t1.web_session_id = t2.web_session_id
  WHERE t1.wiki NOT IN ('testwiki', 'test2wiki')
    AND t1.year = 2022
    AND CONCAT(t1.year, '-', LPAD(t1.month, 2, '0'), '-', LPAD(t1.day, 2, '0')) BETWEEN '2022-05-26' AND '2022-06-15'
    AND t1.experiment_name = 'skin-vector-toc-experiment'
  GROUP BY t1.web_session_id, t1.wiki, t1.meta.domain, t1.`group`
)
-- Clicks from the A/B test groups, counted per session, event name and viewport bucket.
-- Columns are qualified with t3 because both joined tables expose a wiki column.
SELECT t3.event.token AS session_id,
       t3.wiki, t3.event.isanon, t4.test_group,
       t3.event.name AS event_name, t3.event.viewportSizeBucket AS view_size,
       COUNT(1) AS clicks
FROM event.DesktopWebUIActionsTracking AS t3
INNER JOIN t_ab AS t4
  ON t3.wiki = t4.wiki AND t3.event.token = t4.web_session_id
WHERE t3.wiki IN ('bnwiki', 'fawiki', 'foundationwiki',
  'hewiki', 'ptwikinews', 'ptwikiversity', 'srwiki',
  'thwiki', 'vecwiki', 'viwiki', 'viwikibooks', 'dewikivoyage',
  'euwiki', 'kowiki', 'plwikinews', 'trwiki', 'arywiki',
  'frwiki', 'frwikiquote', 'frwiktionary', 'incubatorwiki', 'ptwiki'
  )
  AND t3.year = 2022 AND t3.month IN (5, 6)
  AND CONCAT(t3.year, '-', LPAD(t3.month, 2, '0'), '-', LPAD(t3.day, 2, '0')) BETWEEN '2022-05-26' AND '2022-06-15'
  -- only count clicks that happened at or after the session's first enrollment event
  AND t4.session_dt <= t3.meta.dt
  AND t3.event.name IN ('ui.toc', 'ui.sidebar-toc')
  AND t3.event.action = 'click' AND t3.event.skinversion = 2
GROUP BY t3.event.token, t3.wiki, t3.event.isanon, t4.test_group, t3.event.name, t3.event.viewportSizeBucket
With such a test group assignment, the A/B test analysis of scrolls to the ToC and of reading time is invalid: the schemas mediawiki_web_ab_test_enrollment, mediawiki_reading_depth and mediawiki_web_ui_scroll do not record the viewport size, so sessions cannot be correctly categorized into the treatment and control groups.

Hi @jwang, I'm out of office until Thursday PM, but I think the problem here perhaps relates to how bucketing works: bucketing is per page, not per user ID. Since the number of sessions for those users is very high, the likelihood of them being bucketed into both experiences over a period of time is significantly greater. One possible way to limit the analysis is to only look at users with fewer than a certain number of sessions. I believe @bwang is taking a closer look at your analysis while I'm out.

Thanks for the reply while you are out, @Jdlrobson.

More info for @bwang and other engineers who might look into it.

In the query, sessions that appear in both the control group and the treatment group were excluded based on the mediawiki_web_ab_test_enrollment schema. Note that user_id is not recorded in the mediawiki_web_ab_test_enrollment schema.

Sorry that this A/B test analysis is proving so troublesome.

In the query, sessions that appear in both the control group and the treatment group were excluded based on the mediawiki_web_ab_test_enrollment schema.

Hm... are you checking that the timestamps of the rows in mediawiki_web_ab_test_enrollment match up with those in desktopwebuiactionstracking? It's possible that an event in desktopwebuiactionstracking has no corresponding A/B test enrollment event, given that desktopwebuiactionstracking is sampled based on user session ID while A/B test enrollment is sampled based on page ID. We may need to filter based on the hour or even the minute to get more precision here. Could you include the timestamps for when events for both ui.sidebar-toc and ui.toc appeared?

With such a test group assignment, the A/B test analysis of scrolls to the ToC and of reading time is invalid: the schemas mediawiki_web_ab_test_enrollment, mediawiki_reading_depth and mediawiki_web_ui_scroll do not record the viewport size, so sessions cannot be correctly categorized into the treatment and control groups.

It is correct that the treatment and control group for scrolls and reading time do not have the viewport size. Two possible options here:

  1. you could use the user session IDs in the click tracking schema init event to limit analysis. e.g. something like {A}
  2. Note the analysis as flawed. We can use event.desktopwebuiactionstracking to get a sense of impact e.g. what % of users were viewing on narrow screens. When I query a single day {B} it looks like a small fraction of events fall into this category.

{A}

SELECT event.token
FROM event.desktopwebuiactionstracking
WHERE year = 2022 AND month = 6 AND day = 30
AND event.action = 'init' AND event.viewportSizeBucket = '1200px-2000px'
LIMIT 1;

{B}

SELECT event.viewportSizeBucket, count(*)
FROM event.desktopwebuiactionstracking
WHERE  year = 2022 and month = 6 and day = 5
and event.action = 'init' group by event.viewportSizeBucket;

Because sessions in every viewportSizeBucket are partially distributed to the old ToC, viewportSizeBucket does not help identify which version of the ToC a session was shown.

I explored the timestamps to see whether we can filter out sessions that saw the old ToC in the treatment group, so that we can still analyze scrolls and reading time. The following example shows a session that switched between the new ToC and the old ToC within a very short window (< 1 min). It seems we cannot identify the version of the ToC a session was exposed to based on the timestamp.

All events of session '002708c8787bcbcce17c' from mediawiki_web_ab_test_enrollment

dt                        group
2022-05-26T11:36:21.543Z  treatment
2022-05-26T11:36:53.112Z  treatment
2022-05-26T11:46:15.355Z  treatment
2022-05-26T11:47:15.994Z  treatment
2022-05-26T11:47:42.007Z  treatment

All events of session '002708c8787bcbcce17c' from desktopwebuiactionstracking

dt                        name            action  viewportsizebucket
2022-05-26T11:34:31.304Z  ui.toc          click   1200px-2000px
2022-05-26T11:34:31.304Z  NULL            init    1200px-2000px
2022-05-26T11:36:53.112Z  NULL            init    1200px-2000px
2022-05-26T11:37:52.209Z  ui.sidebar-toc  click   1200px-2000px
2022-05-26T11:38:39.408Z  NULL            init    1200px-2000px
2022-05-26T11:38:39.411Z  ui.toc          click   1200px-2000px
2022-05-26T11:46:15.346Z  NULL            init    1200px-2000px
2022-05-26T11:47:15.994Z  NULL            init    1200px-2000px

Queries to get the data:

SELECT meta.dt, "group"
FROM event.mediawiki_web_ab_test_enrollment
WHERE year = 2022 AND month IN (5, 6)
AND web_session_id = '002708c8787bcbcce17c'
ORDER BY meta.dt
LIMIT 100000;

SELECT dt, event.name, event.action, event.viewportSizeBucket
FROM event.desktopwebuiactionstracking
WHERE year = 2022 AND month IN (5, 6) AND event.action IN ('init', 'click')
AND event.token = '002708c8787bcbcce17c'
ORDER BY dt
LIMIT 100000;
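One way to flag sessions like the one above is to check which ToC click events a session produced: 'ui.toc' indicates the old ToC and 'ui.sidebar-toc' the new one. The sketch below applies that check to the click and init rows listed for session '002708c8787bcbcce17c'; the function and variable names are illustrative, not part of the analysis codebase.

```python
OLD_TOC_EVENT = "ui.toc"          # click on the old table of contents
NEW_TOC_EVENT = "ui.sidebar-toc"  # click on the new table of contents


def toc_versions_clicked(events):
    """Return the set of ToC versions ('old', 'new') a session clicked on.

    `events` is an iterable of (dt, name, action) tuples for one session.
    """
    versions = set()
    for _dt, name, action in events:
        if action != "click":
            continue
        if name == OLD_TOC_EVENT:
            versions.add("old")
        elif name == NEW_TOC_EVENT:
            versions.add("new")
    return versions


# Rows for session '002708c8787bcbcce17c' as listed in this thread.
session_events = [
    ("2022-05-26T11:34:31.304Z", "ui.toc", "click"),
    ("2022-05-26T11:34:31.304Z", None, "init"),
    ("2022-05-26T11:36:53.112Z", None, "init"),
    ("2022-05-26T11:37:52.209Z", "ui.sidebar-toc", "click"),
    ("2022-05-26T11:38:39.408Z", None, "init"),
    ("2022-05-26T11:38:39.411Z", "ui.toc", "click"),
    ("2022-05-26T11:46:15.346Z", None, "init"),
    ("2022-05-26T11:47:15.994Z", None, "init"),
]

saw_both_versions = toc_versions_clicked(session_events) == {"old", "new"}
```

This only works for sessions that clicked on a ToC at least once; sessions with no ToC clicks leave no trace of which version they were shown.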

I have revised the draft based on the discussion and further looked into clicks on ToC after cleaning up data. (analysis codebase)

Summary
Is the new table of contents used more frequently than the previous table of contents?

  • Data: The number of clicks on ToC per session page view, from sessions which have at least 1 click on ToC.
  • Method: We fit a generalized linear mixed model (GLMM) to infer the impact of the ToC version on the number of clicks on ToC and to confirm any statistical difference between the control group and the treatment group.
  • Results:
    • Among the sessions with at least 1 click on ToC, the treatment group has more clicks on ToC than the control group. The model predicts 53% more clicks on the new ToC for logged-in users and 45.5% more clicks on the new ToC for anonymous users. The trend is consistent across all edit count buckets for logged-in users.
    • The estimated growth above applies only to users who have clicked. We cannot confirm whether the new ToC has a higher adoption rate because we don't have correct data for users who saw the new ToC and made 0 clicks. The prediction above assumes that users' click behavior is not affected by the ToC version they saw previously.
    • The low R squared value indicates that the model does not include all factors which influenced the number of clicks. One possible factor is page length: longer pages possibly result in more clicks on ToC. However, we don't have page-related info in the data to confirm this.

      Analysis report of clicks on ToC

Does the new table of contents reduce the need to scroll back to the top of the page?
Due to the bucketing issue (T309682#8057200), we do not have clean data to answer this question.
Report

How does the new table of contents affect the time spent on a page?
Due to the bucketing issue (T309682#8057200), we do not have clean data to answer this question.
Report

Does the new table of contents decrease the time people spend scrolling/scrolling quickly (if possible)?
We do not have data to answer this question as we do not have instrumentation for scrolling speed.

@ovasileva, let me know if you have any questions.

Marked it as resolved. Feel free to reopen if you have any followup questions.