userSessionToken in RelatedArticles schema does not seem to survive beyond one pageview
Closed, Resolved (Public)

Description

99.99% of sessions consist of only one pageview, see below. But per https://meta.wikimedia.org/wiki/Schema_talk:RelatedArticles , the session token is meant to persist for the entire browser session, i.e. beyond one pageview. (As also noted there and at T157307#3282646, the token can get lost if local storage is full, but that's unlikely to be the case in 99.99% of views.)

SELECT pages, COUNT(*) AS sessions FROM (
SELECT COUNT(DISTINCT event_pageId) AS pages
FROM log.RelatedArticles_16352530
WHERE wiki = 'enwiki' AND event_skin = 'minerva-stable'
AND LEFT(timestamp, 6) = '201704'
GROUP BY event_userSessionToken) AS pages_per_session
GROUP BY pages
ORDER BY pages;

pages   sessions
1       5269122
2       93
3       54
4       28
5       23
6       14
7       9
8       8
9       12
10      7
11      2
12      3
13      5
14      1
15      3
17      3
18      3
19      3
20      1
24      2
26      1
30      2
33      1
35      1
39      1
40      1
41      2
45      1
48      1
49      1
52      1
54      1
57      1
59      1
61      1
62      1
63      1
64      2
67      1
68      2
69      1
75      1
80      1
81      1
84      1
152     1
158     1

Sign off checklist

  • Check the problem has been addressed
  • Review existing extensions to see if this problem might be impacting other schemas
  • Update RelatedArticles documentation to reflect $wgRelatedArticlesLoggingSamplingRate -> $wgRelatedArticlesLoggingBucketSize and send an email to devs
  • For page previews, look at effects of this change on rates of duplication across multiple browsers and document in T167391: [Spike] Isolate the source of Popups event duplication

Event Timeline

...

<jdlrobson> ^ HaeB the problem is the sampling of events we record is based on page view token and the sample for the A/B test is based on user token
<HaeB> you'll see these for all existing and active EL schemas

FWIW, in the case of page previews (Schema:Popups), the problem does not seem to have occurred in the same form - we did see session lengths > 1, even at small sampling rates (e.g. an average of 3 daily pageviews per session on ruwiki, which had a sampling rate of 1% at the time - a result that is not possible without sessions persisting beyond one pageview). Obviously, though, this bug still looks very concerning, and we will want to check what kind of effect it has on Popups and other schemas, too.

@phuedx the issue is that the Schema created in RelatedArticles is configured with a sampling rate. Whether the user is in the sample is determined by a page view token in EventLogging. This means a user may log an event on the first page view but not the second.

In RelatedArticles, whether the feature is enabled or disabled is based on the user token, as we want to ensure that once bucketed, the user continues to see related pages.

Thus there is a mismatch here.
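
To make the mismatch concrete, here is a minimal sketch (my illustration, not the extension's actual code) of how the two decisions end up keyed to different tokens; the experiment names and weights are the ones discussed later in this task:

// Illustration only, not the extension's actual code.
// The A/B bucket is derived from the per-session user token, so it is
// stable for the whole browser session:
var sessionToken = mw.user.sessionId();
var seesFeature = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.visibility',
    enabled: true,
    buckets: { control: 0.02, A: 0.98 }
}, sessionToken ) === 'A';

// The event sampling, however, was keyed to a token that is regenerated
// on every page load, so the sampling coin is re-flipped per pageview:
var pageViewToken = mw.user.generateRandomSessionId();
var inSample = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.instrumentation',
    enabled: true,
    buckets: { control: 0.99, A: 0.01 }
}, pageViewToken ) === 'A';

// At a 1% sampling rate, a session sampled on its first pageview has only
// a ~1% chance of also being sampled on its second, which is why almost
// all recorded "sessions" contain exactly one page.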

Change 358046 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/EventLogging@master] Allow sampling by user session token

https://gerrit.wikimedia.org/r/358046

@phuedx the issue is that the Schema created in RelatedArticles is configured with a sampling rate. Whether the user is in the sample is determined by a page view token in EventLogging.

I missed that we use mw.eventLog.Schema in RA. I saw mw.track and assumed we were relying on the subscriber protocol, not the mw.eventLog.inSample method provided by it.

However, I still see the behavior described in T167236#3334989 with $wgRelatedArticlesLoggingSamplingRate = 1 in my local environment. I'll file it as a separate bug.

Change 358069 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/RelatedArticles@master] Event sampling is based on user bucket

https://gerrit.wikimedia.org/r/358069

My suggested solution is to allow EventLogging to do its sampling by user or session token (https://gerrit.wikimedia.org/r/358046).
RelatedArticles will then request that sampling is done by user token (https://gerrit.wikimedia.org/r/358069).

Sound good..?

Sound good..?

IIRC the mw.eventLog.Schema class was introduced by us (the Reading Web team) when we had schemas which were relatively simple and could be sampled per call to #log. This is no longer the case. Moreover, our simplest instrumentation is ReadingDepth, which we've implemented using mw.track/the EL subscriber protocol.
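
For reference, the two styles look roughly like this (a from-memory sketch; the exact Schema constructor arguments are my assumption, not verbatim API):

// Style 1: the mw.eventLog.Schema class, which decides sampling per call
// to #log (constructor arguments are approximate):
var schema = new mw.eventLog.Schema( 'RelatedArticles', 0.01 );
schema.log( { eventName: 'seen' } );

// Style 2: the EL subscriber protocol, where instrumentation fires
// mw.track events and a subscriber decides whether and how to log them:
mw.track( 'event.ReadingDepth', { action: 'pageLoaded' } );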

Should we revisit whether the class is necessary?

@phuedx I agree, we should revisit the decision, especially now that the MobileFrontend EventLogging code has largely been removed (that's where we needed the mw.eventLog.Schema class).

FYI I'm working on a change to stop using the mw.eventLog.Schema in Page Previews.

Change 358957 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@master] i13n: Log EL events with mw.track

https://gerrit.wikimedia.org/r/358957

IIRC the mw.eventLog.Schema class was introduced by us (the Reading Web team) when we had schemas which were relatively simple and could be sampled per call to #log

T117140 is where we added this. It's nice in that it abstracts away the thinking for sampling.

FYI I'm working on a change to stop using the mw.eventLog.Schema in Page Previews.

The approach to remove it is fine, although I am growing a little nervous about the amount of duplication I am now seeing across our extensions with regards to identifying buckets. We should also take ownership of deciding the fate of mw.eventLog.Schema as part of this task, since we introduced it.

We can take a look at RelatedArticles together…

The changes to the PP codebase are available for review again. RelatedArticles still needs to be tackled.

Let's sync up in standup today, as I'm a little confused about who's doing what now.

After syncing up, @Jdlrobson and I agreed that I'd submit the change for RelatedArticles and @Jdlrobson will lovingly review it.

Change 359921 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@master] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/359921

Change 360165 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@master] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360165

Change 360166 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[operations/mediawiki-config@master] relatedArticles: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360166

Change 358957 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] i13n: Log EL events with mw.track

https://gerrit.wikimedia.org/r/358957

I've reviewed but not merged to get @Jdlrobson to have a look at it. LGTM.

[...]
The approach to remove it is fine, although I am growing a little nervous about the amount of duplication I am now seeing across our extensions with regards to identifying buckets.

We are using mw.experiments and a navigator.sendBeacon check (core facilities and a couple of function calls), so there isn't a lot of code duplication.

There's definitely room to think about a unified method to bucket and log by session, a pattern which seems to be spreading. See below.

We should also take ownership of deciding the fate of mw.eventLog.Schema as part of this task, since we introduced it.

mw.eventLog.Schema serves a purpose and is in use for now, so deciding its fate seems unrelated to this task. Neither that nor finding a common API to log by session should be part of this task. I've created T168380: Explore an API for logging events sampled by session to follow up, and once we have something we can follow up on migrating/upgrading/removing Schema and its uses.

So I think that by now the situation is understood and mostly resolved, but I still ran an analogous query for the ReadingDepth schema as we had planned earlier (also to record the expected distribution on mobile web in general). No surprises there; that schema looks OK for now.

SELECT pages, COUNT(*) AS sessions FROM (
SELECT COUNT(DISTINCT event_pageToken) AS pages
FROM log.ReadingDepth_16325045
WHERE wiki = 'enwiki' AND event_skin = 'minerva' 
AND LEFT(timestamp, 6) = '201705'
GROUP BY event_SessionToken) AS pages_per_session
GROUP BY pages
ORDER BY pages;

+-------+----------+
| pages | sessions |
+-------+----------+
|     1 |   353952 |
|     2 |    32339 |
|     3 |    10509 |
|     4 |     4180 |
|     5 |     2333 |
|     6 |     1308 |
|     7 |      947 |
|     8 |      651 |
|     9 |      470 |
|    10 |      333 |
|    11 |      265 |
|    12 |      218 |
|    13 |      178 |
|    14 |      126 |
|    15 |      120 |
|    16 |       85 |
|    17 |       69 |
|    18 |       69 |
|    19 |       53 |
|    20 |       56 |
|    21 |       47 |
|    22 |       37 |
|    23 |       41 |
|    24 |       40 |
|    25 |       21 |
|    26 |       33 |
|    27 |       25 |
|    28 |       28 |
|    29 |       21 |
|    30 |       12 |
|    31 |       14 |
|    32 |       19 |
|    33 |       14 |
|    34 |       13 |
|    35 |        9 |
|    36 |        5 |
|    37 |        9 |
|    38 |       13 |
|    39 |       12 |
|    40 |        8 |
|    41 |        4 |
|    42 |        7 |
|    43 |        6 |
|    44 |        3 |
|    45 |        5 |
|    46 |        6 |
|    47 |        7 |
|    48 |        8 |
|    49 |        2 |
|    50 |        4 |
|    51 |        4 |
|    52 |        4 |
|    53 |        5 |
|    54 |        4 |
|    55 |        2 |
|    56 |        2 |
|    57 |        2 |
|    58 |        4 |
|    59 |        2 |
|    60 |        2 |
|    61 |        1 |
|    62 |        2 |
|    63 |        3 |
|    64 |        3 |
|    65 |        1 |
|    66 |        2 |
|    67 |        1 |
|    68 |        4 |
|    69 |        1 |
|    70 |        1 |
|    72 |        2 |
|    73 |        1 |
|    75 |        2 |
|    77 |        2 |
|    78 |        1 |
|    79 |        1 |
|    81 |        3 |
|    84 |        1 |
|    86 |        1 |
|    87 |        1 |
|    88 |        1 |
|    91 |        1 |
|    92 |        1 |
|    95 |        1 |
|    96 |        1 |
|    97 |        1 |
|    98 |        1 |
|   101 |        1 |
|   102 |        2 |
|   107 |        1 |
|   112 |        1 |
|   113 |        1 |
|   119 |        2 |
|   121 |        1 |
|   123 |        1 |
|   125 |        2 |
|   126 |        1 |
|   138 |        1 |
|   139 |        1 |
|   140 |        2 |
|   154 |        1 |
|   168 |        1 |
|   188 |        1 |
|   215 |        1 |
+-------+----------+
104 rows in set (24.43 sec)

Change 358069 abandoned by Jdlrobson:
Event sampling is based on user bucket

https://gerrit.wikimedia.org/r/358069

Change 359921 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@master] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/359921

For the record, we earlier had a very similar problem with the MobileWebSectionUsage schema (as I may already have mentioned in a recent discussion). See the May 2016 discussion at T128931#2283390 (and the following comments, by myself, @dr0ptp4kt and @Jdlrobson).

I merged i13n: Don't sample by pageview

  • Using ext.relatedArticles.instrumentation guarantees that events are logged randomly based on the user session token for this particular feature
  • Using ext.relatedArticles.visibility guarantees the visibility option sticks across a user session

Regarding the follow-up Hygiene: SamplingRate -> BucketSize: when speaking to @Nuria, she pointed out we are not doing a true A/B test. We have a control which will be enabled for 98% of users, and the other 2% will not see the feature. Of course we can normalise the 2%, but I wonder if the right thing is to end up with 3 groups, and whether now is the best opportunity to do that?

i.e.

  • We have 3 test groups - A, B and control. Control and A will get the related articles feature, B will not.
  • Only log events for A and B
  • Rename/repurpose the global config variables to account for this

Thoughts? Or do we consider this out of scope?
(If so, Joaquin, you should feel free to +2 in my absence.)

I don't have any particular opinions on the design of the tracking and the test, but just wanted to point out that Hygiene: SamplingRate -> BucketSize is just a name change for the variables, from sampling to bucketing, given that they are used for bucketing with mw.experiments.

Related articles now makes use of 2 different independent systems of bucketing:

  • One based on wgRelatedArticlesLoggingSamplingRate for bucketing a user into a log or do-not-log bucket (experiment name ext.relatedArticles.instrumentation)
    • Currently set to bucket 0.01 of user sessions into the log bucket
  • The other one based on wgRelatedArticlesEnabledSamplingRate for determining whether the user will see related articles enabled or disabled (experiment name ext.relatedArticles.visibility)
    • Currently set to 1 by default (all sessions see related articles), and set to 0.98 on enwiki (98% of sessions see related articles enabled)

Again, these are independent. If we want to change the design of the experiment and tie it to a different logging strategy, we can do that, but it seems like a different task.

Sure, but the name change also requires the hassle of updating the production config, so it seemed like a good time to pause and ask whether this is what we really want.

Sure, but the name change also requires the hassle of updating the production config, so it seemed like a good time to pause and ask whether this is what we really want.

Thanks for saying this and asking for a pause.

I would say that I agree with @Jhernandez that this is a name change only that's in line with mw.experiments terminology - but then, I would, as I wrote the initial change. With that in mind, I've added a backwards-compatibility layer to CommonSettings.php in https://gerrit.wikimedia.org/r/#/c/360166/2/wmf-config/CommonSettings.php.

How do we move forward with this?

We'd need to SWAT the config change and then merge the change in Page Previews shortly after.

@Nuria @ovasileva and @Tbayer please speak now if you are unhappy with having a 98% control group. I expect we'll SWAT this and wrap up the work sometime tomorrow so you have a limited window.

@Jdlrobson Since it is @Tbayer who is going to analyze the data, I think he should evaluate that. Let's just not call this test an A/B test or the 98% a "control" group, since an A/B test implies buckets of equal size for control and experiment. There can be valid reasons why you might want to launch and track a feature for just 2% of your users, and that seems to be what we are doing in this case. I have not looked at the code in detail, though.

@Jdlrobson - 98% works! What is the sampling rate?

What we would ideally want is something like this:

  • 98% have related articles ON and are not sampled (i.e. are not sending events)
  • 1% have related articles ON and are sampled (are sending events)
  • 1% have related articles OFF and are sampled

Obviously this depends on the sampling method being compatible with the bucketing method. If they are instead independent, i.e. if we first bucket into (say) 98% related articles ON and 2% OFF, and then sample 1% of each group, we would need to double-check that we still get enough data for the OFF group. (One way to combine the two is sketched below.)
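
One way to express that ideal setup with the bucketing primitive already in use would be a single three-way experiment. This is purely a hypothetical sketch; the experiment name and weights below are not what is deployed:

// Hypothetical three-way setup, not the deployed configuration.
var bucket = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.abTest', // hypothetical name
    enabled: true,
    buckets: {
        control: 0.98, // related articles ON, not sampled
        A: 0.01,       // related articles ON, sampled
        B: 0.01        // related articles OFF, sampled
    }
}, mw.user.sessionId() );

var featureOn = bucket !== 'B';
var sendEvents = bucket === 'A' || bucket === 'B';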

Also, it looks like there continues to be a bit of confusion about terminology. Some general notes:

  • As @Nuria said, the control group is not the right term for the part of the population that one leaves out of the sample (does not record data for). Rather, the control group is the part of the sample where conditions are left the same (here, where we leave related articles active).
  • It's totally fine for an A/B test not to sample (record data for) the entire population - in the end it's just about drawing two samples and comparing the chosen metric.
  • It's also OK for A and B (i.e. the test and control groups) to have different sizes - it's just a bit unusual. (Normally the accuracy of the test will be determined by the size of the smaller of the two, so one would basically be throwing away data, which is why most people don't do this. That said, there are other motivations for keeping one group larger than the other, which are not really applicable for us here, but are part of the rationale for multi-armed bandit testing methods.)

Change 360166 merged by jenkins-bot:
[operations/mediawiki-config@master] relatedArticles: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360166

Change 360165 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@master] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360165

Sign-off is blocked on T167310 being completed, but when that is done, Tilman would be best placed to help sign this off (checking that the number of sessions looks sensible).

Change 358046 abandoned by Jdlrobson:
Allow sampling by user session token

https://gerrit.wikimedia.org/r/358046

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team. I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

Also, now that we are sampling and bucketing by the same token, we need to document whether this is done in a statistically independent way. Can someone post the formula by which we calculate from the token value whether the session is bucketed or not, and the corresponding formula for sampling?

ping @Jdlrobson, @phuedx:

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team. I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

Also, now that we are sampling and bucketing by the same token, we need to document whether this is done in a statistically independent way. Can someone post the formula by which we calculate from the token value whether the session is bucketed or not, and the corresponding formula for sampling?

Not being familiar with the feature, this is what I see:

I think related articles is enabled for all Wikipedias here: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17285
and that whether you are shown more articles is driven by the isEnabledForCurrentUser function here: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.bootstrap/index.js#L73 which has two buckets of unequal size: 1% and 99%. Thus this code "disables related articles for 1% of the population". The terminology is confusing because it talks about buckets and control when in this case (if I am understanding the code right) it is neither; we are just enabling a feature for 99% of devices.

This other file uses a sample rate of 1% to log "related articles" events: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.eventLogging/index.js#L28
What I do not understand is how this second file is called only for the 99% of devices for which related pages are enabled (I might be missing something obvious) and not for the whole device population.

Side note: will comment on gerrit patch about variable naming.

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team. I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

The talk page is accurate. I've updated the description of T167310 (T167310#3378461). If you do think that a sampling rate of 0.01% would be more suitable, then we (Reading Web) can change it relatively easily.

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team.

T167310: Re-launch related pages a/b test on mobile enwiki had the wrong description. @phuedx has updated the description of that task to reflect the truth: T167310#3378461

I've also updated https://meta.wikimedia.org/wiki/Schema_talk:RelatedArticles with a link to the source of truth for the sampling rate number (expressed as a fraction of 1, right now at 0.01).

I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

If the logging sampling rate has to change, it needs to be a new card, please.

Also, now that we are sampling and bucketing by the same token, we need to document whether this is done in a statistically independent way.

We use the same user session token for both, but internally each is hashed together with the experiment name, which differs between bucketing and logging sampling, so they are independent.

See https://doc.wikimedia.org/mediawiki-core/master/js/source/mediawiki.experiments.html#mw-experiments-method-getBucket

			hash = hashString( experiment.name + ':' + token );

Can someone post the formula by which we calculate from the token value whether the session is bucketed or not, and the corresponding formula for sampling?

We use mw.experiments with two buckets to get a weighted boolean. https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/search?utf8=%E2%9C%93&q=%22mw.experiments.getBucket%22

For enabled/disabled, https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.bootstrap/index.js#L31-L38

Which ends up being (on enwiki) control = 0.02, A = 0.98 (https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17314) (it is control = 0, A = 1 everywhere else)

Independently, logging https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.eventLogging/index.js#L34

Which ends up being (everywhere), control = 0.99, A = 0.01. See https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L2894

Not sure if this info fits the definition of formulas but I hope it helps.
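
Put together, the two independent decisions look roughly like this (a sketch assembled from the links above, not a verbatim copy of the extension source):

// Sketch of the two independent per-session decisions (not verbatim code).
var token = mw.user.sessionId();

// 1. Visibility: control = 0.02, A = 0.98 on enwiki (0 / 1 elsewhere).
var enabled = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.visibility',
    enabled: true,
    buckets: { control: 0.02, A: 0.98 }
}, token ) === 'A';

// 2. Logging: control = 0.99, A = 0.01 everywhere.
var logging = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.instrumentation',
    enabled: true,
    buckets: { control: 0.99, A: 0.01 }
}, token ) === 'A';

// Because getBucket hashes experiment.name + ':' + token, the same token
// produces independent hash values for the two experiment names.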

Not being familiar with the feature, this is what I see:

I think related articles is enabled for all Wikipedias here: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17285
and that whether you are shown more articles is driven by the isEnabledForCurrentUser function here: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.bootstrap/index.js#L73 which has two buckets of unequal size: 1% and 99%. Thus this code "disables related articles for 1% of the population". The terminology is confusing because it talks about buckets and control when in this case (if I am understanding the code right) it is neither; we are just enabling a feature for 99% of devices.

isEnabledForCurrentUser uses wgRelatedArticlesEnabledBucketSize which is 1 by default (enabled for all), but 0.98 for enwiki (https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17314), so 2% disabled and 98% enabled.

The code talks about control and buckets because we are using MediaWiki core's mediawiki.experiments library, which forces you to have a bucket called control (see source) and performs selection based on buckets.

So in the source we use that terminology; on the outside, they are sampling rates per session, as you mention in the code review. It is understandable that it is confusing coming at it for the first time.

This other file uses a sample rate of 1% to log "related articles" events: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.eventLogging/index.js#L28
What I do not understand is how this second file is called only for the 99% of devices for which related pages are enabled (I might be missing something obvious) and not for the whole device population.

I believe it loads all the time if the related articles feature loads (there are skin whitelists), so the logging code would load for the entire device population, as you say, because there are events we log independently of whether the user has been bucketed to see related articles.

The [logEnabled](https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.bootstrap/index.js#L91) event is logged to EventLogging for both the feature-enabled and feature-disabled buckets, maybe to sanity-check the percentages against the defined sampling rates. @Tbayer should know more about why we were asked to log this.

Thanks, @Jhernandez, for the explanation of the bucketing algorithm provided by [mediaWiki.experiments](https://github.com/wikimedia/mediawiki/blob/85a48d8f/resources/src/mediawiki/mediawiki.experiments.js).

To add a little more detail: the mw.experiments.getBucket function takes an experiment name, a set of bucket-weight pairs, and some token and returns the name of a bucket. It will always return the same value given the same arguments. Moreover, the function doesn't have side effects: it doesn't store the token for the developer.

As @Jhernandez said in T167236#3378564, getBucket:

  1. Hashes the experiment name and the token to get a number between 0 and 1 (H)
  2. Sums the weights of the buckets (R)
  3. Finds the bucket corresponding to H * R, i.e. it iterates over the buckets, summing the weights until the sum is greater than or equal to the hash.

A couple of notes:

  • Iterating over the properties of an object is undefined in the JavaScript specification but all modern browsers iterate over them in the order in which they were defined.
  • getBucket uses the Jenkins one-at-a-time hash function internally but should probably be moved to FNV-1a (32-bit) as it's arguably better for uniqueness and speed (as well as already implemented within the MediaWiki codebase).

(Thanks for the informative explanations! Will finish reading them later.)

The [logEnabled](https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.bootstrap/index.js#L91) event is logged to EventLogging for both the feature-enabled and feature-disabled buckets, maybe to sanity-check the percentages against the defined sampling rates. @Tbayer should know more about why we were asked to log this.

Hold on - please keep in mind that I basically had not been involved with this schema until last month, so this kind of question is perhaps best addressed to the people who were involved with T114303 back in 2015 and/or these later (February 2017) changes ;)

I'm not seeing a "logEnabled" event in the schema. Does this refer to the event names feature-enabled and feature-disabled?

I believe it loads all the time if the related articles feature loads (there are skin whitelists), so the logging code would load for the entire device population, as you say, because there are events we log independently of whether the user has been bucketed to see related articles.

Ok, so if I understand this correctly, I think it is worth thinking about whether this experiment is set up how @Tbayer asked earlier.

His request:
<snippet>
98% have related articles ON and are not sampled (i.e. are not sending events)
1% have related articles ON and are sampled (are sending events)
1% have related articles OFF and are sampled
</snippet>

What *I think* is in the actual code:

99% have related articles ON
1% have related articles OFF

of the 100% population (regardless of whether related articles is ON or OFF) 1% are logging events

To clarify, what I wrote in T167236#3371031 is that we would have such a setup (98+1+1, which would obviously require different sampling rates for the two buckets) "ideally". I am aware that we are currently applying the same sampling rate to both buckets and hence end up with different sample sizes for group A and group B. As mentioned back then, that's not a dealbreaker - just a bit awkward ;)

Hold on - please keep in mind that I basically had not been involved with this schema until last month, so this kind of question is perhaps best addressed to the people who were involved with T114303 back in 2015 and/or these later (February 2017) changes ;)

Sorry, I didn't realize.

I'm not seeing a "logEnabled" event in the schema. Does this refer to the event names feature-enabled and feature-disabled?

Indeed, sorry for the confusion (source)

To clarify, what I wrote in T167236#3371031 is that we would have such a setup (98+1+1, which would obviously require different sampling rates for the two buckets) "ideally". I am aware that we are currently applying the same sampling rate to both buckets and hence end up with different sample sizes for group A and group B. As mentioned back then, that's not a dealbreaker - just a bit awkward ;)

^ all that. If it becomes a deal-breaker then we'll create another task to change the behavior.

Change 361715 had a related patch set uploaded (by Jdlrobson; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/361715

Change 361716 had a related patch set uploaded (by Jdlrobson; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/361716

@Jdlrobson: I have looked at the experiment setup (see my comment above) and it seems that @Tbayer is aware of the trade-offs. I have not looked at the code of the feature in detail.

Some corrections as to how buckets are calculated (not sure these corrections matter that much, really; any issues you see will likely come from the instrumentation, not the bucketing code, which is fine as noted earlier by @phuedx). Code here: https://github.com/wikimedia/mediawiki/blob/85a48d8f/resources/src/mediawiki/mediawiki.experiments.js#L97-L105

  • Hashes the experiment name and the token to get a number that is less than 2^32 (H)
  • Sums the weights of the buckets to get the range (this number will be 1 if buckets comprise the totality of population). (R)
  • Normalizes H by dividing by 2^32 and multiplying by R, getting a number (B) that is thus less than R (and less than 1 when the weights sum to 1)
  • Loops over the buckets accumulating weights and returns the 1st bucket for which B < accumulator

As @phuedx pointed out, this algorithm is deterministic, and the hash function seems to be uniform enough (which is different from crypto-safe). I do not see any issues with it, nor the need to migrate it. The hashing needs to work for a set of, say, 500 million (per month) session ids and produce uniform-enough results. Poor results would be collisions of the hash function. The hash function implementation notes that you might have 1 collision per 2^32 space. That seems fine.
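
For reference, a condensed paraphrase of that selection step (hash32 below stands in for the real hashString, a Jenkins one-at-a-time hash; this is a sketch, not the MediaWiki source itself):

// Condensed paraphrase of mw.experiments.getBucket's selection step.
function hash32( str ) {
    // Jenkins one-at-a-time hash, returning an unsigned 32-bit integer.
    var hash = 0, i;
    for ( i = 0; i < str.length; i++ ) {
        hash += str.charCodeAt( i );
        hash += hash << 10;
        hash ^= hash >>> 6;
    }
    hash += hash << 3;
    hash ^= hash >>> 11;
    hash += hash << 15;
    return hash >>> 0; // < 2^32
}

function getBucket( experiment, token ) {
    var i, normalized,
        hash = hash32( experiment.name + ':' + token ), // H
        keys = Object.keys( experiment.buckets ),
        range = 0, // R
        acc = 0;
    for ( i = 0; i < keys.length; i++ ) {
        range += experiment.buckets[ keys[ i ] ];
    }
    // B = (H / 2^32) * R, less than R (less than 1 when the weights sum to 1):
    normalized = ( hash / Math.pow( 2, 32 ) ) * range;
    for ( i = 0; i < keys.length; i++ ) {
        acc += experiment.buckets[ keys[ i ] ];
        if ( normalized < acc ) {
            return keys[ i ];
        }
    }
}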

Change 361715 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/361715

Change 361716 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/361716

I've re-run the query for July:

SELECT pages, COUNT(*) AS sessions FROM (
SELECT COUNT(DISTINCT event_pageId) AS pages
FROM log.RelatedArticles_16352530
WHERE wiki = 'enwiki' AND event_skin = 'minerva-stable'
AND LEFT(timestamp, 6) = '201707'
GROUP BY event_userSessionToken) AS pages_per_session
GROUP BY pages
ORDER BY pages;

This is what I see:

pages	sessions
1	725172
2	57851
3	14370
4	5878
5	2830
6	1675
7	1030
8	738
9	469
10	372
11	283
12	204
13	159
14	126
15	117
16	81
17	73
18	71
19	46
20	38
21	27
22	23
23	39
24	18
25	13
26	23
27	20
28	16
29	21
30	7
31	9
32	10
33	12
34	6
35	9
36	8
37	7
38	6
39	4
40	4
41	4
42	4
43	3
44	2
45	2
46	2
47	1
48	4
49	4
50	1
51	2
52	1
56	1
57	1
58	2
60	1
62	1
64	1
69	1
70	1
72	1
73	1
75	1
76	1
80	1
83	1
85	1
90	1
91	1
93	1
95	1
107	1
108	1
124	1
131	1
159	1
207	1
312	1

This looks much healthier...