userSessionToken in RelatedArticles schema does not seem to survive beyond one pageview
Closed, Resolved (Public)

Description

99.99% of sessions consist of only one pageview, see below. But per https://meta.wikimedia.org/wiki/Schema_talk:RelatedArticles , the session token is meant to persist for the entire browser session, i.e. beyond one pageview. (As also noted there and at T157307#3282646, the token can get lost if local storage is full, but that's unlikely to be the case in 99.99% of views.)

SELECT pages, COUNT(*) AS sessions FROM (
SELECT COUNT(DISTINCT event_pageId) AS pages
FROM log.RelatedArticles_16352530
WHERE wiki = 'enwiki' AND event_skin = 'minerva-stable'
AND LEFT(timestamp, 6) = '201704'
GROUP BY event_userSessionToken) AS pages_per_session
GROUP BY pages
ORDER BY pages;

pages   sessions
1       5269122
2       93
3       54
4       28
5       23
6       14
7       9
8       8
9       12
10      7
11      2
12      3
13      5
14      1
15      3
17      3
18      3
19      3
20      1
24      2
26      1
30      2
33      1
35      1
39      1
40      1
41      2
45      1
48      1
49      1
52      1
54      1
57      1
59      1
61      1
62      1
63      1
64      2
67      1
68      2
69      1
75      1
80      1
81      1
84      1
152     1
158     1

Sign off checklist

  • Check the problem has been addressed
  • Review existing extensions to see if this problem might be impacting other schemas
  • Update RelatedArticles documentation to reflect $wgRelatedArticlesLoggingSamplingRate -> $wgRelatedArticlesLoggingBucketSize and send an email to devs
  • For page previews, look at effects of this change on rates of duplication across multiple browsers and document in T167391: [Spike] Isolate the source of Popups event duplication

Event Timeline

...

<jdlrobson> ^ HaeB the problem is the sampling of events we record is based on page view token and the sample for the A/B test is based on user token
<HaeB> you'll see these for all existing and active EL schemas

FWIW, in the case of page previews (Schema:Popups), the problem does not seem to have occurred in the same form - we did see session lengths > 1, even at small sampling rates (e.g. an average of 3 daily pageviews per session on ruwiki, which had a sampling rate of 1% at the time - a result that is not possible without sessions persisting beyond one pageview). Obviously, though, this bug still looks very concerning, and we will want to check what kind of effect it has on Popups and other schemas, too.

@phuedx the issue is that the Schema created in RelatedArticles is configured with a sampling rate. Whether the user is in the sample is determined by a page view token in EventLogging. This means a user may log an event on the first page view but not the second.

In RelatedArticles, whether the feature is enabled or disabled is based on the user token, as we want to ensure that once bucketed, the user continues to see related pages.

Thus there is a mismatch here.
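
To make the mismatch concrete, here is a minimal sketch (my illustration, not the extension's actual code) of how the two decisions end up keyed to different tokens; the experiment names and weights are the ones discussed later in this task:

// Illustration only, not the extension's actual code.
// The A/B bucket is derived from the per-session user token, so it is
// stable for the whole browser session:
var sessionToken = mw.user.sessionId();
var seesFeature = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.visibility',
    enabled: true,
    buckets: { control: 0.02, A: 0.98 }
}, sessionToken ) === 'A';

// The event sampling, however, was keyed to a token that is regenerated
// on every page load, so the sampling coin is re-flipped per pageview:
var pageViewToken = mw.user.generateRandomSessionId();
var inSample = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.instrumentation',
    enabled: true,
    buckets: { control: 0.99, A: 0.01 }
}, pageViewToken ) === 'A';

// At a 1% sampling rate, a session sampled on its first pageview has only
// a ~1% chance of also being sampled on its second, which is why almost
// all recorded "sessions" contain exactly one page.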

Change 358046 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/EventLogging@master] Allow sampling by user session token

https://gerrit.wikimedia.org/r/358046

@phuedx the issue is that the Schema created in RelatedArticles is configured with a sampling rate. Whether the user is in the sample is determined by a page view token in EventLogging.

I missed that we use mw.eventLog.Schema in RA. I saw mw.track and assumed we were relying on the subscriber protocol, not the mw.eventLog.inSample method provided by it.

However, I still see the behavior described in T167236#3334989 with $wgRelatedArticlesLoggingSamplingRate = 1 in my local environment. I'll file it as a separate bug.

Change 358069 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/RelatedArticles@master] Event sampling is based on user bucket

https://gerrit.wikimedia.org/r/358069

My suggested solution is to allow EventLogging to do its sampling by user or session token (https://gerrit.wikimedia.org/r/358046).
RelatedArticles will then request that sampling is done by user token (https://gerrit.wikimedia.org/r/358069).

Sound good..?

Sound good..?

IIRC the mw.eventLog.Schema class was introduced by us (the Reading Web team) when we had schemas which were relatively simple and could be sampled per call to #log. This is no longer the case. Moreover, our simplest instrumentation is ReadingDepth, which we've implemented using mw.track/the EL subscriber protocol.
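
For reference, the two styles look roughly like this (a from-memory sketch; the exact Schema constructor arguments are my assumption, not verbatim API):

// Style 1: the mw.eventLog.Schema class, which decides sampling per call
// to #log (constructor arguments are approximate):
var schema = new mw.eventLog.Schema( 'RelatedArticles', 0.01 );
schema.log( { eventName: 'seen' } );

// Style 2: the EL subscriber protocol, where instrumentation fires
// mw.track events and a subscriber decides whether and how to log them:
mw.track( 'event.ReadingDepth', { action: 'pageLoaded' } );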

Should we revisit whether the class is necessary?

@phuedx I agree, we should revisit the decision, especially now that the MobileFrontend EventLogging code has largely been removed (that's where we needed the mw.eventLog.Schema class).

FYI I'm working on a change to stop using the mw.eventLog.Schema in Page Previews.

Change 358957 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@master] i13n: Log EL events with mw.track

https://gerrit.wikimedia.org/r/358957

IIRC the mw.eventLog.Schema class was introduced by us (the Reading Web team) when we had schemas which were relatively simple and could be sampled per call to #log

T117140 is where we added this. It's nice in that it abstracts away the thinking for sampling.

FYI I'm working on a change to stop using the mw.eventLog.Schema in Page Previews.

The approach to remove it is fine, although I am growing a little nervous about the amount of duplication I am now seeing across our extensions with regards to identifying buckets. We should also take ownership of deciding the fate of mw.eventLog.Schema as part of this task, since we introduced it.

We can take a look at RelatedArticles together…

The changes to the PP codebase are available for review again. RelatedArticles still needs to be tackled.

Let's sync up in standup today, as I'm a little confused about who's doing what now.

After syncing up, @Jdlrobson and I agreed that I'd submit the change for RelatedArticles and @Jdlrobson will lovingly review it.

Change 359921 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@master] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/359921

Change 360165 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@master] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360165

Change 360166 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[operations/mediawiki-config@master] relatedArticles: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360166

Change 358957 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] i13n: Log EL events with mw.track

https://gerrit.wikimedia.org/r/358957

I've reviewed but not merged to get @Jdlrobson to have a look at it. LGTM.

[...]
The approach to remove it is fine, although I am growing a little nervous about the amount of duplication I am now seeing across our extensions with regards to identifying buckets.

We are using mw.experiments and a navigator.sendBeacon check (core facilities and a couple of function calls), so there isn't a lot of code duplication.

There's definitely room to think about a unified method to bucket and log by session, a pattern which seems to be spreading. See below.

We should also take ownership of deciding the fate of mw.eventLog.Schema as part of this task, since we introduced it.

mw.eventLog.Schema serves a purpose and is in use for now, so deciding its fate seems unrelated to this task. Neither that nor finding a common API to log by session should be part of this task. I've created T168380: Explore an API for logging events sampled by session to follow up, and once we have something we can follow up on migrating/upgrading/removing Schema and its uses.

So I think that by now the situation is understood and mostly resolved, but I still ran an analogous query for the ReadingDepth schema as we had planned earlier (also to record the expected distribution on mobile web in general). No surprises there; that schema looks OK for now.

SELECT pages, COUNT(*) AS sessions FROM (
SELECT COUNT(DISTINCT event_pageToken) AS pages
FROM log.ReadingDepth_16325045
WHERE wiki = 'enwiki' AND event_skin = 'minerva' 
AND LEFT(timestamp, 6) = '201705'
GROUP BY event_SessionToken) AS pages_per_session
GROUP BY pages
ORDER BY pages;

+-------+----------+
| pages | sessions |
+-------+----------+
|     1 |   353952 |
|     2 |    32339 |
|     3 |    10509 |
|     4 |     4180 |
|     5 |     2333 |
|     6 |     1308 |
|     7 |      947 |
|     8 |      651 |
|     9 |      470 |
|    10 |      333 |
|    11 |      265 |
|    12 |      218 |
|    13 |      178 |
|    14 |      126 |
|    15 |      120 |
|    16 |       85 |
|    17 |       69 |
|    18 |       69 |
|    19 |       53 |
|    20 |       56 |
|    21 |       47 |
|    22 |       37 |
|    23 |       41 |
|    24 |       40 |
|    25 |       21 |
|    26 |       33 |
|    27 |       25 |
|    28 |       28 |
|    29 |       21 |
|    30 |       12 |
|    31 |       14 |
|    32 |       19 |
|    33 |       14 |
|    34 |       13 |
|    35 |        9 |
|    36 |        5 |
|    37 |        9 |
|    38 |       13 |
|    39 |       12 |
|    40 |        8 |
|    41 |        4 |
|    42 |        7 |
|    43 |        6 |
|    44 |        3 |
|    45 |        5 |
|    46 |        6 |
|    47 |        7 |
|    48 |        8 |
|    49 |        2 |
|    50 |        4 |
|    51 |        4 |
|    52 |        4 |
|    53 |        5 |
|    54 |        4 |
|    55 |        2 |
|    56 |        2 |
|    57 |        2 |
|    58 |        4 |
|    59 |        2 |
|    60 |        2 |
|    61 |        1 |
|    62 |        2 |
|    63 |        3 |
|    64 |        3 |
|    65 |        1 |
|    66 |        2 |
|    67 |        1 |
|    68 |        4 |
|    69 |        1 |
|    70 |        1 |
|    72 |        2 |
|    73 |        1 |
|    75 |        2 |
|    77 |        2 |
|    78 |        1 |
|    79 |        1 |
|    81 |        3 |
|    84 |        1 |
|    86 |        1 |
|    87 |        1 |
|    88 |        1 |
|    91 |        1 |
|    92 |        1 |
|    95 |        1 |
|    96 |        1 |
|    97 |        1 |
|    98 |        1 |
|   101 |        1 |
|   102 |        2 |
|   107 |        1 |
|   112 |        1 |
|   113 |        1 |
|   119 |        2 |
|   121 |        1 |
|   123 |        1 |
|   125 |        2 |
|   126 |        1 |
|   138 |        1 |
|   139 |        1 |
|   140 |        2 |
|   154 |        1 |
|   168 |        1 |
|   188 |        1 |
|   215 |        1 |
+-------+----------+
104 rows in set (24.43 sec)

Change 358069 abandoned by Jdlrobson:
Event sampling is based on user bucket

https://gerrit.wikimedia.org/r/358069

Change 359921 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@master] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/359921

For the record, we earlier had a very similar problem with the MobileWebSectionUsage schema (as I may already have mentioned in a recent discussion). See the May 2016 discussion at T128931#2283390 (and the following comments, by myself, @dr0ptp4kt and @Jdlrobson).

I merged i13n: Don't sample by pageview

  • Using ext.relatedArticles.instrumentation guarantees that events are logged randomly based on the user session token for this particular feature
  • Using ext.relatedArticles.visibility guarantees the visibility option sticks across a user session

Regarding the follow-up Hygiene: SamplingRate -> BucketSize: when speaking to @Nuria, she pointed out we are not doing a true A/B test. We have a control which will be enabled for 98% of users, and the other 2% will not see the feature. Of course we can normalise the 2%, but I wonder if the right thing is to end up with 3 groups, and whether now is the best opportunity to do that?

i.e.

  • We have 3 test groups - A, B and control. Control and A will get the related articles feature, B will not.
  • Only log events for A and B
  • Rename/repurpose the global config variables to account for this

Thoughts? Or do we consider this out of scope?
(If so, Joaquin, you should feel free to +2 in my absence.)

I don't have any particular opinions on the design of the tracking and the test, but just wanted to point out that Hygiene: SamplingRate -> BucketSize is just a name change for the variables, from sampling to bucketing, given that they are used for bucketing with mw.experiments.

Related articles now makes use of 2 different independent systems of bucketing:

  • One based on wgRelatedArticlesLoggingSamplingRate for bucketing a user into a log or do-not-log bucket (experiment name ext.relatedArticles.instrumentation)
    • Currently set to bucket 0.01 of user sessions into the log bucket
  • The other one based on wgRelatedArticlesEnabledSamplingRate for determining whether the user will see related articles enabled or disabled (experiment name ext.relatedArticles.visibility)
    • Currently set to 1 by default (all sessions see related articles), and set to 0.98 on enwiki (98% of sessions see related articles enabled)

Again, these are independent. If we want to change the design of the experiment and tie it to a different logging strategy, we can do that, but it seems like a different task.

Sure, but the name change also requires the hassle of updating the production config, so it seemed like a good time to pause and ask whether this is what we really want.

Sure, but the name change also requires the hassle of updating the production config, so it seemed like a good time to pause and ask whether this is what we really want.

Thanks for saying this and asking for a pause.

I would say that I agree with @Jhernandez that this is a name change only that's in line with mw.experiments terminology - but then, I would, as I wrote the initial change. With that in mind, I've added a backwards-compatibility layer to CommonSettings.php in https://gerrit.wikimedia.org/r/#/c/360166/2/wmf-config/CommonSettings.php.

How do we move forward with this?

We'd need to SWAT the config change and then merge the change in Page Previews shortly after.

@Nuria @ovasileva and @Tbayer please speak now if you are unhappy with having a 98% control group. I expect we'll SWAT this and wrap up the work sometime tomorrow so you have a limited window.

@Jdlrobson Since it is @Tbayer who is going to analyze the data, I think he should evaluate that. Let's just not call this test an A/B test or the 98% a "control" group, since an A/B test implies buckets of equal size for control and experiment. There can be valid reasons why you might want to launch and track a feature for just 2% of your users, and that seems to be what we are doing in this case. I have not looked at the code in detail, though.

@Jdlrobson - 98% works! What is the sampling rate?

What we would ideally want is something like this:

  • 98% have related articles ON and are not sampled (i.e. are not sending events)
  • 1% have related articles ON and are sampled (are sending events)
  • 1% have related articles OFF and are sampled

Obviously this depends on the sampling method being compatible with the bucketing method. If they are instead independent, i.e. if we first bucket into (say) 98% related articles ON and 2% OFF, and then sample 1% of each group, we would need to double-check that we still get enough data for the OFF group. (One way to combine the two is sketched below.)
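
One way to express that ideal setup with the bucketing primitive already in use would be a single three-way experiment. This is purely a hypothetical sketch; the experiment name and weights below are not what is deployed:

// Hypothetical three-way setup, not the deployed configuration.
var bucket = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.abTest', // hypothetical name
    enabled: true,
    buckets: {
        control: 0.98, // related articles ON, not sampled
        A: 0.01,       // related articles ON, sampled
        B: 0.01        // related articles OFF, sampled
    }
}, mw.user.sessionId() );

var featureOn = bucket !== 'B';
var sendEvents = bucket === 'A' || bucket === 'B';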

Also, it looks like there continues to be a bit of confusion about terminology. Some general notes:

  • As @Nuria said, the control group is not the right term for the part of the population that one leaves out of the sample (does not record data for). Rather, the control group is the part of the sample where conditions are left the same (here, where we leave related articles active).
  • It's totally fine for an A/B test not to sample (record data for) the entire population - in the end it's just about drawing two samples and comparing the chosen metric.
  • It's also OK for A and B (i.e. the test and control groups) to have different sizes - it's just a bit unusual. (Normally the accuracy of the test will be determined by the size of the smaller of the two, so one would basically be throwing away data, which is why most people don't do this. That said, there are other motivations for keeping one group larger than the other, which are not really applicable for us here, but are part of the rationale for multi-armed bandit testing methods.)

Change 360166 merged by jenkins-bot:
[operations/mediawiki-config@master] relatedArticles: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360166

Change 360165 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@master] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/360165

Sign-off is blocked on T167310 being completed, but when that is done, Tilman would be best placed to help sign this off (checking that the number of sessions looks sensible).

Change 358046 abandoned by Jdlrobson:
Allow sampling by user session token

https://gerrit.wikimedia.org/r/358046

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team. I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

Also, now that we are sampling and bucketing by the same token, we need to document whether this is done in a statistically independent way. Can someone post the formula by which we calculate from the token value whether the session is bucketed or not, and the corresponding formula for sampling?

ping @Jdlrobson, @phuedx:

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team. I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

Also, now that we are sampling and bucketing by the same token, we need to document whether this is done in a statistically independent way. Can someone post the formula by which we calculate from the token value whether the session is bucketed or not, and the corresponding formula for sampling?

Not being familiar with the feature, this is what I see:

I think related articles is enabled for all Wikipedias here: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17285
and that whether you are shown more articles is driven by the isEnabledForCurrentUser function here: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.bootstrap/index.js#L73 which has two buckets of unequal size: 1% and 99%. Thus this code "disables related articles for 1% of the population". The terminology is confusing because it talks about buckets and control when in this case (if I am understanding the code right) it is neither; we are just enabling a feature for 99% of devices.

This other file uses a sample rate of 1% to log "related articles" events: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.eventLogging/index.js#L28
What I do not understand is how this second file is called only for the 99% of devices for which related pages are enabled (I might be missing something obvious) and not for the whole device population.

Side note: will comment on gerrit patch about variable naming.

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team. I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

The talk page is accurate. I've updated the description of T167310 (T167310#3378461). If you do think that a sampling rate of 0.01% would be more suitable, then we (Reading Web) can change it relatively easily.

Hi folks, just to double triple check: Can someone verify whether the bucketing and sampling method and rates now match the description on the schema talk page? There seems to be a contradiction regarding sampling rates. (T167310 says 0.01%, but the talk page specification says 1%, based on what I was told last month when I started looking into this whole issue, and vetted later by someone from the team.

T167310: Re-launch related pages a/b test on mobile enwiki had the wrong description. @phuedx has updated the description of that task to reflect the truth: T167310#3378461

I've also updated https://meta.wikimedia.org/wiki/Schema_talk:RelatedArticles with a link to the source of truth for the sampling rate number (expressed as a fraction of 1, right now at 0.01).

I think 0.01% may actually be more suitable, but in any case it should be documented accurately.)

If the logging sampling rate has to change, it needs to be a new card, please.

Also, now that we are sampling and bucketing by the same token, we need to document whether this is done in a statistically independent way.

We use the same user session token for both, but internally each is hashed together with the experiment name, which differs between bucketing and logging sampling, so they are independent.

See https://doc.wikimedia.org/mediawiki-core/master/js/source/mediawiki.experiments.html#mw-experiments-method-getBucket

			hash = hashString( experiment.name + ':' + token );

Can someone post the formula by which we calculate from the token value whether the session is bucketed or not, and the corresponding formula for sampling?

We use mw.experiments with two buckets to get a weighted boolean. https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/search?utf8=%E2%9C%93&q=%22mw.experiments.getBucket%22

For enabled/disabled, https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.bootstrap/index.js#L31-L38

Which ends up being (on enwiki) control = 0.02, A = 0.98 (https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17314) (it is control = 0, A = 1 everywhere else)

Independently, logging https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.eventLogging/index.js#L34

Which ends up being (everywhere), control = 0.99, A = 0.01. See https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L2894

Not sure if this info fits the definition of formulas but I hope it helps.
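
Put together, the two independent decisions look roughly like this (a sketch assembled from the links above, not a verbatim copy of the extension source):

// Sketch of the two independent per-session decisions (not verbatim code).
var token = mw.user.sessionId();

// 1. Visibility: control = 0.02, A = 0.98 on enwiki (0 / 1 elsewhere).
var enabled = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.visibility',
    enabled: true,
    buckets: { control: 0.02, A: 0.98 }
}, token ) === 'A';

// 2. Logging: control = 0.99, A = 0.01 everywhere.
var logging = mw.experiments.getBucket( {
    name: 'ext.relatedArticles.instrumentation',
    enabled: true,
    buckets: { control: 0.99, A: 0.01 }
}, token ) === 'A';

// Because getBucket hashes experiment.name + ':' + token, the same token
// produces independent hash values for the two experiment names.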

Not being familiar with the feature, this is what I see:

I think related articles is enabled for all Wikipedias here: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17285
and that whether you are shown more articles is driven by the isEnabledForCurrentUser function here: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.bootstrap/index.js#L73 which has two buckets of unequal size: 1% and 99%. Thus this code "disables related articles for 1% of the population". The terminology is confusing because it talks about buckets and control when in this case (if I am understanding the code right) it is neither; we are just enabling a feature for 99% of devices.

isEnabledForCurrentUser uses wgRelatedArticlesEnabledBucketSize which is 1 by default (enabled for all), but 0.98 for enwiki (https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17314), so 2% disabled and 98% enabled.

The code talks about control and buckets because we are using MediaWiki core's mediawiki.experiments library, which forces you to have a bucket called control (see source) and performs selection based on buckets.

So in the source we use that terminology; on the outside, they are sampling rates per session, as you mention in the code review. It is understandable that it is confusing coming at it for the first time.

This other file uses a sample rate of 1% to log "related articles" events: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/master/resources/ext.relatedArticles.readMore.eventLogging/index.js#L28
What I do not understand is how this second file is called only for the 99% of devices for which related pages are enabled (I might be missing something obvious) and not for the whole device population.

I believe it loads all the time if the related articles feature loads (there are skin whitelists), so the logging code would load for the entire device population, as you say, because there are events we log independently of whether the user has been bucketed to see related articles.

The [logEnabled](https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.bootstrap/index.js#L91) event is logged to EventLogging for both the feature-enabled and feature-disabled buckets, maybe to sanity-check the percentages against the defined sampling rates. @Tbayer should know more about why we were asked to log this.

Thanks, @Jhernandez, for the explanation of the bucketing algorithm provided by [mediaWiki.experiments](https://github.com/wikimedia/mediawiki/blob/85a48d8f/resources/src/mediawiki/mediawiki.experiments.js).

To add a little more detail: the mw.experiments.getBucket function takes an experiment name, a set of bucket-weight pairs, and some token and returns the name of a bucket. It will always return the same value given the same arguments. Moreover, the function doesn't have side effects: it doesn't store the token for the developer.

As @Jhernandez said in T167236#3378564, getBucket:

  1. Hashes the experiment name and the token to get a number between 0 and 1 (H)
  2. Sums the weights of the buckets (R)
  3. Finds the bucket corresponding to H * R, i.e. it iterates over the buckets, summing the weights until the sum is greater than or equal to the hash.

A couple of notes:

  • Iterating over the properties of an object is undefined in the JavaScript specification but all modern browsers iterate over them in the order in which they were defined.
  • getBucket uses the Jenkins one-at-a-time hash function internally but should probably be moved to FNV-1a (32-bit) as it's arguably better for uniqueness and speed (as well as already implemented within the MediaWiki codebase).

(Thanks for the informative explanations! Will finish reading them later.)

The [logEnabled](https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/blob/750f3d73fd0cf8d82c02ca7a85b9e36979736d56/resources/ext.relatedArticles.readMore.bootstrap/index.js#L91) event is logged to EventLogging for both the feature-enabled and feature-disabled buckets, maybe to sanity-check the percentages against the defined sampling rates. @Tbayer should know more about why we were asked to log this.

Hold on - please keep in mind that I basically had not been involved with this schema until last month, so this kind of question is perhaps best addressed to the people who were involved with T114303 back in 2015 and/or these later (February 2017) changes ;)

I'm not seeing a "logEnabled" event in the schema. Does this refer to the event names feature-enabled and feature-disabled?

I believe it loads all the time if the related articles feature loads (there are skin whitelists), so the logging code would load for the entire device population, as you say, because there are events we log independently of whether the user has been bucketed to see related articles.

Ok, so if I understand this correctly, I think it is worth thinking about whether this experiment is set up how @Tbayer asked earlier.

His request:
<snippet>
98% have related articles ON and are not sampled (i.e. are not sending events)
1% have related articles ON and are sampled (are sending events)
1% have related articles OFF and are sampled
</snippet>

What *I think* is in the actual code:

99% have related articles ON
1% have related articles OFF

of the 100% population (regardless of whether related articles is ON or OFF) 1% are logging events

To clarify, what I wrote in T167236#3371031 is that we would have such a setup (98+1+1, which would obviously require different sampling rates for the two buckets) "ideally". I am aware that we are currently applying the same sampling rate to both buckets and hence end up with different sample sizes for group A and group B. As mentioned back then, that's not a dealbreaker - just a bit awkward ;)

Hold on - please keep in mind that I basically had not been involved with this schema until last month, so this kind of question is perhaps best addressed to the people who were involved with T114303 back in 2015 and/or these later (February 2017) changes ;)

Sorry, I didn't realize.

I'm not seeing a "logEnabled" event in the schema. Does this refer to the event names feature-enabled and feature-disabled?

Indeed, sorry for the confusion (source)

To clarify, what I wrote in T167236#3371031 is that we would have such a setup (98+1+1, which would obviously require different sampling rates for the two buckets) "ideally". I am aware that we are currently applying the same sampling rate to both buckets and hence end up with different sample sizes for group A and group B. As mentioned back then, that's not a dealbreaker - just a bit awkward ;)

^ all that. If it becomes a deal-breaker then we'll create another task to change the behavior.

Change 361715 had a related patch set uploaded (by Jdlrobson; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/361715

Change 361716 had a related patch set uploaded (by Jdlrobson; owner: Phuedx):
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/361716

@Jdlrobson: I have looked at the experiment setup (see my comment above) and it seems that @Tbayer is aware of the trade-offs. I have not looked at the code of the feature in detail.

Some corrections as to how buckets are calculated (not sure these corrections matter that much, really; any issues you see will likely come from the instrumentation, not the bucketing code, which is fine as noted earlier by @phuedx). Code here: https://github.com/wikimedia/mediawiki/blob/85a48d8f/resources/src/mediawiki/mediawiki.experiments.js#L97-L105

  • Hashes the experiment name and the token to get a number that is less than 2^32 (H)
  • Sums the weights of the buckets to get the range (this number will be 1 if buckets comprise the totality of population). (R)
  • Normalizes H by dividing by 2^32 and multiplying by R, getting a number (B) that is thus less than R (and less than 1 when the weights sum to 1)
  • Loops over the buckets accumulating weights and returns the 1st bucket for which B < accumulator

As @phuedx pointed out, this algorithm is deterministic, and the hash function seems to be uniform enough (which is different from crypto-safe). I do not see any issues with it, nor the need to migrate it. The hashing needs to work for a set of, say, 500 million (per month) session ids and produce uniform-enough results. Poor results would be collisions of the hash function. The hash function implementation notes that you might have 1 collision per 2^32 space. That seems fine.
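
For reference, a condensed paraphrase of that selection step (hash32 below stands in for the real hashString, a Jenkins one-at-a-time hash; this is a sketch, not the MediaWiki source itself):

// Condensed paraphrase of mw.experiments.getBucket's selection step.
function hash32( str ) {
    // Jenkins one-at-a-time hash, returning an unsigned 32-bit integer.
    var hash = 0, i;
    for ( i = 0; i < str.length; i++ ) {
        hash += str.charCodeAt( i );
        hash += hash << 10;
        hash ^= hash >>> 6;
    }
    hash += hash << 3;
    hash ^= hash >>> 11;
    hash += hash << 15;
    return hash >>> 0; // < 2^32
}

function getBucket( experiment, token ) {
    var i, normalized,
        hash = hash32( experiment.name + ':' + token ), // H
        keys = Object.keys( experiment.buckets ),
        range = 0, // R
        acc = 0;
    for ( i = 0; i < keys.length; i++ ) {
        range += experiment.buckets[ keys[ i ] ];
    }
    // B = (H / 2^32) * R, less than R (less than 1 when the weights sum to 1):
    normalized = ( hash / Math.pow( 2, 32 ) ) * range;
    for ( i = 0; i < keys.length; i++ ) {
        acc += experiment.buckets[ keys[ i ] ];
        if ( normalized < acc ) {
            return keys[ i ];
        }
    }
}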

Change 361715 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] i13n: Don't sample by pageview

https://gerrit.wikimedia.org/r/361715

Change 361716 merged by jenkins-bot:
[mediawiki/extensions/RelatedArticles@wmf/1.30.0-wmf.6] Hygiene: SamplingRate -> BucketSize

https://gerrit.wikimedia.org/r/361716

I've re-run the query for July:

SELECT pages, COUNT(*) AS sessions FROM (
SELECT COUNT(DISTINCT event_pageId) AS pages
FROM log.RelatedArticles_16352530
WHERE wiki = 'enwiki' AND event_skin = 'minerva-stable'
AND LEFT(timestamp, 6) = '201707'
GROUP BY event_userSessionToken) AS pages_per_session
GROUP BY pages
ORDER BY pages;

This is what I see:

pages	sessions
1	725172
2	57851
3	14370
4	5878
5	2830
6	1675
7	1030
8	738
9	469
10	372
11	283
12	204
13	159
14	126
15	117
16	81
17	73
18	71
19	46
20	38
21	27
22	23
23	39
24	18
25	13
26	23
27	20
28	16
29	21
30	7
31	9
32	10
33	12
34	6
35	9
36	8
37	7
38	6
39	4
40	4
41	4
42	4
43	3
44	2
45	2
46	2
47	1
48	4
49	4
50	1
51	2
52	1
56	1
57	1
58	2
60	1
62	1
64	1
69	1
70	1
72	1
73	1
75	1
76	1
80	1
83	1
85	1
90	1
91	1
93	1
95	1
107	1
108	1
124	1
131	1
159	1
207	1
312	1

This looks much healthier...