Switch Portal EL & survey sampling algorithm to use seeded RNG
Closed, Resolved (Public)

Description

The current algorithm converts the (randomly generated) event logging session ID to an integer and then checks whether it is divisible by N, where 1-in-N is the sampling rate we want. For example, a 0.5% rate is 1 in 200. Because any integer divisible by N is also divisible by every factor of N, a second check against a factor of N is not independent of the first. This effectively means that we can't use anything that is a factor of N (e.g. 50, 10, 100, or 40 for N = 200) for subsequent sampling, as was the case in the recent survey banner situation.
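To make the problem concrete, here is a minimal sketch that assumes the session ID has already been converted to an integer. The function names `isSampledByDivisibility` and `countNestedPasses` are hypothetical and not taken from the existing code. It counts how many sessions selected at 1-in-200 also pass a follow-up 1-in-50 check:

```javascript
// Hypothetical helper mirroring the divisibility check described above.
function isSampledByDivisibility(sessionInt, n) {
  return sessionInt % n === 0;
}

// Among IDs 1..limit, count those passing the outer check and,
// of those, how many also pass the inner check.
function countNestedPasses(limit, outerN, innerN) {
  let outer = 0;
  let both = 0;
  for (let id = 1; id <= limit; id++) {
    if (isSampledByDivisibility(id, outerN)) {
      outer++;
      if (isSampledByDivisibility(id, innerN)) {
        both++;
      }
    }
  }
  return { outer, both };
}

const nested = countNestedPasses(100000, 200, 50);
console.log(nested.outer, nested.both);
```

Both counts come out equal, because every multiple of 200 is also a multiple of 50. The inner check selects all of the already-sampled sessions rather than 1 in 50 of them.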

We should try moving to seeded random number generation (e.g. https://commons.wikimedia.org/wiki/MediaWiki:Gadget-math.seedrandom.js), which allows us to set the seed to that same (randomly generated) session ID and then use the traditional method for getting random numbers between 1 and N. This gives us very easily understood sampling code:

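// Assume the seed has been set to the session ID.
// Math.seededrandom() stands in for the seeded generator's output in [0, 1).
function oneIn(N) {
    // Returns an integer from 1 to N, each equally likely.
    return Math.floor(Math.seededrandom() * N) + 1;
}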
if (oneIn(200) == 1) {
  // selected for event logging
  if (oneIn(10) == 1) {
    // selected for A/B testing
    if (oneIn(2) == 1) {
      // selected for the control bucket
    } else {
      // selected for the test bucket
    }
  } else {
    // rejected from A/B testing, but still enrolled in EL
  }
} else if (oneIn(50) == 1) {
  // rejected from EL but selected for survey banner
} else {
  // rejected from EL and survey banner
}

The logic would play out the same every time the page is refreshed as long as the user has the same session ID, because a generator seeded with the same value produces the same sequence of numbers.
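The following is a self-contained sketch of that behaviour, not the gadget's actual API. It assumes the session ID is a string and swaps in two stand-ins: an FNV-1a hash to turn the ID into a 32-bit seed, and a small mulberry32-style generator in place of the seeded RNG. The names `hashSessionId`, `makeSeededRandom`, and `assignBucket`, as well as the bucket labels, are hypothetical.

```javascript
// Stand-in: hash a session ID string to an unsigned 32-bit seed (FNV-1a).
function hashSessionId(sessionId) {
  let h = 0x811c9dc5;
  for (let i = 0; i < sessionId.length; i++) {
    h ^= sessionId.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Stand-in: mulberry32-style PRNG returning floats in [0, 1).
function makeSeededRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Walk the same decision tree as the task's snippet for one session ID.
function assignBucket(sessionId) {
  const random = makeSeededRandom(hashSessionId(sessionId));
  const oneIn = (n) => Math.floor(random() * n) + 1;
  if (oneIn(200) === 1) {
    if (oneIn(10) === 1) {
      return oneIn(2) === 1 ? "el+ab-control" : "el+ab-test";
    }
    return "el-only";
  }
  return oneIn(50) === 1 ? "survey" : "none";
}

// The same ID yields the same bucket on every call (i.e. every page load).
console.log(assignBucket("example-session-id") === assignBucket("example-session-id"));
```

The generator is re-created from the session ID on each call, so the sequence of draws, and therefore the bucket, depends only on the ID. A real implementation would also need to keep the order of `oneIn` calls stable between releases, since changing it changes which draw feeds which decision.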