Switch Portal EL & survey sampling algorithm to use seeded RNG
Closed, Resolved · Public

Assigned To

	Jdrewniak

Authored By

	mpopov
	May 17 2016, 8:48 PM

Description

The current algorithm converts the (randomly generated) event logging session ID to an integer and then checks whether it is divisible by N, where 1-in-N is the sampling rate we want. For example, a 0.5% rate is 1 in 200. Any ID divisible by N is also divisible by every divisor of N, so we can't use a divisor of N (e.g. 50, 10, 100, 40 for 200) for subsequent sampling: the second check would select everyone who passed the first. This was the case in the recent survey banner situation.
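To illustrate the problem, here is a minimal sketch of a divisibility-based check. The function name `inSample` is hypothetical, and the session ID is assumed to have already been converted to a non-negative integer; the actual conversion used on the portal is not shown here.

```javascript
// Hypothetical stand-in for the current check: select a session when its
// integer ID is divisible by n.
function inSample(sessionInt, n) {
  return sessionInt % n === 0;
}

// Collect every integer ID below 100000 that passes the 1-in-200 check.
const selected = [];
for (let id = 0; id < 100000; id++) {
  if (inSample(id, 200)) {
    selected.push(id);
  }
}

// Apply a second 1-in-50 check to the sessions that were already selected.
const alsoSelected = selected.filter((id) => inSample(id, 50));

console.log(selected.length);     // 500
console.log(alsoSelected.length); // 500: the second check rejects no one
```

Because 50 divides 200, the second check is redundant for already-selected sessions, so it cannot be used to sub-sample them.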

We should try moving to a seeded random number generator (e.g. https://commons.wikimedia.org/wiki/MediaWiki:Gadget-math.seedrandom.js) that allows us to set the seed to that same (randomly) generated session ID and then use the traditional method for getting random integers between 1 and N, which gives us very easily understood sampling code:

// Assumes the seeded generator (here called Math.seededrandom) has already
// been seeded with the session ID.
// Returns a pseudo-random integer between 1 and N, inclusive.
function oneIn(N) {
  return Math.floor(Math.seededrandom() * N) + 1;
}
if (oneIn(200) === 1) {
  // selected for event logging
  if (oneIn(10) === 1) {
    // selected for A/B testing
    if (oneIn(2) === 1) {
      // selected for the control bucket
    } else {
      // selected for the test bucket
    }
  } else {
    // rejected from A/B testing, but still enrolled in EL
  }
} else if (oneIn(50) === 1) {
  // rejected from EL but selected for survey banner
} else {
  // rejected from EL and survey banner
}

The logic would play out the same every time the page is refreshed as long as the user has the same session ID.
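A self-contained sketch of this idea follows. It does not use the linked gadget; instead it swaps in two small, well-known routines as assumptions: an FNV-1a string hash to turn the session ID into a 32-bit seed, and the mulberry32 generator for the seeded draws. The names `hashString`, `makeSampler`, and `bucket`, and the bucket labels, are hypothetical.

```javascript
// Hash a session ID string to a 32-bit unsigned integer (FNV-1a).
function hashString(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small seeded PRNG that returns floats in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build a oneIn(n) function whose sequence of draws is fully determined
// by the session ID.
function makeSampler(sessionId) {
  const random = mulberry32(hashString(sessionId));
  return function oneIn(n) {
    return Math.floor(random() * n) + 1;
  };
}

// Same decision tree as in the description, returning a label instead of
// acting on the result.
function bucket(sessionId) {
  const oneIn = makeSampler(sessionId);
  if (oneIn(200) === 1) {
    if (oneIn(10) === 1) {
      return oneIn(2) === 1 ? 'control' : 'test';
    }
    return 'el-only';
  }
  return oneIn(50) === 1 ? 'survey' : 'none';
}

// The same session ID yields the same label on every call.
console.log(bucket('example-session-id') === bucket('example-session-id')); // true
```

The result is reproducible only if the generator is re-seeded from the session ID on each page load and `oneIn` is called in the same order, which is why the sampler is created inside `bucket` rather than shared.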

Related Objects

Status	Assigned	Task
Resolved	debt	T131526 [EPIC] Wikipedia.org A/B Test (Caterpiller): languages by article count
Resolved	mpopov	T134011 A/B Test (Caterpiller): Analyze the results of the languages by article count test
Resolved	debt	T134010 A/B Test (Caterpiller): Disable the test for the languages by article count
Resolved	mpopov	T134009 A/B Test (Caterpiller): Check that all is well with the languages by article count test
Resolved	Jdrewniak	T134008 A/B Test (Caterpiller): Implement the languages by article count test
Resolved	Jdrewniak	T135558 Switch Portal EL & survey sampling algorithm to use seeded RNG

Event Timeline

mpopov created this task. May 17 2016, 8:48 PM

Restricted Application added subscribers: Zppix, Aklapper. May 17 2016, 8:48 PM

debt assigned this task to Jdrewniak. May 17 2016, 9:43 PM

debt triaged this task as Medium priority.

debt edited projects, added Discovery-Portal-Sprint; removed Discovery-Portal-Backlog.

debt added a subscriber: Jdrewniak.

Jdrewniak moved this task from Backlog to In Progress on the Discovery-Portal-Sprint board. May 31 2016, 2:28 PM

Danny_B added a project: Discovery-ARCHIVED. Jun 2 2016, 7:53 PM

Thanks, @mpopov - this is good stuff! :)

debt moved this task from In Progress to Done on the Discovery-Portal-Sprint board. Jun 9 2016, 9:04 PM

debt closed this task as Resolved. Jun 14 2016, 11:25 PM

debt moved this task from Done to Completed on the Discovery-Portal-Sprint board. Aug 12 2016, 8:02 PM
