
xLab API: generate enrollment and bucketing for Varnish authority
Closed, Resolved · Public · 8 Estimated Story Points

Description

  • Decision on query parameters and how exactly this endpoint is called (outside this task, conversation happening)
  • Heartbeat monitor to make sure the endpoint is up and serving config to the Varnish config fetcher, with alerts set up so that it gets fixed if not
  • Integration tests to validate format
  • Assign bin ranges in Varnish format in a simple greedy algorithm
  • xLab experiments API should include recently concluded (<24h) experiments
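
The last requirement above could be sketched as a simple filter. This is only an illustration: it assumes each experiment carries a `utc_end_date` field (as in the input that main.js reads), and the helper name `isRunningOrRecentlyConcluded` and the constant `GRACE_PERIOD_MS` are made up for this example, not taken from the xLab codebase.

```javascript
// "Recently concluded" means the experiment ended less than 24 hours ago.
const GRACE_PERIOD_MS = 24 * 60 * 60 * 1000

// Hypothetical helper: keep experiments that have not ended yet, or that
// ended less than 24h before `now`.
function isRunningOrRecentlyConcluded (experiment, now = new Date()) {
  const end = new Date(experiment.utc_end_date).getTime()

  return now.getTime() - end < GRACE_PERIOD_MS
}

const now = new Date('2025-05-02T12:00:00Z')
const experiments = {
  running: { utc_end_date: '2025-05-10T00:00:00Z' },
  justEnded: { utc_end_date: '2025-05-02T00:00:00Z' }, // ended 12h ago
  longOver: { utc_end_date: '2025-04-01T00:00:00Z' }
}

const kept = Object.keys(experiments)
  .filter((name) => isRunningOrRecentlyConcluded(experiments[name], now))

console.log(kept) // [ 'running', 'justEnded' ]
```

Whether experiments that have not started yet should also be served is not covered by this sketch.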

Some details here (for the first approach we will use the full group names directly, because features won’t be added to the events for now)


main.js
const fs = require('fs')
const path = require('path')
const yaml = require('yaml')
const dayjs = require('dayjs')

const rawContents = fs.readFileSync(path.resolve(__dirname, 'input.yaml'), 'utf-8')
const { experiments } = yaml.parse(rawContents)

const dbnameToDomainNameMap = require(path.resolve(__dirname, 'dbname_to_domain_name_map.json'))

const result = Object.entries(experiments)
  .reduce(
    (acc, [experimentName, experimentConfig]) => {
      const { groups, projects } = experimentConfig
      const domains = Object.entries(projects).reduce(
        (acc, [project, projectConfig]) => {
          const sampleSizePerGroup = projectConfig.sample_size / groups.length
          const binRangeSizePerGroup = sampleSizePerGroup * 100_000

          const entry = {}
          entry.groups = {}

          for (let i = 0; i < groups.length; ++i) {
            let group = groups[i]

            if (typeof group === 'object') {
              const t = Object.entries(group)[0]
              group = t[0]

              entry.groups[`_comment_${t[0]}`] = t[1].description
            }

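            // Bin indices must be integers, so floor any fractional boundary.
            // Each range ends right before the next one starts.
            entry.groups[group] = [
              Math.floor(i * binRangeSizePerGroup),
              Math.floor((i + 1) * binRangeSizePerGroup) - 1
            ]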
          }

          const {
            domain_name: domainName,
            mobile_domain_name: mobileDomainName
          } = dbnameToDomainNameMap[project]

          acc[mobileDomainName] = acc[domainName] = entry

          return acc
        },
        {}
      )

      const entry = {}

      if (experimentConfig.description) {
        entry._comment = experimentConfig.description
      }

      entry.start = dayjs(experimentConfig.utc_start_date).format('YYYY-MM-DDTHH:mm:ss[Z]')
      entry.end = dayjs(experimentConfig.utc_end_date).format('YYYY-MM-DDTHH:mm:ss[Z]')
      entry.domains = domains

      acc[experimentName] = entry

      return acc
    },
    {}
  )

console.log(JSON.stringify(result, null, 2))
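
For the "integration tests to validate format" item, a check along the following lines might be a starting point. It is a sketch under assumptions: it only looks at the shape that the script above emits (`start`, `end`, `domains`, `groups`, `_comment…` keys) and at the 100,000 bins used in its arithmetic. The function name `findConfigProblems` is hypothetical, and the real Varnish format may have further constraints.

```javascript
const TOTAL_BINS = 100_000 // bin count used by the script above
const TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$/

// Hypothetical checker: returns a list of problems, empty if none were found.
function findConfigProblems (config) {
  const problems = []

  for (const [name, experiment] of Object.entries(config)) {
    for (const key of ['start', 'end']) {
      if (!TIMESTAMP.test(experiment[key])) problems.push(`${name}: bad ${key}`)
    }

    for (const [domain, entry] of Object.entries(experiment.domains ?? {})) {
      const ranges = []

      for (const [key, value] of Object.entries(entry.groups ?? {})) {
        if (key.startsWith('_comment')) continue // descriptions, not bin ranges

        const ok = Array.isArray(value) && value.length === 2 &&
          value.every(Number.isInteger) &&
          value[0] >= 0 && value[0] <= value[1] && value[1] < TOTAL_BINS

        if (ok) ranges.push(value)
        else problems.push(`${name}/${domain}: bad range for ${key}`)
      }

      // After sorting by start, any overlap shows up between neighbours.
      ranges.sort((a, b) => a[0] - b[0])
      for (let i = 1; i < ranges.length; ++i) {
        if (ranges[i][0] <= ranges[i - 1][1]) {
          problems.push(`${name}/${domain}: overlapping ranges`)
        }
      }
    }
  }

  return problems
}

const sample = {
  'my-experiment': {
    start: '2025-05-01T00:00:00Z',
    end: '2025-05-15T00:00:00Z',
    domains: {
      'en.wikipedia.org': {
        groups: {
          _comment_control: 'Unchanged experience',
          control: [0, 4999],
          treatment: [4000, 9999] // deliberately overlaps with control
        }
      }
    }
  }
}

console.log(findConfigProblems(sample))
// [ 'my-experiment/en.wikipedia.org: overlapping ranges' ]
```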

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Add query parameters to experiments endpoint | repos/data-engineering/mpic!166 | milimetric | experiments-controller | main

Event Timeline

Milimetric set the point value for this task to 5. (Apr 15 2025, 3:36 PM)

milimetric opened https://gitlab.wikimedia.org/repos/data-engineering/mpic/-/merge_requests/166

Draft: [WIP] working on shaping the experiment controller to handle all the

@phuedx cc @Sfaci @Milimetric some questions about the domains key in the ?format=config&authority=varnish version of the experiments endpoint.

I understand the derivation of the groups' names in the proposed implementation of the TDR for the groups sub-key i.e. the shortened Internal Group Name of each group.

But I don't understand the following:

  1. Should every applicable domain be represented as a domain key for each experiment? Or just the ones specified in the project and sample size field set for a given experiment? I'm assuming it's the latter - just want to confirm
  2. For each domain key, how should the bin numbers for each shortened group name key in the groups data object be populated? I'm not seeing a discernible pattern in the example json
  1. Should every applicable domain be represented as a domain key for each experiment? Or just the ones specified in the project and sample size field set for a given experiment? I'm assuming it's the latter - just want to confirm

The latter. We should only be returning information about the domains that the experiment is running on. This not only reduces transfer size over the network but, critically, reduces the amount of space per experiment in Varnish's in-memory representation of the experiment config. If you remember, Varnish allocates a fixed amount of memory to store the experiment config, so we need to keep experiment configs as small as possible.

  2. For each domain key, how should the bin numbers for each shortened group name key in the groups data object be populated? I'm not seeing a discernible pattern in the example json
// Varnish divides all traffic into 100,000 bins for each experiment. Therefore, we take our target sample size, e.g. 10% of all traffic/10,000 bins, divide it by the number of groups,
// and then make a series of bin ranges, e.g. [0..999], [1000..1999], etc.

const SAMPLE_SIZE = 0.1 // 10%
const GROUPS = [ 'control', 'A', 'B', 'C' ]

const sampleSizePerGroup = SAMPLE_SIZE / GROUPS.length
const binRangeSizePerGroup = sampleSizePerGroup * 100_000

const binRanges = GROUPS.reduce(
  (acc, cur, i) => {
    acc[cur] = [
      Math.floor( i * binRangeSizePerGroup ),
      Math.floor( (i + 1) * binRangeSizePerGroup - 1 ),
    ];

    return acc;
  },
  {}
)

console.log( binRanges )

The above is ever-so-slightly wrong because it generates sequential bin ranges: each group's range is immediately followed by the next group's, so it has no room to grow. Without more state, we then can't allow the user to increase the sample rate of the experiment.
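
To make the problem concrete, here is a small sketch that recomputes the sequential ranges at 10% and at 20%. The function name `sequentialBinRanges` is made up for this example, and it rounds to whole bins slightly differently from the snippet above, but the resulting ranges are the same for these inputs.

```javascript
const TOTAL_BINS = 100_000

// Sequential assignment: group i gets the i-th consecutive block of bins.
function sequentialBinRanges (sampleSize, groups) {
  const sampledBins = Math.round(sampleSize * TOTAL_BINS)
  const binsPerGroup = Math.floor(sampledBins / groups.length)

  return Object.fromEntries(
    groups.map((group, i) => [
      group,
      [i * binsPerGroup, (i + 1) * binsPerGroup - 1]
    ])
  )
}

const GROUPS = ['control', 'A', 'B', 'C']
const before = sequentialBinRanges(0.1, GROUPS)
const after = sequentialBinRanges(0.2, GROUPS)

console.log(before.A) // [ 2500, 4999 ]
console.log(after.A) // [ 5000, 9999 ]
console.log(after.control) // [ 0, 4999 ]
```

Bins 2500..4999 belong to group A at 10% but fall inside control's range at 20%, so every subject that was in A would change group when the sample rate goes up.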

The algorithm should be updated to maximally separate the bin ranges, which gives them the most room to grow. For example:

const SAMPLE_SIZE = 0.1 // 10%
const GROUPS = [ 'control', 'A', 'B', 'C' ]

const sampleSizePerGroup = SAMPLE_SIZE / GROUPS.length
const binRangeSizePerGroup = sampleSizePerGroup * 100_000
const binRangeMaxSizePerGroup = 100_000 / GROUPS.length;

const binRanges = GROUPS.reduce(
  (acc, cur, i) => {
    // Each group starts at the beginning of its own equally sized slot.
    const start = Math.floor( i * binRangeMaxSizePerGroup )

    acc[cur] = [
      start,
      start + Math.floor( binRangeSizePerGroup - 1 ),
    ];

    return acc;
  },
  {}
)

console.log( binRanges )
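
As a quick sanity check of the "room to grow" claim, the sketch below recomputes the separated ranges at 10% and 20% and verifies that each group's old range is contained in its new one. `separatedBinRanges` and `contains` are hypothetical names used only here, and the rounding to whole bins is an assumption of this sketch.

```javascript
const TOTAL_BINS = 100_000

// Maximally separated assignment: every group gets its own slot of
// TOTAL_BINS / groups.length bins and fills it from the start of the slot.
function separatedBinRanges (sampleSize, groups) {
  const slotSize = Math.floor(TOTAL_BINS / groups.length)
  const sampledBins = Math.round(sampleSize * TOTAL_BINS)
  const binsPerGroup = Math.floor(sampledBins / groups.length)

  return Object.fromEntries(
    groups.map((group, i) => {
      const start = i * slotSize

      return [group, [start, start + binsPerGroup - 1]]
    })
  )
}

// True if every bin in `inner` is also in `outer`.
const contains = (outer, inner) => outer[0] <= inner[0] && inner[1] <= outer[1]

const GROUPS = ['control', 'A', 'B', 'C']
const at10 = separatedBinRanges(0.1, GROUPS)
const at20 = separatedBinRanges(0.2, GROUPS)

console.log(at10)
// control: [0, 2499], A: [25000, 27499], B: [50000, 52499], C: [75000, 77499]
console.log(GROUPS.every((group) => contains(at20[group], at10[group]))) // true
```

Because each range grows in place, subjects already enrolled in a group stay in that group when the sample rate increases.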

@phuedx @Milimetric @Sfaci more Qs for the response:

The current UI for variants in the xLab experiment form looks like this:

[Image: Screenshot 2025-05-01 at 8.19.01 AM.png]

The example json for ?format=config&authority=varnish in the design sprint doc includes some keys that I'm not sure how we're populating - namely:

  1. shared_selector << is this extracted from the machine name of the treatment group? or an upcoming field we'll be adding to the form?
  2. cache_split << wouldn't this always be true for an experiment? what determines its value if not?
  3. in the domains key, there is a sub-key inside a given domain called unique_domain which is a boolean << what determines this value?


  1. shared_selector << is this extracted from the machine name of the treatment group? or an upcoming field we'll be adding to the form?

This is an optional override of the A/B test name that, if set, will be used when generating the subject ID. It's OK to leave this out for now.

  2. cache_split << wouldn't this always be true for an experiment? what determines its value if not?

Would this always be true for an experiment? No. Not all "everyone" experiments require a cache split (e.g. an experiment on the number of results shown in the search autocomplete). This is out of scope for the MVP and, since this defaults to true, it's OK to leave this out for now.

  3. in the domains key, there is a sub-key inside a given domain called unique_domain which is a boolean << what determines this value?

By default, the Edge Unique cookie is shared across 2LDs with some exceptions, e.g. the same Edge Unique will arrive at en.wikipedia.org and de.wikipedia.org. Therefore, a user enrolled in an experiment running on those wikis will have the same subject ID. The unique_domain option overrides this behaviour by instructing Varnish to mix the domain name into the subject ID.

This is out of scope for the MVP and, since this defaults to false, it's OK to leave this out for now.
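
To illustrate how shared_selector and unique_domain relate to the subject ID, here is a purely hypothetical sketch. The real derivation lives in libvmod-wmfuniq and may differ in every detail: the hash function, the input order, the separator, and the function name `subjectId` are all assumptions made for this example.

```javascript
const { createHash } = require('node:crypto')

// Illustrative only, NOT the libvmod-wmfuniq algorithm.
// - sharedSelector, if set, replaces the experiment name in the hash input.
// - uniqueDomain, if true, mixes the domain name into the hash input.
function subjectId ({ edgeUnique, experimentName, domain, sharedSelector, uniqueDomain = false }) {
  const parts = [edgeUnique, sharedSelector ?? experimentName]

  if (uniqueDomain) parts.push(domain)

  return createHash('sha256').update(parts.join('\n')).digest('hex')
}

const base = { edgeUnique: 'example-edge-unique', experimentName: 'my-experiment' }

const en = subjectId({ ...base, domain: 'en.wikipedia.org' })
const de = subjectId({ ...base, domain: 'de.wikipedia.org' })
console.log(en === de) // true: same Edge Unique, same subject ID on both wikis

const enUnique = subjectId({ ...base, domain: 'en.wikipedia.org', uniqueDomain: true })
const deUnique = subjectId({ ...base, domain: 'de.wikipedia.org', uniqueDomain: true })
console.log(enUnique === deUnique) // false: the domain is part of the input
```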


If you're ever in doubt, the libvmod-wmfuniq repo provides explanations for all of these options:

I'll make sure that the doc is up to date…

I'll make sure that the doc is up to date…

Done™

As part of deploying NPM package security updates yesterday, we created another release tag v0.6.0 that includes both the security updates and the new experiments endpoints. The v0.6.0 release was deployed to staging and prod.

The new endpoints with query parameters are available on prod -- note that the experiments on prod are all test/disposable data.