
Spike: xLab: Domains and dblists
Closed, Resolved · Public · 5 Estimated Story Points

Description

  • In scope for this task
    • Update database and code to refer to wiki_ids (eg. enwikivoyage, wikidatawiki, etwiki)
    • Update code to show users percentages instead of rates for all sampling
  • Later or elsewhere
    • Refresh documentation (in the refresh doc task)
      • All nuances about how the "Project" concept maps to domains, language variants, and so on will be described in an on-wiki documentation page and linked from the description text of the section. This includes how mobile domains are included for now.
      • All nuances about sampling will be explained in an on-wiki documentation page, also linked from the description of this section
    • Picking targets in a more nuanced way (whole project families, certain language variants that map to the same wiki_id, excluding mobile traffic, etc.) will be considered in the future but is not part of this scope
    • limits on sampling (eg. no more than 10%) will be implemented as part of the validation work

Details

Related Changes in GitLab:
| Title | Reference | Author | Source Branch | Dest Branch |
| Update sample_rate to percentages | repos/data-engineering/test-kitchen!173 | cjming | T391955/sample-rate-percentages | main |

Event Timeline

Milimetric renamed this task from (stretch) Domains and dblists to Spike: Domains and dblists.Apr 15 2025, 12:15 PM
Milimetric triaged this task as Medium priority.
Milimetric updated the task description.

Suggestion: use the 'wiki_id' concept as a lower-level main selector value instead of 'project' or 'domain'. You could still use those terms, but I believe wiki_id is the most canonical and the least variable and confusing concept.

Milimetric set the point value for this task to 5.Apr 18 2025, 9:26 PM

I'd say that the description notes are precise. Thank you, @Milimetric! There are some extra interaction and validation details defined in the design specifications. I believe it'd be a good idea to create an implementation (sub?)task and include all specs there. Would that make sense?

Milimetric renamed this task from Spike: Domains and dblists to Spike: xLab: Domains and dblists.May 1 2025, 3:47 PM

The design adjustments required to align xLab's UI with these database and percentage updates are documented in the following task: T392911: xLab: Update ‘Project and sample size’ module. Wondering if these two tasks should be more explicitly related 🤔

So after some digging through the code base, turns out we are not saving dbname in the database in its own column -- rather the sample_rate column in the instruments table is a JSON type in which we currently store the wiki_id along with its corresponding sample_rate in the following format:

"sample_rate": {
  "default": 0,
  "0.1": [
    "frwiki",
    "dewiki"
  ],
  "0.01": [
    "enwiki"
  ]
},

Here each sample rate serves as a key, and its value is an array of the wiki ids that share that sample rate.
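As a rough illustration of how this structure can be read, here is a minimal sketch that looks up the rate for one wiki and falls back to "default". The helper name `rate_for_wiki` is hypothetical and is not taken from the xLab code base; only the JSON shape comes from the example above.

```python
# Hypothetical helper (not from the xLab code base): resolve the sample
# rate for a wiki from the JSON shape shown above.
def rate_for_wiki(sample_rate: dict, wiki_id: str) -> float:
    """Return the rate whose wiki list contains wiki_id, else the default."""
    for key, wiki_ids in sample_rate.items():
        if key == "default":
            continue  # "default" maps to a number, not to a list of wikis
        if wiki_id in wiki_ids:
            return float(key)  # keys are rates stored as strings
    return float(sample_rate.get("default", 0))


sample_rate = {
    "default": 0,
    "0.1": ["frwiki", "dewiki"],
    "0.01": ["enwiki"],
}

print(rate_for_wiki(sample_rate, "frwiki"))
print(rate_for_wiki(sample_rate, "etwiki"))  # not listed, so the default applies
```

Because the rates are object keys, they are strings in the stored JSON and have to be parsed back into numbers on read.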

Thus no backend changes are needed for this change because we never incorporated dbname into the xLab data model.

There are, however, frontend changes that we can make to record the sample rate as a percentage (a number between 0 and 100). In the above MR, this percentage representation of the sample rate is what the user sees on the create/edit experiment form and in the read view of the experiment. I erred on the side of keeping the sample rate as a fractional number (between 0 and 1) in the experiments endpoints' API responses where applicable.

When saving to the record, however, we do the requisite math to save the percentage as a fraction of 1 (i.e. a floating-point number between 0 and 1).
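The conversion described above could look roughly like the following sketch. The function names and the rounding precision are assumptions made for illustration; the MR may handle this differently.

```python
# Hypothetical helpers; names and rounding precision are assumptions.
def percent_to_fraction(percent: float) -> float:
    """Convert a form value (0 to 100) into the stored fraction (0 to 1)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percentage out of range: {percent}")
    # Round to hide floating-point noise introduced by the division.
    return round(percent / 100, 6)


def fraction_to_percent(fraction: float) -> float:
    """Convert a stored fraction (0 to 1) back into a display percentage."""
    if not 0 <= fraction <= 1:
        raise ValueError(f"fraction out of range: {fraction}")
    return round(fraction * 100, 4)


print(percent_to_fraction(1.5))
print(fraction_to_percent(0.001))
```

Rounding on both sides keeps a value entered in the form stable across a save and reload.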

This might raise the question of whether we want to update the data model to include a lookup table between instruments, wiki ids, and sample rates.
For MVP it's probably fine to keep the JSON blob field as is. I'm not sure if the gains in having a more prescribed data model for sample rates + wiki ids (no need to parse json, etc) outweigh the cons (more complex queries - table joins, etc).

Fwiw the relevant field set for this data will be updated imminently in T392911: xLab: Update ‘Project and sample size’ module

@phuedx @mpopov cc @Sfaci what would be the upper and lower limits, as well as increments in terms of sample size for experiments?

For a large-language Wikipedia like English, I thought it was something like 0.001 or 0.01 percent?
I'm not sure that we'd be experimenting on Wikipedias of that size (presumably more like pilot wikis).

If we're moving to percentages to represent sample size, presumably we'd want to allow for fractional numbers, e.g. values below 1%, or 2.5%? The increment of the text input field for the sample size determines what values are accepted by the form: if we increment by 1, fractional values cannot be saved. By incrementing by 0.1, we can save values like 1.5% or 0.8% for the sample size of a given wiki.

Currently we can set min and max values for the sample size field (previously 0 and 1, now 0 and 100) and increment by 0.1. Is this too small? I'm assuming a 0.01 increment is too small?

Proposal:

  • Use increments of 0.01 (percentage points), allowing for traffic allocations (enrollment sample rates) of 1.5%, 0.8%, 0.05%, 2.02%, 0.01%
  • For English Wikipedia, maximum allocation is 0.1%
  • For all other wikis, maximum allocation is 10%
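To make the 0.01 increment concrete, here is a minimal sketch of a step check. It uses `Decimal` because binary floats cannot represent most hundredths exactly; the function name and the choice to take the raw form input as a string are assumptions for illustration only.

```python
from decimal import Decimal

STEP = Decimal("0.01")  # increment in percentage points, per the proposal


# Hypothetical validator; takes the form input as text to avoid float error.
def is_valid_step(percent_text: str) -> bool:
    """True if the value lies in 0..100 and is a whole multiple of 0.01."""
    value = Decimal(percent_text)  # raises InvalidOperation on non-numeric text
    return 0 <= value <= 100 and value % STEP == 0


for text in ["1.5", "0.8", "0.05", "2.02", "0.01", "0.005"]:
    print(text, is_valid_step(text))
```

The maximum allocations from the proposal are left out here on purpose, since they are deferred to the validation work.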

Open question:

  • If experiment owner doesn't add English Wikipedia specifically and just uses default (say, 10%), should we block them from doing that?

thanks @mpopov -- so we will go with 0.01 increments -- the rest of your points can be addressed as part of validation cc @Sfaci

re: open Q -- imho we should block this but this is getting into the weeds of validation rules for this field set that should probably be hashed out in another relevant ticket

> the rest of your points can be addressed as part of validation cc @Sfaci

I think so. We could address the other proposals as new rules for the validation process we are working on:

  • For English Wikipedia, maximum allocation is 0.1%
  • For all other wikis, maximum allocation is 10%

> re: open Q -- imho we should block this but this is getting into the weeds of validation rules for this field set that should probably be hashed out in another relevant ticket

Good question! According to what you have mentioned before, it seems we should do something to avoid having English Wikipedia with more than 10% of the traffic assigned. As Clare mentions, I think we should address this as part of the validation task as well (or maybe in another, more specific one that deals with these kinds of validation rules, which are more related to the functionality. Are there more?)

Now that we are considering these new proposals I guess that, at least, we should add them to the spreadsheet where we are working on all the specific details for every validation rule. There @Sarai-WMF is coordinating all that work. We would need to know where we want to validate them (frontend vs backend) and what's the right message for the user.

And maybe we can talk about it in a design review session. What do you think @Sarai-WMF ??

Oh! One more point: the maximum allocation of 0.1% for English Wikipedia should really be only when edge_unique identifier type is selected because reader traffic is such a special case.

For logged-in user experiments (using the mw_user_id identifier), a maximum of 10% would be fine on English Wikipedia.

> Now that we are considering these new proposals I guess that, at least, we should add them to the spreadsheet where we are working on all the specific details for every validation rule. There @Sarai-WMF is coordinating all that work. We would need to know where we want to validate them (frontend vs backend) and what's the right message for the user.
>
> And maybe we can talk about in a design review session. What do you think @Sarai-WMF ??

Agreed, Santi! This would make for a great design review topic. Let's discuss all these rules together, with a visual reference that helps validate the behavior as a whole. When it comes to planning, though, my inkling instead would be to implement these advanced traffic allocation rules in a separate ticket. It might be a good idea to compartmentalize and limit the scope of the current validation task. What do you all dear peeps think?

> my inkling instead would be to implement these advanced traffic allocation rules in a separate ticket. It might be a good idea to compartmentalize and limit the scope of the current validation task. What do you all dear peeps think?

@Sarai-WMF I think it's a great idea. The validation ticket has been delayed a bit for different reasons and its scope has increased already (moving the backend to ES modules) and I think that, for that one, we should focus on the validation process itself with the basic rules we already had. And there is still a pending discussion about those new rules. After all, once the validation process itself is implemented, adding new rules will be really easy. That's why I think a separate ticket would be great.

Let's discuss this during design review and reflect decisions on a separate task 👍🏻

@mpopov I'd like to validate the following traffic allocation limits, based on the selected identifier, with your help 🙏🏻 The cell with the question mark contains the information that I'm unsure of:

| User identifier type | mw_user_id | edge_unique |
| English Wikipedia | 10% max. traffic | 0.1% max. traffic |
| Any other wiki | 10% max. traffic (?) | 10% max. traffic |

Sorry if I missed this in the conversation above. Thank you!

Conclusion that @JVanderhoop-WMF and I arrived at in Slack thread:

| User identifier type | mw_user_id | edge_unique |
| English Wikipedia | no max | 0.1% max. traffic |
| Any other wiki | no max | 10% max. traffic |
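The limits in the table above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, and the actual rules are to be implemented in the follow-up validation ticket. `enwiki` is used as the wiki_id for English Wikipedia, as in the sample_rate example earlier in this task.

```python
from typing import Optional


# Hypothetical sketch of the agreed limits; not the xLab implementation.
def max_allocation(identifier_type: str, wiki_id: str) -> Optional[float]:
    """Return the maximum traffic percentage, or None when there is no max."""
    if identifier_type == "mw_user_id":
        return None  # no maximum on any wiki
    if identifier_type == "edge_unique":
        return 0.1 if wiki_id == "enwiki" else 10.0
    raise ValueError(f"unknown identifier type: {identifier_type}")


def is_within_limit(identifier_type: str, wiki_id: str, percent: float) -> bool:
    """Check a requested allocation (in percent) against the limit."""
    limit = max_allocation(identifier_type, wiki_id)
    return limit is None or percent <= limit


print(is_within_limit("edge_unique", "enwiki", 0.5))
print(is_within_limit("mw_user_id", "enwiki", 50))
```

Returning `None` for "no max" keeps the rule table separate from the comparison, so new identifier types or per-wiki caps can be added in one place.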

The current system and design are flexible enough to replicate the GrowthBook user experience (where you have just 1 allocation setting and can then add targeting rules like "wiki in list x, y, z" and "wiki not in list x, y, z") with the least additional effort. If we need to revisit this and implement an allowlist (of wikis to include in the experiment) and a denylist (of wikis to exclude from the experiment), we can do that later.

And then in the docs we can make some recommendations such as:

  • To exclude wikis from experiment, add and set their allocations to 0%
  • To do experiment only on some wikis, set default to 0% and then add wikis to override
  • Use the same rate across wikis, do not vary by wiki

That last recommendation is especially important because:

If the rates are allowed to vary by wiki, then a user might be enrolled on one wiki and not enrolled on another. Even though enrollment is based on a user's global user ID, a user is guaranteed to be enrolled in the same way on any two wikis only if

  • their rates are exactly the same (which yields the same determination of enrollment status)
  • OR the user is enrolled on the lower-rate wiki, in which case they would also be enrolled on the higher-rate wiki
    • if a user is in the 5% sample on English Wikipedia and they also frequent Korean Wikipedia, they would also be in the 10% sample on Korean
    • if a user is in the 10% sample on Korean and they also frequent English, they may or may not be in English's 5% sample
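The nesting property described in the list above follows from any scheme that maps a user to a fixed position in [0, 1) and enrolls them when that position falls below the wiki's rate. The sketch below assumes such a scheme; the SHA-256 hashing and the function names are illustrative assumptions and are not a description of how xLab actually buckets users.

```python
import hashlib


# Assumed bucketing scheme for illustration only; xLab's may differ.
def bucket(global_user_id: int, experiment: str) -> float:
    """Map a user to a stable position in [0, 1) for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{global_user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def is_enrolled(global_user_id: int, experiment: str, rate: float) -> bool:
    """Enroll when the user's position falls below the wiki's rate (0 to 1)."""
    return bucket(global_user_id, experiment) < rate


users = range(10_000)
low = {u for u in users if is_enrolled(u, "demo-experiment", 0.05)}
high = {u for u in users if is_enrolled(u, "demo-experiment", 0.10)}

print(low <= high)    # everyone in the 5% sample is also in the 10% sample
print(len(high - low) > 0)  # some users are only in the 10% sample
```

Since the position does not depend on the wiki, equal rates always give the same answer, and a lower rate selects a subset of the users selected by a higher rate.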

Now, functionally this is not so different from the status quo, where users are enrolled into experiments on a per-wiki basis. For example, when Editing does an A/B test on French, Spanish, and Italian Wikipedias, an editor who is a member of all three of those communities might end up in control on French, in treatment on Spanish, and not in the experiment at all on Italian.

This comment was removed by cjming.

Listed max traffic validation rules as AC for new ticket T396059: xLab: update validation for traffic per wiki for Experiments and moving this ticket to done