We have a couple of metrics implemented in relforge that can analyze the results:
- MRR
- pFound / eFound
Both know how to read the WikidataCompletionSearchClicks eventlogging schema, but that schema has no affordances for A/B testing. To run a test we need to:
- Add AB test bucket to eventlogging schema
- Update the JavaScript to sample users into test buckets
- Update relforge to split metrics by test bucket
- Create backend test profiles
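As a rough illustration of what "split metrics by test bucket" means for MRR, here is a minimal sketch of the arithmetic. It is not relforge code. The event shape (`bucket`, `clickPosition`) is hypothetical and does not come from the real schema, and treating a search without a click as a reciprocal rank of 0 is an assumption.

```javascript
// Hypothetical event shape: { bucket: string, clickPosition: number | null }.
// clickPosition is the 1-based rank of the clicked result, or null when the
// search ended without a click. These field names are assumptions.
function mrrByBucket(events) {
  const sums = new Map();
  for (const { bucket, clickPosition } of events) {
    const entry = sums.get(bucket) || { total: 0, count: 0 };
    // Assumption: a search with no click contributes a reciprocal rank of 0.
    entry.total += clickPosition ? 1 / clickPosition : 0;
    entry.count += 1;
    sums.set(bucket, entry);
  }
  // Mean reciprocal rank per bucket.
  const result = {};
  for (const [bucket, { total, count }] of sums) {
    result[bucket] = total / count;
  }
  return result;
}

// Made-up sample events, for illustration only.
const sampleEvents = [
  { bucket: 'control', clickPosition: 1 },
  { bucket: 'control', clickPosition: 2 },
  { bucket: 'test', clickPosition: 4 },
  { bucket: 'test', clickPosition: null },
];
console.log(mrrByBucket(sampleEvents));
```

The same grouping would apply to pFound / eFound: compute the metric once per bucket instead of once over all events.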
The test should be set up this way:
- Config value that enables the whole test (in mediawiki-config)
- Front end decides whether to enable the test for a particular request and which bucket the request falls into (so far we enable it only for the "en" language and for items)
- Front end adds parameters to the request to select the test profiles: cirrusWBProfile and cirrusRescoreProfile
- Backend just uses the test profiles (which need to be set up) to deliver results
- Front end logs the test bucket together with the results in WikidataCompletionSearchClicks
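The front-end side of the steps above could look roughly like the sketch below. Only `cirrusWBProfile`, `cirrusRescoreProfile`, and the "en" / items restriction come from this task. The config layout, the function names, the profile names, the sampling scheme, and the `bucket` log field are all hypothetical placeholders.

```javascript
// Decide whether this request takes part in the test and, if so, in which bucket.
// Returns a bucket name, or null when the request is not part of the test.
function chooseBucket(config, request, random = Math.random) {
  if (!config.enabled) return null; // global switch (e.g. from mediawiki-config)
  if (request.language !== 'en' || request.type !== 'item') return null;
  const r = random();
  if (r >= config.sampleRate) return null; // not sampled into the test
  // Reuse the same draw to spread sampled requests evenly over the buckets.
  const names = Object.keys(config.buckets);
  const index = Math.floor((r / config.sampleRate) * names.length);
  return names[Math.min(index, names.length - 1)];
}

// Add the bucket's profile parameters (if any) to the search request.
function buildSearchParams(baseParams, config, bucket) {
  const profiles = bucket ? config.buckets[bucket] : {};
  return { ...baseParams, ...profiles };
}

// Attach the bucket to the event that will be logged (field name is an assumption).
function buildLogEvent(baseEvent, bucket) {
  return bucket ? { ...baseEvent, bucket } : { ...baseEvent };
}

// Example configuration; the profile names are made up.
const exampleConfig = {
  enabled: true,
  sampleRate: 0.5,
  buckets: {
    control: {},
    test: {
      cirrusWBProfile: 'hypothetical_wb_profile',
      cirrusRescoreProfile: 'hypothetical_rescore_profile',
    },
  },
};

const exampleRequest = { language: 'en', type: 'item' };
// A fixed draw instead of Math.random keeps the example reproducible.
const exampleBucket = chooseBucket(exampleConfig, exampleRequest, () => 0.3);
console.log(exampleBucket);
console.log(buildSearchParams({ search: 'foo' }, exampleConfig, exampleBucket));
console.log(buildLogEvent({ action: 'click' }, exampleBucket));
```

Passing the random source as a parameter is only there to make the bucketing testable. Whether bucketing should be per request or sticky per user is left open here.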