Test perfield_builder on spaceless languages
Closed, ResolvedPublic

Description

As a maintainer of CirrusSearch I would like to reduce the usage of the query_string query so that I can reduce technical debt and verify that the fixes made for T262845 are valid.

The default query builder and classic rescore window are still in use on these wikis as leftovers of the switch to BM25, for which the tests were inconclusive (https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/).
I think it would make sense to re-assess this by running another A/B test because many components related to these languages have changed since then:

  • auto_generate_phrase_queries is no longer available and was probably the cause of the low recall on such languages: T219267
  • there were no dedicated analyzers for Chinese, Japanese and Thai at the time (T158203, T166731, T151743)

I suggest testing:

  • wgCirrusSearchFullTextQueryBuilderProfile: perfield_builder
  • wgCirrusSearchRescoreProfile: wsum_inclinks

on all wikipedias using spaceless languages:

  • bowiki, dzwiki, ganwiki, jawiki, kmwiki, lowiki, mywiki, thwiki, wuuwiki, zhwiki, zh_classicalwiki, yuewiki, zh_yuewiki, bugwiki, cdowiki, crwiki, hakwiki, jvwiki, nanwiki, zh_min_nanwiki

AC:

  • run an A/B test on these wikis
  • provide some data to ensure that the fixes for T262845 are valid

Event Timeline

Change 635311 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/WikimediaEvents@master] [cirrus] setup perfield builder A/B test on spaceless languages

https://gerrit.wikimedia.org/r/635311

Change 635313 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [cirrus] A/B test perfield build on spaceless languages

https://gerrit.wikimedia.org/r/635313

Change 635313 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] A/B test perfield build on spaceless languages

https://gerrit.wikimedia.org/r/635313

Mentioned in SAL (#wikimedia-operations) [2020-12-07T12:28:15Z] <dcausse@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T266027: [cirrus] A/B test perfield build on spaceless languages (duration: 01m 07s)

Change 646777 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/WikimediaEvents@wmf/1.36.0-wmf.20] [cirrus] setup perfield builder A/B test on spaceless languages

https://gerrit.wikimedia.org/r/646777

Change 635311 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] [cirrus] setup perfield builder A/B test on spaceless languages

https://gerrit.wikimedia.org/r/635311

Change 646777 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.36.0-wmf.20] [cirrus] setup perfield builder A/B test on spaceless languages

https://gerrit.wikimedia.org/r/646777

Mentioned in SAL (#wikimedia-operations) [2020-12-08T12:28:03Z] <dcausse@deploy1001> Synchronized php-1.36.0-wmf.20/extensions/WikimediaEvents/: T266027: [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 00s)

Change 646779 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/WikimediaEvents@wmf/1.36.0-wmf.21] [cirrus] setup perfield builder A/B test on spaceless languages

https://gerrit.wikimedia.org/r/646779

Change 646779 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.36.0-wmf.21] [cirrus] setup perfield builder A/B test on spaceless languages

https://gerrit.wikimedia.org/r/646779

The A/B test is running and events are flowing in. It will be de-activated on Tuesday December 15.

Mentioned in SAL (#wikimedia-operations) [2020-12-16T19:18:12Z] <dcausse@deploy1001> Synchronized php-1.36.0-wmf.21/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s)

Mentioned in SAL (#wikimedia-operations) [2020-12-16T19:21:33Z] <dcausse@deploy1001> Synchronized php-1.36.0-wmf.22/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s)

Moving back to in progress for writing a small report based on the data collected.

dcausse added subscribers: EBernhardson, TJones.

Notebook uploaded to https://people.wikimedia.org/~dcausse/T266027%20perfield%20builder%20test%20spaceless%20languages.html

A total of 6,046,684 events were collected between 2012-12-09T00:00:00Z and 2012-12-16T00:00:00Z, originating, not surprisingly, for the most part from the Chinese, Japanese and Thai Wikipedias.

There is still a non-negligible number of sessions where a problem has been detected (e.g. a mismatch between frontend and backend). Generally fewer than 3% of the collected sessions have to be dropped: 1.25% for Chinese, 0.81% for Japanese and 2.70% for Thai. This does not seem enough to justify investing more work in fixing the search A/B test instrumentation code. The cause may lie in the way the search session starts and how/when the user is assigned to a particular group: there are conditions where these two are in conflict. As discussed with @EBernhardson, it should be the responsibility of the backend, not the frontend, to assign a user to a group, but doing such work is probably out of the scope of the search team.
As to whether the fixes attempted in T262845 are valid, the answer is no, but they at least allow us to clean up the events by detecting suspicious sessions where a mismatch occurred.

On the effect of the perfield builder on these wikis: the goal of the A/B test was to make sure that it does not cause any dramatic degradation in the search experience.
Using a very naive satisfaction metric:

  • successful session: at least one fulltext search and at least one click
  • unsuccessful session: at least one fulltext search and no click
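
As an illustration only, the metric above can be sketched in Python as follows. The event shape (`session_id`, `group`, `action`) and the action names `"fulltext_search"` and `"click"` are assumptions made for this sketch, not the actual schema of the collected events or the code used in the notebook.

```python
from collections import defaultdict


def success_rates(events):
    """Return the share of successful sessions per test group.

    `events` is an iterable of (session_id, group, action) tuples.
    The tuple layout and the action names are hypothetical.
    """
    group_of = {}      # session_id -> test group
    searched = set()   # sessions with at least one fulltext search
    clicked = set()    # sessions with at least one click

    for session_id, group, action in events:
        group_of[session_id] = group
        if action == "fulltext_search":
            searched.add(session_id)
        elif action == "click":
            clicked.add(session_id)

    totals = defaultdict(int)
    successes = defaultdict(int)
    # Only sessions with at least one fulltext search are counted at all.
    for session_id in searched:
        group = group_of[session_id]
        totals[group] += 1
        if session_id in clicked:
            successes[group] += 1

    return {group: successes[group] / totals[group] for group in totals}


if __name__ == "__main__":
    sample = [
        ("s1", "control", "fulltext_search"),
        ("s1", "control", "click"),
        ("s2", "control", "fulltext_search"),
        ("s3", "perfield", "fulltext_search"),
        ("s3", "perfield", "click"),
    ]
    print(success_rates(sample))
```

Sessions without any fulltext search are ignored entirely, which matches the two definitions above: both successful and unsuccessful sessions require at least one fulltext search.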

We see that there is a slight preference for the perfield builder; control is still preferred on:

  • wuuwiki: decreased from 22.45% to 22.00% of successful sessions
  • zh_yuewiki: decreased from 27.56% to 25.07%
  • bowiki: decreased from 45.45% to 22.22%, but this is very unlikely to bear any meaning given the number of sessions collected here (11 for control, 9 for perfield)

On the ZRR (zero results rate) side there does not seem to be any noticeable variation, and this is what we expected.

In conclusion, I think it is safe to enable the perfield builder on these languages without massively degrading the user experience.

On the analysis itself, I could not reuse the automated A/B report generator as it is currently broken (it still depends on mysql). No confidence intervals have been calculated, as the goal was not to prove that A is better than B but only to verify that perfield does not degrade the experience in a noticeable manner. I'd be happy to update the notebook with some confidence intervals if someone shows me how to do that.
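
One possible way to get such intervals, as a minimal sketch rather than what the notebook does, is a Wilson score interval per group. The bowiki counts used below (5 of 11 for control, 2 of 9 for perfield) are inferred from the percentages and session counts reported above; the function name is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to a confidence level of roughly 95%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Counts inferred from 45.45% of 11 sessions and 22.22% of 9 sessions.
    for label, successes, total in [("control", 5, 11), ("perfield", 2, 9)]:
        low, high = wilson_interval(successes, total)
        print(f"bowiki {label}: {successes}/{total} -> [{low:.3f}, {high:.3f}]")
```

With so few sessions the two bowiki intervals are very wide and overlap, which is consistent with the remark that the bowiki difference is unlikely to bear any meaning.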

A total of 6,046,684 events have been collected between 2012-12-09T00:00:00Z and 2012-12-16T00:00:00Z,

I think you meant 2020-12, not 2012-12.

Other than that, this looks good to me. Nothing is obviously broken, so let's go for it.

(I do wish our A/B testing infrastructure had not degraded, but that is a task for another ticket.)

Change 658921 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [cirrus] Swith to perfield builder for spaceless languages

https://gerrit.wikimedia.org/r/658921

Change 658921 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] Swith to perfield builder for spaceless languages

https://gerrit.wikimedia.org/r/658921

Mentioned in SAL (#wikimedia-operations) [2021-01-28T12:07:10Z] <dcausse@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T266027: [cirrus] Swith to perfield builder for spaceless languages (duration: 01m 06s)