Do A/B Tests on Other Wikis with TextCat for Language Identification
Closed, Resolved, Public

Description

Do A/B tests or A/B/C tests with TextCat on other wikis (depends on T121541). See T121542 for more details.

Related Objects

Status      Assigned
Open        None
Resolved    EBernhardson
Resolved    debt
Resolved    TJones
Resolved    TJones
Resolved    TJones
Resolved    TJones
Resolved    TJones
Resolved    debt
Resolved    Anikethfoss
Resolved    TJones
Resolved    debt
Resolved    Smalyshev
Resolved    TJones
Resolved    TJones
Resolved    dpatrick
Resolved    EBernhardson

Event Timeline

TJones raised the priority of this task from to Needs Triage.
TJones updated the task description.
TJones added a project: CirrusSearch.
TJones subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper.
Deskana moved this task from Needs triage to On Sprint Board on the Discovery-ARCHIVED board.
Deskana subscribed.

Change 260164 had a related patch set uploaded (by Smalyshev):
Add implementation for TextCat language detection

https://gerrit.wikimedia.org/r/260164

Change 260164 merged by jenkins-bot:
Add implementation for TextCat language detection

https://gerrit.wikimedia.org/r/260164

@TJones I was looking at these tasks and wondering whether the blockers here are really blockers for running an extra A/B test. @EBernhardson and I think they may not be, and that, based on your work so far, we could run a test right now. We don't know this stuff as well as you do, though, so we'd like to ask you. Thoughts?

Deskana raised the priority of this task from Medium to High. May 24 2016, 10:18 PM

Let's bump up the priority somewhat.

@TJones I was looking at these tasks and wondering whether the blockers here are really blockers for running an extra A/B test. @EBernhardson and I think they may not be, and that, based on your work so far, we could run a test right now. We don't know this stuff as well as you do, though, so we'd like to ask you. Thoughts?

They are and they aren't—what a helpful answer!

The tasks are really too general, and at the earliest stage I divided everything into English and not-English until we figured out whether it made sense to pursue language ID in general.

The specific blocking tasks do need to be done, but not for all languages at once. For French, Spanish, Italian, and German Wikipedias, we aren't blocked by T121541 specifically, but by the subtask T132466, which is in "needs review", but is basically done.

The language lists for each of those wikis are available in Phab ticket T132466, and that's enough to run the A/B tests in parallel with the test we've run for enwiki.

There's still the question of recall-focus vs precision-focus (see T134431 ("needs review", but basically done) and T136034 (to do)), but we can do all the A/B tests with the same precision-focus we've had so far and get a better idea of how well this can work.
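For anyone unfamiliar with how a per-wiki language list and a precision-focused setting interact, here is a rough sketch in Python. It is not the TextCat code deployed in CirrusSearch; it is a toy rank-order n-gram identifier of the general kind TextCat is based on. The training sentences, the `allowed` list, and the `min_ratio` parameter are all made up for illustration, and the real per-wiki lists are the ones in T132466.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the most frequent character n-grams of `text`, ordered by rank."""
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top_k]


def out_of_place(query_profile, lang_profile):
    """Rank-order distance: sum of rank differences, with a fixed penalty
    for n-grams that do not occur in the language profile."""
    rank = {g: i for i, g in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[g]) if g in rank else penalty
        for i, g in enumerate(query_profile)
    )


def identify(query, profiles, allowed, min_ratio=1.05):
    """Pick the closest language from `allowed` (the hypothetical per-wiki list).

    Precision focus: return None (abstain) when the runner-up scores within
    `min_ratio` of the best score. Lowering `min_ratio` to 1.0 abstains only
    on exact ties, which trades precision for recall.
    """
    q = ngram_profile(query)
    scored = sorted((out_of_place(q, profiles[lang]), lang) for lang in allowed)
    if not scored:
        return None
    best_score, best_lang = scored[0]
    if len(scored) > 1 and scored[1][0] <= best_score * min_ratio:
        return None  # too close to call
    return best_lang


# Toy training data, purely illustrative.
TRAINING = {
    "en": "the cat sat on the mat and the dog ran",
    "de": "der hund und die katze sind in dem haus",
}
PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}
```

Calling `identify(query, PROFILES, ["en", "de"])` only ever considers the listed languages, which is why each wiki needs its own list: a language left off the list can never be returned, and a language on the list that rarely shows up in that wiki's queries mostly adds false positives.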

Change 293432 had a related patch set uploaded (by EBernhardson):
Textcat search satisfaction subtest for multiple wikis

https://gerrit.wikimedia.org/r/293432

Change 293432 merged by jenkins-bot:
Textcat search satisfaction subtest for multiple wikis

https://gerrit.wikimedia.org/r/293432

Change 294773 had a related patch set uploaded (by EBernhardson):
Textcat search satisfaction subtest for multiple wikis

https://gerrit.wikimedia.org/r/294773

Change 294773 merged by jenkins-bot:
Textcat search satisfaction subtest for multiple wikis

https://gerrit.wikimedia.org/r/294773