Lang ID Eval Set for Dutch
Closed, Resolved · Public

Description

Can't work on all of them at once, so continue down the list. See parent task T121541.

Work on this one stalled after Dutch, so I've converted this ticket to cover only Dutch and devolved the others (Polish, Arabic, Chinese) back to the parent task.

TJones created this task. Aug 4 2016, 8:57 PM
Restricted Application added a project: Discovery-Search. Aug 4 2016, 8:57 PM
TJones added a comment. Sep 8 2016, 8:53 PM

The analysis for Dutch is done. Many languages were present, but the final list ended up fairly short: many of them occurred only in small numbers and ended up with too many false positives. I'm hoping the work on improving confidence (T140289) will help here.

EBernhardson added a subscriber: EBernhardson. Edited · Oct 25 2016, 5:32 PM

https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Optimization_for_plwiki_arwiki_zhwiki_and_nlwiki

Note that only Dutch (as mentioned above) is complete; the others will be added as they are completed.

Update: The rest of this task (Polish, Arabic, and Chinese) is on hold while I work on general improvements to TextCat (see T140289).

TJones added a project: Discovery-Search.
TJones moved this task to Up Next on the Discovery-Search board.
TJones moved this task from Up Next to This Quarter on the Discovery-Search board.
TJones moved this task from This Quarter to Up Next on the Discovery-Search board. Nov 15 2016, 6:32 PM
TJones changed the task status from "Open" to "Stalled". Nov 15 2016, 6:34 PM

Change 334729 had a related patch set uploaded (by Tjones):
Deploy TextCat Improvements

https://gerrit.wikimedia.org/r/334729

Change 334729 merged by jenkins-bot:
Deploy TextCat Improvements

https://gerrit.wikimedia.org/r/334729

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2017-02-08T00:16:27Z] <thcipriani@tin> Synchronized wmf-config: SWAT: [[gerrit:334729|Deploy TextCat Improvements]] T149324 T142140 (duration: 00m 45s)

TJones changed the title from "Lang ID Eval Sets for Polish, Arabic, Chinese, and Dutch" to "Lang ID Eval Sets for Dutch". Feb 8 2017, 4:12 PM
TJones edited the task description.
TJones removed a project: Chinese-Sites.
TJones moved this task from Up Next to Current work on the Discovery-Search board.
TJones moved this task from Backlog to Done on the Discovery-Search (Current work) board.
TJones changed the title from "Lang ID Eval Sets for Dutch" to "Lang ID Eval Set for Dutch".
TJones added a comment. Feb 8 2017, 4:16 PM

It's live and working! An example query in English on the Dutch Wikipedia, giving results in English: Phonetically Intuitive English

Deskana closed this task as "Resolved". Feb 10 2017, 5:22 PM