
Lang ID Eval Set for Dutch
Closed, Resolved (Public)

Description

Can't work on all of them at once, so continue down the list. See parent task T121541.

Work on this one stalled after Dutch, so I've converted this ticket to cover just Dutch and devolved the others (Polish, Arabic, Chinese) back to the parent task.


Event Timeline

The analysis for Dutch is done. Many languages were present in the sample, but the final list ended up fairly short: most of them appeared only in small numbers and produced too many false positives. I'm hoping the work on improving confidence (T140289) will help here.
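
For illustration only, here is a minimal sketch of why a language that is rare in the eval set can end up being dropped: a handful of false positives is enough to push its precision very low. The sample data, the `per_language_scores` helper, and the 0.5 precision cutoff are all made up for this example; they are not taken from the actual analysis or from TextCat.

```python
from collections import Counter

def per_language_scores(samples):
    """Compute (precision, recall) per language.

    samples: iterable of (gold_lang, predicted_lang) pairs.
    Hypothetical helper for illustration, not part of TextCat.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in samples:
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1   # predicted language gets a false positive
            fn[gold] += 1   # gold language gets a false negative
    scores = {}
    for lang in set(tp) | set(fp) | set(fn):
        p_den = tp[lang] + fp[lang]
        r_den = tp[lang] + fn[lang]
        precision = tp[lang] / p_den if p_den else 0.0
        recall = tp[lang] / r_den if r_den else 0.0
        scores[lang] = (precision, recall)
    return scores

# Made-up eval set: Afrikaans ("af") is rare, and two Dutch queries
# are misidentified as Afrikaans.
samples = (
    [("nl", "nl")] * 8
    + [("en", "en")] * 3
    + [("nl", "af")] * 2
    + [("af", "af")] * 1
)

scores = per_language_scores(samples)

# Keep only languages whose precision clears an arbitrary, assumed cutoff.
MIN_PRECISION = 0.5
keep = sorted(lang for lang, (p, _) in scores.items() if p >= MIN_PRECISION)
print(keep)
```

With this toy data, Afrikaans has perfect recall but only one correct identification out of three predictions, so it falls below the cutoff and is excluded, while Dutch and English are kept.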

https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Optimization_for_plwiki_arwiki_zhwiki_and_nlwiki

Note that only Dutch (as mentioned above) is complete; the others will be added as they are completed.

Update: The rest of this task (Polish, Arabic, and Chinese) is on hold while I work on general improvements to TextCat (see T140289).

TJones added a project: Discovery-Search.
TJones moved this task to Up Next on the Discovery-Search board.
TJones moved this task from Up Next to This Quarter on the Discovery-Search board.
TJones changed the task status from Open to Stalled. Nov 15 2016, 6:34 PM

Change 334729 had a related patch set uploaded (by Tjones):
Deploy TextCat Improvements

https://gerrit.wikimedia.org/r/334729

Change 334729 merged by jenkins-bot:
Deploy TextCat Improvements

https://gerrit.wikimedia.org/r/334729

Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2017-02-08T00:16:27Z] <thcipriani@tin> Synchronized wmf-config: SWAT: [[gerrit:334729|Deploy TextCat Improvements]] T149324 T142140 (duration: 00m 45s)

TJones renamed this task from Lang ID Eval Sets for Polish, Arabic, Chinese, and Dutch to Lang ID Eval Sets for Dutch. Feb 8 2017, 4:12 PM
TJones renamed this task from Lang ID Eval Sets for Dutch to Lang ID Eval Set for Dutch.
TJones updated the task description. (Show Details)
TJones removed a project: Chinese-Sites.
TJones moved this task from Up Next to Current work on the Discovery-Search board.
TJones moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.

It's live and working! An example query in English on the Dutch Wikipedia, giving results in English: Phonetically Intuitive English