
Lang ID Eval Sets for Italian, German, Spanish, and French
Closed, Resolved · Public


Can't work on all of them at once, so start with the top of the list by query volume. See parent task T121541.


Event Timeline

TJones created this task. Apr 12 2016, 3:49 PM
Restricted Application added a project: Discovery-Search. Apr 12 2016, 3:49 PM
TJones renamed this task from Land ID Eval Sets for Italian, German, Spanish, and French to Lang ID Eval Sets for Italian, German, Spanish, and French. Apr 12 2016, 3:50 PM

itwiki is done. Same URL as above.

TJones added a comment. May 3 2016, 2:27 PM

Analysis complete. Summary of Results:

Using the default 3K models, the best options for each wiki are presented below:


frwiki
  • languages: French, English, Arabic, Russian, Chinese, Armenian, Thai, Greek, Hebrew, Korean
  • lang codes: fr, en, ar, ru, zh, th, el, hy, he, ko
  • relevant poor-performing queries: 29%
  • f0.5: 89.0%

eswiki
  • languages: Spanish, English, Russian, Chinese, Arabic, Japanese
  • lang codes: es, en, ru, zh, ar, ja
  • relevant poor-performing queries: 47%
  • f0.5: 95.8%

itwiki
  • languages: Italian, English, Russian, Arabic, Chinese, Japanese, Greek, Korean
  • lang codes: it, en, ru, ar, zh, ja, el, ko
  • relevant poor-performing queries: 29%
  • f0.5: 92.2%

dewiki
  • languages: German, English, Chinese, Greek, Russian, Arabic, Hindi, Thai, Korean, Japanese
  • lang codes: de, en, zh, el, ru, ar, hi, th, ko, ja
  • relevant poor-performing queries: 35%
  • f0.5: 88.2%
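
For anyone unfamiliar with the f0.5 figures in the summary: F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall. A minimal sketch of the standard formula follows. The function name `f_beta` and the input values are illustrative assumptions only; they are not taken from the report or from the evaluation tooling.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was identified correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative inputs only (hypothetical, not figures from the report).
score = f_beta(precision=0.9, recall=0.8)
print(f"F0.5 = {score:.3f}")
```

With beta = 0.5, a drop in precision lowers the score more than an equal drop in recall, which is presumably the point when wrong language guesses cost more than missed ones.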

I read through your report and am very impressed with the improvement in precision from using language sets tailored to each wiki: over 90% for French, Spanish, and Italian, and awfully close to 90% for German (and well above the baseline of all models). The analysis all looks good to me; I think we can call this done and use the results in future tests and production deployment.

debt closed this task as Resolved. Jun 8 2016, 12:38 AM
debt added a subscriber: debt.

Looks like this is resolved - closing.