Can't work on all of them at once, so start with the top of the list by query volume. See parent task T121541.
Description
Event Timeline
In progress: frwiki and eswiki done: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Optimization_for_frwiki_eswiki_itwiki_and_dewiki
Analysis complete. Summary of Results:
Using the default 3K models, the best options for each wiki are presented below:
frwiki
- languages: French, English, Arabic, Russian, Chinese, Thai, Greek, Armenian, Hebrew, Korean
- lang codes: fr, en, ar, ru, zh, th, el, hy, he, ko
- relevant poor-performing queries: 29%
- f0.5: 89.0%
eswiki
- languages: Spanish, English, Russian, Chinese, Arabic, Japanese
- lang codes: es, en, ru, zh, ar, ja
- relevant poor-performing queries: 47%
- f0.5: 95.8%
itwiki
- languages: Italian, English, Russian, Arabic, Chinese, Japanese, Greek, Korean
- lang codes: it, en, ru, ar, zh, ja, el, ko
- relevant poor-performing queries: 29%
- f0.5: 92.2%
dewiki
- languages: German, English, Chinese, Greek, Russian, Arabic, Hindi, Thai, Korean, Japanese
- lang codes: de, en, zh, el, ru, ar, hi, th, ko, ja
- relevant poor-performing queries: 35%
- f0.5: 88.2%
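For readers unfamiliar with the f0.5 figures above, here is a minimal sketch of how such a score is computed, assuming the standard F-beta definition with beta = 0.5, which weights precision more heavily than recall. The function name `f_beta` and the example precision/recall values are illustrative only and are not taken from the report.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero; define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values, not from the report: precision 90%, recall 80%.
print(round(f_beta(0.9, 0.8), 3))  # 0.878
```

With beta = 0.5, a drop in precision lowers the score more than an equal drop in recall, which fits a setting where a wrong language identification is worse than no identification.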
I read through your report, and I'm very impressed with the improvement in precision from using language sets tailored to each wiki: over 90% for French, Spanish, and Italian, and awfully close to 90% for German (and well above the baseline of all models). The analysis all looks good to me. I think we can call this done and use the results in future tests/prod deployment.