Rather than a quantitative score, develop a qualitative confidence score that sorts detection results into higher or lower buckets, based on what happens during processing. E.g., a result in the host language goes into a better bucket; a result in a language known to perform poorly goes into a worse bucket. Other factors include whether the script is relatively unambiguous, whether other factors re-ordered the scores, etc.
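
A rough sketch of how such bucketing might look, in Python for illustration only. Everything here is hypothetical: the `Detection` fields, the three bucket names, and the one-step-per-factor rule are assumptions, not part of TextCat. In particular, the task does not say which direction script ambiguity or re-ordering should push the bucket; the sketch assumes an unambiguous script raises confidence and re-ordering lowers it.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    """Hypothetical qualitative buckets, ordered from worst to best."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass
class Detection:
    """Hypothetical record of what happened while identifying one query."""
    language: str               # top-ranked language code
    is_host_language: bool      # result matches the wiki's own language
    known_poor_performer: bool  # language is known to be detected poorly
    unambiguous_script: bool    # script points to relatively few languages
    was_reordered: bool         # some other factor changed the ranking


def qualitative_confidence(d: Detection) -> Confidence:
    """Start in the middle bucket and move one step per factor."""
    level = int(Confidence.MEDIUM)
    if d.is_host_language:
        level += 1   # better bucket, per the task description
    if d.known_poor_performer:
        level -= 1   # worse bucket, per the task description
    if d.unambiguous_script:
        level += 1   # assumption: less ambiguity means more confidence
    if d.was_reordered:
        level -= 1   # assumption: re-ordering signals a less certain result
    # Clamp to the available buckets.
    return Confidence(max(int(Confidence.LOW), min(int(Confidence.HIGH), level)))
```

With these assumed rules, a host-language result with no other signals lands in `HIGH`, while a poorly performing language whose ranking was re-ordered lands in `LOW`. Factors that pull in opposite directions cancel out, which may or may not be the desired behavior.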
|Open||None||T118278 [EPIC] Improve Language Identification for use in Cirrus Search|
|Resolved||TJones||T140289 Investigate Improvements and Confidence Measures for TextCat Language Detection|
|Resolved||TJones||T149323 Qualitative confidence score for TextCat|