Add wrong-keyboard–transformed language models for "Cyrillic English" (en_cyr) and "Latin Russian" (ru_lat) to TextCat, to both the query-based (LM-query/) and the wikitext-based (LM/) models. Also add a Windows-1251 wrong-encoding model (ru_win1251) to the wikitext-based models.
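For readers unfamiliar with the term, a "wrong-keyboard" string is what comes out when text is typed with the intended keys but the other layout active. The sketch below is illustrative only and is not TextCat's code: it assumes the standard US QWERTY and Russian ЙЦУКЕН layouts, covers lowercase keys only, and the function names are hypothetical.

```python
# Illustrative sketch, not part of TextCat: position-for-position key mapping
# between US QWERTY and Russian ЙЦУКЕН (lowercase keys only; an assumption).
LATIN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,.`"
CYRILLIC_KEYS = "йцукенгшщзхъфывапролджэячсмитьбюё"

_TO_CYRILLIC = str.maketrans(LATIN_KEYS, CYRILLIC_KEYS)
_TO_LATIN = str.maketrans(CYRILLIC_KEYS, LATIN_KEYS)


def english_typed_on_russian_layout(text: str) -> str:
    """English keystrokes with the Russian layout active ("Cyrillic English")."""
    return text.translate(_TO_CYRILLIC)


def russian_typed_on_english_layout(text: str) -> str:
    """Russian keystrokes with the US layout active ("Latin Russian")."""
    return text.translate(_TO_LATIN)


if __name__ == "__main__":
    print(english_typed_on_russian_layout("wiki"))    # цшлш
    print(russian_typed_on_english_layout("привет"))  # ghbdtn
```

Characters outside the two key strings (spaces, digits, uppercase letters) pass through unchanged in this sketch.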
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add Wrong-Keyboard and Wrong-Encoding Models to TextCat | wikimedia/textcat | master | +100 K -22
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T118278 [EPIC] Improve Language Identification for use in Cirrus Search
Open | | None | T138958 Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias
Resolved | | TJones | T213931 Update TextCat with wrong-keyboard models
Declined | | TJones | T213935 Revert changes to TextCat that add dependency on autoload.php
Event Timeline
Change 484752 had a related patch set uploaded (by Tjones; owner: Tjones):
[wikimedia/textcat@master] Add Wrong-Keyboard and Wrong-Encoding Models to TextCat
Change 484752 merged by jenkins-bot:
[wikimedia/textcat@master] Add Wrong-Keyboard and Wrong-Encoding Models to TextCat