
Specify which languages TextCat should use
Closed, Resolved (Public)

Description

Right now, the only way to tell TextCat which languages to use is to point it at a directory of language models. We don't always want to use every language available in that directory, and we don't want to duplicate language models into custom directories just to select a particular set of languages. Directories should separate different kinds of models (query-based vs. text-based), not different sets of models.

TextCat should be able to take a list of languages and use only those, from the specified directory.

Event Timeline

The TextCat library already supports this:

	public function classify( $text, $candidates = null )

but the TextCat detector implementation in Cirrus does not. I'll try to add something.
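To make the idea concrete, here is a minimal sketch of how an optional candidate list can restrict classification to a subset of the models loaded from one directory. This is not the TextCat library's code: `selectModels`, `classifySketch`, the toy word-list models, and the word-overlap scoring are all hypothetical stand-ins. The only thing borrowed from the library is the convention that a `null` candidate list means "use everything".

```php
<?php
// Hypothetical sketch: NOT the TextCat library's implementation. The
// function names, toy models, and scoring are illustrative assumptions.

/**
 * Keep only the models whose language code appears in $candidates.
 * A null $candidates means "use every model".
 */
function selectModels( array $models, ?array $candidates = null ): array {
	if ( $candidates === null ) {
		return $models;
	}
	// array_flip turns [ 'en', 'fr' ] into [ 'en' => 0, 'fr' => 1 ] so the
	// language codes can be matched against the model keys.
	return array_intersect_key( $models, array_flip( $candidates ) );
}

/**
 * Toy scorer: counts how many words of $text occur in each model's word
 * list. This stands in for a real n-gram model comparison.
 * Returns language => score, best match first.
 */
function classifySketch( string $text, array $models, ?array $candidates = null ): array {
	$words = preg_split( '/\s+/', strtolower( trim( $text ) ), -1, PREG_SPLIT_NO_EMPTY );
	$scores = [];
	foreach ( selectModels( $models, $candidates ) as $lang => $vocab ) {
		$scores[$lang] = count( array_intersect( $words, $vocab ) );
	}
	arsort( $scores );
	return $scores;
}

// Toy "directory" of models: one word list per language code.
$models = [
	'en' => [ 'the', 'and', 'of' ],
	'fr' => [ 'le', 'et', 'de' ],
	'es' => [ 'el', 'y', 'de' ],
];

// Only English and French are considered; the Spanish model is never scored.
$limited = classifySketch( 'le chat de la maison', $models, [ 'en', 'fr' ] );

// No candidate list: every model in the "directory" is scored.
$all = classifySketch( 'le chat de la maison', $models );
```

The point of the design is that the set of models on disk and the set of languages in use are decoupled: one directory can serve several callers that each pass a different candidate list.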

Change 271719 had a related patch set uploaded (by Smalyshev):
Add detection limits for textcat

https://gerrit.wikimedia.org/r/271719

Change 271719 merged by jenkins-bot:
Add detection limits for textcat

https://gerrit.wikimedia.org/r/271719