Specify which languages TextCat should use
Closed, Resolved (Public)

Description

Right now, you can only tell TextCat which languages to use by specifying a directory. We don't always want to use every language available. We don't want to duplicate language models into custom directories just to specify particular sets of languages. The directories should be for different kinds of models (query-based vs text-based), not different sets of models.

TextCat should be able to take a list of languages and use only those, from the specified directory.

Event Timeline

TJones created this task. Feb 18 2016, 5:27 PM

The TextCat library already supports this:

	public function classify( $text, $candidates = null )

but the TextCat detector implementation in Cirrus does not. I'll try to add something.
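
For illustration only, here is a minimal sketch of the idea behind restricting detection to a list of languages while still loading models from a single directory. It does not call TextCat or CirrusSearch: `filterCandidates`, the `$models` array, and the `.lm` file names are hypothetical stand-ins, and treating `null` as "no restriction" is an assumption based on the default value in the signature above.

```php
<?php
// Hypothetical helper, not part of TextCat or CirrusSearch: keep only the
// language models whose codes appear in an allow-list.
function filterCandidates( array $models, ?array $allowed ): array {
	if ( $allowed === null ) {
		// Assumption: null means no restriction, so every model is used.
		return $models;
	}
	// array_flip turns the list of codes into keys so that
	// array_intersect_key can select the matching models.
	return array_intersect_key( $models, array_flip( $allowed ) );
}

// Toy stand-in for models loaded from one directory (values are placeholders).
$models = [ 'en' => 'en.lm', 'fr' => 'fr.lm', 'ru' => 'ru.lm', 'zh' => 'zh.lm' ];

$subset = filterCandidates( $models, [ 'en', 'ru' ] );
echo implode( ',', array_keys( $subset ) ), "\n"; // prints "en,ru"
```

With this shape the directory still selects the kind of model (query-based vs text-based), and the list only narrows which of those models are considered, so nothing needs to be copied into a custom directory.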

Change 271719 had a related patch set uploaded (by Smalyshev):
Add detection limits for textcat

https://gerrit.wikimedia.org/r/271719

Smalyshev moved this task from Needs triage to Search on the Discovery board. Feb 19 2016, 8:28 PM

Change 271719 merged by jenkins-bot:
Add detection limits for textcat

https://gerrit.wikimedia.org/r/271719

Smalyshev closed this task as Resolved. Feb 25 2016, 9:36 PM