Recent work (subtasks of T140289) has shown that we can get better performance out of TextCat with larger language models. We currently have 5K models (configured to use only 3K n-grams) in production. 9K models seem to be the best option, but we can instead deploy the 10K models I've been using for testing and development.
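For context on what "model size" means here: TextCat builds a ranked profile of the most frequent character n-grams per language, and the model size (3K, 5K, 9K, 10K) is how many top-ranked n-grams the model keeps. Classification picks the language whose profile minimizes the "out-of-place" rank distance to the document's profile. The production implementations are in PHP and Perl; the following is only a minimal illustrative sketch of the technique in Python, with hypothetical function names, not the deployed code.

```python
from collections import Counter

def ngram_profile(text, max_n=5, model_size=3000):
    """Rank character n-grams (lengths 1..max_n) by frequency and keep
    the top model_size, mapping each n-gram to its rank."""
    counts = Counter()
    for word in text.split():
        padded = f"_{word}_"  # mark word boundaries, as TextCat does
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common()]
    return {gram: rank for rank, gram in enumerate(ranked[:model_size])}

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language
    model incur the maximum penalty (the model's size)."""
    penalty = len(lang_profile)
    return sum(abs(rank - lang_profile.get(gram, penalty))
               for gram, rank in doc_profile.items())

def detect(text, models, model_size=3000):
    """Return the language whose profile is closest to the text's."""
    doc = ngram_profile(text, model_size=model_size)
    return min(models, key=lambda lang: out_of_place(doc, models[lang]))
```

A larger model keeps more of the n-gram tail, so rarer but distinctive sequences still contribute to the distance instead of all collapsing into the same out-of-model penalty; that is the intuition behind bigger models performing better on short texts.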
Related Gerrit Patches:
| Repository / Branch | Subject |
| wikimedia/textcat : master | Update PHP TextCat Models to 10K n-grams |
Related Tasks:
| Status | Assignee | Task |
| Open | None | T118278 EPIC: Improve Language Identification for use in Cirrus Search |
| Resolved | TJones | T140289 Investigate Improvements and Confidence Measures for TextCat Language Detection |
| Resolved | TJones | T149324 TextCat Improvement Deployment |
| Resolved | TJones | T155672 Deploy 10K models for TextCat (PHP & Perl) |