
Generate wikitext-based and query-based language models for TextCat
Closed, Resolved · Public

Description

We need to generate two sets of language models for TextCat: one based on wikitext for the top 50 languages (by number of speakers), and one based on queries for the top 50 languages (chosen from the query logs, or the same list as above?).
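
For context, a TextCat-style model is essentially a ranked list of the most frequent character n-grams in a training text, and a sample is assigned to the language whose ranking it disturbs least. The following is a minimal sketch of that idea, assuming the classic Cavnar and Trenkle scheme (n-grams of length 1 to 5, words padded with underscores, "out-of-place" distance). It is not the script used for this task; the function names, `max_n`, and `top_k` are hypothetical choices for illustration.

```python
from collections import Counter


def build_model(text, max_n=5, top_k=400):
    """Return the top_k character n-grams (length 1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries, as in Cavnar & Trenkle
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(model, sample_model):
    """Sum of rank differences; n-grams missing from the model get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(model)}
    penalty = len(model)  # assumption: penalty equals the model size
    return sum(
        abs(rank[gram] - r) if gram in rank else penalty
        for r, gram in enumerate(sample_model)
    )


def classify(text, models):
    """Return the key of the model with the smallest out-of-place distance."""
    sample = build_model(text)
    return min(models, key=lambda lang: out_of_place(models[lang], sample))
```

Under this sketch, the wikitext-based and query-based sets would differ only in the training text passed to `build_model`, for example `models = {"en": build_model(english_text), "de": build_model(german_text)}` followed by `classify(query, models)`.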

Related Objects

Status    Assigned      Task
Resolved  EBernhardson
Declined  mpopov
Resolved  EBernhardson
Resolved  mpopov
Resolved  EBernhardson
Resolved  debt
Open      None
Resolved  EBernhardson
Resolved  EBernhardson
Resolved  EBernhardson
Resolved  debt
Resolved  TJones
Resolved  TJones
Resolved  TJones
Resolved  TJones
Resolved  TJones
Resolved  debt
Resolved  Anikethfoss
Resolved  TJones
Resolved  debt
Resolved  Smalyshev
Resolved  TJones
Resolved  TJones
Resolved  dpatrick
Resolved  EBernhardson

Event Timeline

Smalyshev assigned this task to TJones.
Smalyshev raised the priority of this task from to Medium.
Smalyshev updated the task description.
Smalyshev added a project: Discovery.
Smalyshev added a subscriber: Smalyshev.
Restricted Application added a subscriber: Aklapper. · Jan 13 2016, 7:10 PM
Smalyshev moved this task from Needs triage to Search on the Discovery board. · Jan 13 2016, 7:11 PM
Deskana moved this task from Search to On Sprint Board on the Discovery board. · Jan 14 2016, 5:41 PM

This is at least related to T121545, though some of the details are different.

TJones added a comment. · Feb 2 2016, 5:57 PM

This is very similar to T121545. I think they should be merged.

At the moment, I've created models based on lightly cleaned up WikiText, but haven't evaluated them. They have been committed and submitted for review, too.

The mentioned patch is merged, so I'm calling this complete. It could also just be merged as a duplicate, as Trey suggested.

Smalyshev closed this task as Resolved. · Feb 3 2016, 1:08 AM