
Experiment with different vector lengths for ar, cs, en, and kowiki topic models.
Closed, Resolved, Public

Event Timeline

Halfak created this task. Oct 10 2019, 3:12 PM
Halfak triaged this task as High priority. Oct 23 2019, 9:07 PM
Halfak moved this task from Untriaged to New development on the Scoring-platform-team board.
Halfak added a comment. Dec 9 2019, 4:34 PM

I found https://fasttext.cc/docs/en/unsupervised-tutorial.html. It looks like a good tutorial for training word vectors. I think we should start there, training vectors of length 50 and comparing them with vectors of length 100.
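A minimal sketch of that comparison using the `fasttext` command-line tool from the tutorial. It assumes a `fasttext` binary on the PATH and one cleaned plain-text corpus per wiki under `corpora/`; those paths, the `models/` output naming, and the helper names are hypothetical. Only `-dim` changes between runs (fastText's default is 100), so the two models per wiki differ in vector length alone.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical layout: corpora/<wiki>.txt holds cleaned text for each wiki.
WIKIS = ["arwiki", "cswiki", "enwiki", "kowiki"]
DIMS = [50, 100]  # the two vector lengths to compare


def build_commands(wikis, dims, corpus_dir="corpora", out_dir="models"):
    """Return one `fasttext skipgram` argument list per (wiki, dim) pair."""
    commands = []
    for wiki in wikis:
        for dim in dims:
            commands.append([
                "fasttext", "skipgram",
                "-input", str(Path(corpus_dir) / f"{wiki}.txt"),
                # -output is a prefix; fastText writes <prefix>.bin and <prefix>.vec
                "-output", str(Path(out_dir) / f"{wiki}_{dim}d"),
                "-dim", str(dim),
            ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            out_prefix = Path(cmd[cmd.index("-output") + 1])
            out_prefix.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands(WIKIS, DIMS))
```

With `dry_run=True` the script only prints the eight commands; passing `dry_run=False` would train them. Leaving every other hyperparameter at its default keeps the comparison focused on vector length.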

We should set up a job on stat1007, or possibly a Hadoop job, that cleans the text of the XML dumps and then trains vectors from it.
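A minimal single-machine sketch of the cleaning step, the kind of script that could run on a stat host before training. The regex rules, function names, and output format (one cleaned page per line) are hypothetical assumptions, not the team's actual pipeline; regexes only approximate wikitext parsing, and a dedicated parser would handle edge cases better.

```python
import bz2
import re
import xml.etree.ElementTree as ET

# Hypothetical regex-based cleanup rules (an approximation of wikitext parsing).
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")                # innermost {{...}}
REF_RE = re.compile(r"<ref[^>/]*/>|<ref[^>]*>.*?</ref>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")                             # remaining HTML tags
LINK_RE = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")    # [[target|label]]
EXTLINK_RE = re.compile(r"\[https?://\S+\s*([^\]]*)\]")    # [http://url label]
TOKEN_RE = re.compile(r"\w+")                               # Unicode-aware words


def clean_wikitext(text):
    """Reduce raw wikitext to lowercase, space-separated word tokens."""
    # Remove templates from the innermost outward until nothing changes.
    previous = None
    while previous != text:
        previous, text = text, TEMPLATE_RE.sub(" ", text)
    text = REF_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = LINK_RE.sub(r"\1", text)     # keep the visible link text
    text = EXTLINK_RE.sub(r"\1", text)  # keep external-link labels only
    return " ".join(TOKEN_RE.findall(text.lower()))


def iter_page_texts(dump_path):
    """Yield the wikitext of each <text> element in a plain or .bz2 XML dump."""
    opener = bz2.open if str(dump_path).endswith(".bz2") else open
    with opener(dump_path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            # Dump tags are namespaced, e.g. "{http://www.mediawiki.org/xml/export-0.10/}text".
            if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
                yield elem.text
            elem.clear()  # release each element's content once handled


def write_corpus(dump_path, out_path):
    """Write one cleaned page per line as plain-text training input."""
    with open(out_path, "w", encoding="utf-8") as out:
        for text in iter_page_texts(dump_path):
            cleaned = clean_wikitext(text)
            if cleaned:
                out.write(cleaned + "\n")
```

Because `\w` is Unicode-aware in Python 3, the same tokenizer keeps Arabic, Czech, and Korean words intact, though languages with rich morphology may need more careful tokenization than this.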

Halfak closed this task as Resolved. Feb 12 2020, 9:56 PM
Halfak claimed this task.
Halfak moved this task from Active to Done on the Scoring-platform-team (Current) board.