Experiment with different vector lengths for ar, cs, en, and kowiki topic models.
Closed, Resolved (Public)

Event Timeline

Halfak triaged this task as High priority. Oct 23 2019, 9:07 PM
Halfak moved this task from Unorganized to New development on the Machine-Learning-Team board.

I found https://fasttext.cc/docs/en/unsupervised-tutorial.html. It looks like a good tutorial for generating word vectors. I think we should start there, training length-50 vectors and comparing them to length-100 vectors.
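
A minimal sketch of how the 50 vs. 100 comparison could be set up, assuming the fastText command-line tool from the tutorial (`fasttext skipgram -input ... -output ... -dim ...`). The input and output file names below are hypothetical placeholders, not paths from this task, and the choice of `skipgram` over `cbow` is also an assumption. The function only builds the argument lists; running them is left to whatever job runner ends up being used.

```python
from itertools import product

# Wikis named in the task title; the vector lengths come from the comment above.
WIKIS = ["arwiki", "cswiki", "enwiki", "kowiki"]
DIMS = [50, 100]


def fasttext_commands(wikis=WIKIS, dims=DIMS, model="skipgram"):
    """Return one fastText CLI argument list per (wiki, dim) pair.

    File names are hypothetical: "<wiki>.txt" is assumed to hold the
    cleaned text, and "<wiki>.<model>.<dim>" is the output prefix.
    """
    commands = []
    for wiki, dim in product(wikis, dims):
        commands.append([
            "fasttext", model,
            "-input", f"{wiki}.txt",
            "-output", f"{wiki}.{model}.{dim}",
            "-dim", str(dim),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(argv, check=True)` on a machine where the `fasttext` binary is installed, or joined with spaces and pasted into a shell script.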

We should set up a job on stat1007, or maybe even a Hadoop job, to clean up the text of the XML dumps and then generate vectors from it.
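
A rough sketch of what the cleanup step might look like, assuming the dumps are MediaWiki XML exports with page content in `<text>` elements. The cleaning rules here (drop HTML-like tags, drop non-nested templates, keep link labels, strip punctuation, lowercase) are illustrative placeholders, not the rules actually used for this task; Arabic and Korean text would likely need language-specific handling that this does not attempt. `iter_page_text`, `clean_wikitext`, and `clean_dump` are hypothetical helper names.

```python
import io
import re
import xml.etree.ElementTree as ET


def iter_page_text(xml_file):
    """Stream the raw wikitext of every <text> element in a dump."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "text":
            yield elem.text or ""
        elif tag == "page":
            elem.clear()  # release the finished page's contents


def clean_wikitext(text):
    """Crude, assumption-laden cleanup of one page's wikitext."""
    text = re.sub(r"<[^>]+>", " ", text)          # HTML-like tags
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)   # non-nested templates
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"[^\w\s]", " ", text)          # punctuation and markup
    return " ".join(text.lower().split())


def clean_dump(xml_file):
    """Yield one cleaned, non-empty line of text per page."""
    for raw in iter_page_text(xml_file):
        cleaned = clean_wikitext(raw)
        if cleaned:
            yield cleaned
```

Writing each yielded line to a text file gives one page per line, which could then be used as the `-input` file for fastText. Because `iterparse` streams the file, this should work on large dumps when given an open file object (for example from `bz2.open`), though a Hadoop job would need a different structure.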

Halfak claimed this task.
Halfak moved this task from Active to Done on the Machine-Learning-Team (Active Tasks) board.