Experiment with different vector lengths for ar, cs, en, and kowiki topic models.
Closed, Resolved, Public

Assigned To

Authored By

	Halfak
	Oct 10 2019, 3:12 PM

Related Objects

Status	Assigned	Task
Resolved	Halfak	T243451 Deploy ORES -- Late Jan 2020
Resolved	Halfak	T235181 Build WikiProject directory topic models for ar, cs, and kowiki
Resolved	Halfak	T235183 Experiment with different vector lengths for ar, cs, en, and kowiki topic models.
Resolved	Halfak	T235184 Generate word vectors for ar, cs, en, and ko using FastText
Resolved	Isaac	T242013 Implement native NN model in revscoring
Resolved	Halfak	T235187 Create labeled data for topic models in ar, cs, kowiki
Resolved	Isaac	T236713 Improve drafttopic training data pipeline
Resolved	Isaac	T240273 Extract cross-wiki WikiProject tags
Resolved	Halfak	T240286 Re-train English Wikipedia topic model using new WikiProject Taxonomy
Resolved	Halfak	T240276 Restructure WikiProject directory to be better
Resolved	kevinbazira	T240282 Improve WikiProject template --> WikiProject mapping

Halfak triaged this task as High priority. Oct 23 2019, 9:07 PM

I found https://fasttext.cc/docs/en/unsupervised-tutorial.html, which looks like a good tutorial for generating word vectors. I think we should start there with vectors of length 50 and compare them to vectors of length 100.
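A minimal sketch of how the 50 vs. 100 comparison could be set up, assuming the fastText command-line tool from that tutorial is installed as `fasttext` and a cleaned plain-text corpus already exists per wiki. It builds one `skipgram` training run per candidate vector length (`-dim`), so the resulting vectors can be compared side by side. The function names and file paths are hypothetical; how the vectors are then evaluated (e.g. by downstream topic-model performance) is left open here.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def fasttext_commands(corpus: Path, out_dir: Path,
                      dims: Sequence[int] = (50, 100),
                      binary: str = "fasttext") -> List[List[str]]:
    """Build one `fasttext skipgram` command per candidate vector length."""
    commands = []
    for dim in dims:
        # fastText writes <output>.bin and <output>.vec; tag each run with its dim.
        output = out_dir / f"{corpus.stem}.dim{dim}"
        commands.append([
            binary, "skipgram",
            "-input", str(corpus),
            "-output", str(output),
            "-dim", str(dim),
        ])
    return commands


def train_all(commands: List[List[str]]) -> None:
    """Run each training command in turn, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `train_all(fasttext_commands(Path("data/kowiki.txt"), Path("result")))` would train `result/kowiki.dim50` and `result/kowiki.dim100` (hypothetical paths). Keeping everything except `-dim` fixed means any difference between the two runs comes from the vector length alone.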

We should set up a job on stat1007, or maybe even a Hadoop job, to clean up the text of the XML dumps and then generate vectors from it.
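A minimal single-machine sketch of the cleanup step, assuming a `pages-articles` XML dump and plain Python rather than Hadoop. It streams pages with `xml.etree.ElementTree.iterparse`, keeps main-namespace pages that are not redirects, strips wikitext markup with crude regexes, and writes one lowercased article per line: whitespace-separated plain text that `fasttext skipgram -input` accepts. The regexes and function names are hypothetical, not the pipeline that was actually used; a proper wikitext parser would do better on tables, nested links, and scripts with combining marks.

```python
import re
import xml.etree.ElementTree as ET
from typing import IO, Iterator, TextIO

# Crude wikitext patterns (hypothetical; a real pipeline might use a proper parser).
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")  # innermost {{...}} template
REF = re.compile(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", re.DOTALL)  # <ref/> and <ref>...</ref>
TAG = re.compile(r"<[^>]+>")  # any remaining HTML-like tag
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")  # [[target|label]] -> label
EXTLINK = re.compile(r"\[https?://[^\s\]]+\s*([^\]]*)\]")  # [url label] -> label
TOKEN = re.compile(r"\w+")  # Unicode word characters (letters, digits, underscore)


def clean_wikitext(text: str) -> str:
    """Reduce raw wikitext to lowercased, space-separated word tokens."""
    previous = None
    while previous != text:  # repeat so nested templates are removed inside-out
        previous = text
        text = TEMPLATE.sub(" ", text)
    text = REF.sub(" ", text)
    text = TAG.sub(" ", text)
    text = LINK.sub(r"\1", text)
    text = EXTLINK.sub(r"\1", text)
    return " ".join(TOKEN.findall(text.lower()))


def iter_articles(dump: IO[bytes], namespace: str = "0") -> Iterator[str]:
    """Stream a MediaWiki XML dump, yielding cleaned text of non-redirect pages."""
    ns_value, text, is_redirect = None, None, False
    for _event, elem in ET.iterparse(dump, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "ns":
            ns_value = elem.text
        elif tag == "redirect":
            is_redirect = True
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            if ns_value == namespace and not is_redirect and text:
                cleaned = clean_wikitext(text)
                if cleaned:
                    yield cleaned
            ns_value, text, is_redirect = None, None, False
            elem.clear()  # release the finished page to keep memory flat


def write_corpus(dump: IO[bytes], out: TextIO) -> int:
    """Write one cleaned article per line; return the number of articles written."""
    count = 0
    for article in iter_articles(dump):
        out.write(article + "\n")
        count += 1
    return count
```

Since dumps are distributed bz2-compressed, the file object from `bz2.open(path, "rb")` can be passed directly as `dump`, so the decompressed XML never has to be written to disk.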

Halfak closed this task as Resolved. Feb 12 2020, 9:56 PM

Halfak claimed this task.