|Resolved||Halfak||T243451 Deploy ORES -- Late Jan 2020|
|Resolved||Halfak||T235181 Build WikiProject directory topic models for ar, cs, and kowiki|
|Resolved||Halfak||T235183 Experiment with different vector lengths for ar, cs, en, and kowiki topic models.|
|Resolved||Halfak||T235184 Generate word vectors for ar, cs, en, and ko using FastText|
|Open||None||T242013 Implement native NN model in revscoring|
|Resolved||Halfak||T235187 Create labeled data for topic models in ar, cs, kowiki|
|Resolved||Isaac||T236713 Improve drafttopic training data pipeline|
|Resolved||Isaac||T240273 Extract cross-wiki WikiProject tags|
|Resolved||Halfak||T240286 Re-train English Wikipedia topic model using new WikiProject Taxonomy|
|Resolved||Halfak||T240276 Restructure WikiProject directory to be better|
|Resolved||kevinbazira||T240282 Improve WikiProject template --> WikiProject mapping|
I found https://fasttext.cc/docs/en/unsupervised-tutorial.html, which looks like a great tutorial for generating word vectors. I think we should start there: train length-50 vectors first and compare them to length-100 vectors.
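A rough sketch of what that comparison run could look like with the `fasttext` Python bindings, training one skipgram model per vector length. The corpus path, output directory, and the `model_path` naming scheme are placeholders I made up for illustration, and all other hyperparameters are left at the library defaults.

```python
from pathlib import Path

# Vector lengths to compare, as proposed above.
DIMS = (50, 100)


def model_path(out_dir, lang, dim):
    """Hypothetical naming scheme, e.g. vectors/enwiki.dim50.bin."""
    return Path(out_dir) / f"{lang}wiki.dim{dim}.bin"


def train_vectors(corpus_path, out_dir, lang, dims=DIMS):
    """Train one skipgram model per vector length and save each to disk.

    corpus_path is assumed to be a plain-text file of cleaned article text.
    """
    # Imported lazily so this module loads even where fasttext is not installed.
    import fasttext

    paths = {}
    for dim in dims:
        model = fasttext.train_unsupervised(
            str(corpus_path), model="skipgram", dim=dim
        )
        path = model_path(out_dir, lang, dim)
        path.parent.mkdir(parents=True, exist_ok=True)
        model.save_model(str(path))
        paths[dim] = path
    return paths
```

Something like `train_vectors("enwiki.txt", "vectors", "en")` would then leave two saved models that can be plugged into the topic model and compared on the same labeled data.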
We should set up a job on stat1007, or maybe even a Hadoop job, to clean up the text of the XML dumps and then generate vectors from the cleaned text.
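For the cleanup step, here is a minimal single-machine sketch using only the Python standard library. It streams a bz2-compressed XML dump and strips the most common wikitext markup with regexes. The regexes are deliberately crude (tables, magic words, and unusual nesting are not handled), the function names and paths are hypothetical, and a real job would probably want a proper wikitext parser plus lowercasing and tokenization.

```python
import bz2
import re
import xml.etree.ElementTree as ET

REF = re.compile(r"<ref[^>]*?/>|<ref[^>]*>.*?</ref>", re.DOTALL)
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")               # innermost {{...}} only
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")    # [[target|label]] -> label
TAG = re.compile(r"<[^>]+>")
QUOTES = re.compile(r"'{2,}")                           # ''italic'' / '''bold'''
SPACE = re.compile(r"\s+")


def clean_wikitext(text):
    """Strip common wikitext markup and return a single line of plain text."""
    text = REF.sub(" ", text)
    # Repeat until stable so nested templates are removed from the inside out.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub(" ", text)
    text = LINK.sub(r"\1", text)
    text = TAG.sub(" ", text)
    text = QUOTES.sub("", text)
    return SPACE.sub(" ", text).strip()


def iter_page_text(xml_file):
    """Yield the raw wikitext of each revision found in a dump file object."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "text" and elem.text:
            yield elem.text
        elif tag == "page":
            elem.clear()  # free the page's subtree once it has been processed


def write_corpus(dump_path, out_path):
    """Write one cleaned line of text per page, ready for vector training."""
    with bz2.open(dump_path, "rb") as dump, \
            open(out_path, "w", encoding="utf-8") as out:
        for raw in iter_page_text(dump):
            line = clean_wikitext(raw)
            if line:
                out.write(line + "\n")
```

Because each page is processed independently, the same `clean_wikitext` function could be reused as the map step if this moves to Hadoop.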