Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Halfak | T243451 Deploy ORES -- Late Jan 2020
Resolved | | Halfak | T235181 Build WikiProject directory topic models for ar, cs, and kowiki
Resolved | | Halfak | T235183 Experiment with different vector lengths for ar, cs, en, and kowiki topic models.
Resolved | | Halfak | T235184 Generate word vectors for ar, cs, en, and ko using FastText
Resolved | | Isaac | T242013 Implement native NN model in revscoring
Resolved | | Halfak | T235187 Create labeled data for topic models in ar, cs, kowiki
Resolved | | Isaac | T236713 Improve drafttopic training data pipeline
Resolved | | Isaac | T240273 Extract cross-wiki WikiProject tags
Resolved | | Halfak | T240286 Re-train English Wikipedia topic model using new WikiProject Taxonomy
Resolved | | Halfak | T240276 Restructure WikiProject directory to be better
Resolved | | kevinbazira | T240282 Improve WikiProject template --> WikiProject mapping
Event Timeline
I found https://fasttext.cc/docs/en/unsupervised-tutorial.html, which looks like a good tutorial for generating word vectors. I think we should start there, training 50-dimensional vectors and comparing them to 100-dimensional vectors.
We should set up a job on stat1007, or maybe even a Hadoop job, to clean up the text of the XML dumps and then generate vectors from it.
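A minimal sketch of what such a job might look like, assuming a single-machine Python script rather than Hadoop. The helper names (`clean_wikitext`, `iter_clean_pages`, `fasttext_command`) are hypothetical, the regex cleanup is deliberately rough and not a real wikitext parser, and the `fasttext` binary is assumed to be on the PATH. The `skipgram`, `-input`, `-output`, and `-dim` arguments are the ones used in the linked tutorial.

```python
import re
import xml.etree.ElementTree as ET

TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")               # innermost {{...}} template
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")   # [[target|label]] or [[target]]
TAG = re.compile(r"<[^>]+>")                           # leftover tags such as <ref>


def clean_wikitext(text):
    """Very rough wikitext-to-plain-text cleanup (hypothetical helper)."""
    # Strip templates from the inside out so nested ones disappear too.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub(" ", text)
    text = LINK.sub(r"\1", text)       # keep only the visible link label
    text = TAG.sub(" ", text)
    text = re.sub(r"'{2,}", "", text)  # bold/italic quote markup
    return " ".join(text.split())      # collapse whitespace and newlines


def iter_clean_pages(xml_stream):
    """Yield cleaned text for each <text> element of a MediaWiki XML dump."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Dump elements are namespaced; compare the local tag name only.
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            cleaned = clean_wikitext(elem.text)
            if cleaned:
                yield cleaned
        elem.clear()  # release element contents; dumps are too large to keep in memory


def fasttext_command(corpus_path, output_prefix, dim):
    """Build the skipgram training command for one vector length."""
    # Assumes a `fasttext` executable on the PATH.
    return ["fasttext", "skipgram",
            "-input", corpus_path,
            "-output", output_prefix,
            "-dim", str(dim)]
```

Writing each string from `iter_clean_pages` to its own line of a corpus file, and then passing `fasttext_command(corpus, prefix, 50)` and `fasttext_command(corpus, prefix, 100)` to `subprocess.run`, would give the two vector sets to compare.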