
Take advantage of word2vec signal in all models
Open, Medium, Public

Description

Some ideas for how to incorporate word2vec in our models, now that it's available in production:

  • Delta between mean embedding of previous revision vs current for editquality
  • Maybe the mean embedding of the words added
  • Mean embedding for draftquality
  • Mean for articlequality
  • Could look at statement labels included in the edit comment for itemquality
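
A minimal sketch of the mean-embedding and delta ideas above, using plain Python lists so it runs without any dependencies. The function names (`mean_embedding`, `embedding_delta`) and the toy vocabulary are made up for illustration and are not part of revscoring or any existing model; real vectors would come from the production word2vec data.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_embedding(words: Sequence[str], vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary words; return a zero vector if none match."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        return [0.0] * dim
    # zip(*found) groups the i-th component of every vector together
    return [sum(component) / len(found) for component in zip(*found)]


def embedding_delta(parent_words: Sequence[str], current_words: Sequence[str],
                    vectors: Dict[str, Vector], dim: int) -> Vector:
    """Mean embedding of the current revision minus that of the previous revision."""
    parent = mean_embedding(parent_words, vectors, dim)
    current = mean_embedding(current_words, vectors, dim)
    return [c - p for c, p in zip(current, parent)]


# Toy 2-dimensional vocabulary (hypothetical values, for illustration only)
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "fish": [1.0, 1.0]}

# editquality-style feature: how the mean embedding moved between revisions
delta = embedding_delta(["cat", "dog"], ["cat", "dog", "fish"], toy_vectors, dim=2)

# "mean of words added" variant: embed only the words the edit introduced
added_mean = mean_embedding(["fish"], toy_vectors, dim=2)
```

The same `mean_embedding` helper would cover the draftquality and articlequality ideas, applied to the full text of the draft or article instead of a pair of revisions.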

Make sure that the embedding data is only loaded into memory once.
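
One possible way to get load-once behaviour is to memoize the loader, so every feature list that asks for the same file gets the same in-memory object. This is only a sketch: `_read_vectors` is a hypothetical stand-in for the real, expensive load, the file name is invented, and the cache only deduplicates within a single Python process.

```python
from functools import lru_cache
from typing import Dict, List

LOAD_COUNT = 0  # counts real loads, only to demonstrate the caching


def _read_vectors(path: str) -> Dict[str, List[float]]:
    """Hypothetical stand-in for the expensive step of reading embedding data."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}


@lru_cache(maxsize=None)
def get_vectors(path: str) -> Dict[str, List[float]]:
    """Return the vectors for `path`, loading them at most once per process."""
    return _read_vectors(path)


# Two models asking for the same (invented) file share a single object
editquality_vectors = get_vectors("word2vec.kv")
draftquality_vectors = get_vectors("word2vec.kv")
```

Because both names point at the same dictionary, callers should treat it as read-only.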

Event Timeline

vectorizers.word2vec.load_kv is currently initialized in the drafttopic feature list. I think we would end up with multiple copies of the data in memory if we use this approach. Maybe we need to push this down into revscoring?

@awight: Could you please associate a project tag? Thanks!

Vvjjkkii renamed this task from Take advantage of word2vec signal in all models to w6aaaaaaaa. Jul 1 2018, 1:04 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from w6aaaaaaaa to Take advantage of word2vec signal in all models. Jul 2 2018, 2:05 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Harej triaged this task as Medium priority. Apr 3 2019, 5:11 AM