Take advantage of word2vec signal in all models
Open, Normal, Public

Description

Some ideas for how to incorporate word2vec in our models, now that it's available in production:

  • Delta between the mean embedding of the previous revision and that of the current revision, for editquality
  • Maybe the mean embedding of the words added
  • Mean embedding for draftquality
  • Mean embedding for articlequality
  • Could look at statement labels included in the edit comment, for itemquality
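
To make the first two ideas concrete, here is a minimal sketch of a mean embedding, the delta between two revisions, and the mean over added words. It is not the revscoring implementation: the toy vectors, `TOY_EMBEDDINGS`, `mean_embedding`, `embedding_delta`, and `added_tokens` are hypothetical names chosen for illustration, and plain lists stand in for the real word2vec vectors to keep the sketch dependency-free.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]

# Toy 3-dimensional embeddings: a hypothetical stand-in for real word2vec vectors.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "vandal": [0.0, 0.0, 1.0],
}
DIM = 3


def mean_embedding(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def embedding_delta(prev_tokens: Sequence[str], curr_tokens: Sequence[str],
                    embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Element-wise difference: mean(current revision) minus mean(previous revision)."""
    prev_mean = mean_embedding(prev_tokens, embeddings, dim)
    curr_mean = mean_embedding(curr_tokens, embeddings, dim)
    return [c - p for c, p in zip(curr_mean, prev_mean)]


def added_tokens(prev_tokens: Sequence[str], curr_tokens: Sequence[str]) -> List[str]:
    """Tokens that occur more often in the current revision (multiset difference)."""
    return list((Counter(curr_tokens) - Counter(prev_tokens)).elements())


previous = ["cat", "dog"]
current = ["cat", "dog", "vandal", "vandal"]

delta = embedding_delta(previous, current, TOY_EMBEDDINGS, DIM)
added_mean = mean_embedding(added_tokens(previous, current), TOY_EMBEDDINGS, DIM)
```

Out-of-vocabulary tokens are skipped here, and an empty revision maps to the zero vector; both are assumptions of the sketch, and a real feature might handle those cases differently.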

Make sure that the embedding data is only loaded into memory once.
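
One possible way to get the load-once behaviour is to route every model's feature list through a single cached accessor. The sketch below assumes a hypothetical `get_embeddings` function and a placeholder path `word2vec.kv`; it does not call the real loader, and `LOAD_COUNT` exists only to show how often the body runs.

```python
from functools import lru_cache
from typing import Dict, List

LOAD_COUNT = 0  # instrumentation for the sketch only


@lru_cache(maxsize=None)
def get_embeddings(path: str) -> Dict[str, List[float]]:
    """Hypothetical loader: the body runs once per distinct path, per process."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Placeholder for the expensive read of the vectors file at `path`.
    return {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}


# Two feature lists asking for the same file receive the same object.
editquality_vectors = get_embeddings("word2vec.kv")
draftquality_vectors = get_embeddings("word2vec.kv")
```

Note that this cache is per process. If the models run in several worker processes, each one would still hold its own copy unless the data is loaded before forking or shared some other way.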

Event Timeline

awight created this task. Jun 12 2018, 3:02 PM
Restricted Application added a subscriber: Aklapper. Jun 12 2018, 3:02 PM

vectorizers.word2vec.load_kv is currently initialized in the drafttopic feature list. I think we would end up with multiple copies of the data in memory if we reuse this approach in the other models. Maybe we need to push the loading down into revscoring?

@awight: Could you please associate a project tag? Thanks!

awight updated the task description.
Vvjjkkii renamed this task from Take advantage of word2vec signal in all models to w6aaaaaaaa. Jul 1 2018, 1:04 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from w6aaaaaaaa to Take advantage of word2vec signal in all models. Jul 2 2018, 2:05 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
awight removed a subscriber: awight. Mar 21 2019, 4:03 PM
Harej triaged this task as Normal priority. Apr 3 2019, 5:11 AM