Maniphest T197007

Take advantage of word2vec signal in all models
Open, Medium, Public

Assigned To

None

Authored By

	awight
	Jun 12 2018, 3:02 PM

Description

Some ideas for incorporating word2vec into our models, now that it's available in production:

editquality: delta between the mean embedding of the previous revision and the mean embedding of the current revision
editquality: maybe also the mean embedding of the words added
draftquality: mean embedding
articlequality: mean embedding
itemquality: could look at the statement labels included in the edit comment
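
As a rough illustration of the mean-embedding and delta ideas above, here is a minimal sketch in plain Python. The names mean_embedding and toy_vectors, the 2-dimensional vectors, and the simple word lists are all hypothetical and only for illustration; a real feature would read vectors from the production word2vec data and use revscoring's own tokenization.

```python
def mean_embedding(words, vectors, dim):
    """Average the vectors of in-vocabulary words; all zeros if none match."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        return [0.0] * dim
    # zip(*found) groups the values of each dimension together.
    return [sum(component) / len(found) for component in zip(*found)]


# Hypothetical toy vocabulary standing in for the real embedding data.
toy_vectors = {
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
    "edit": [1.0, 1.0],
}

previous_words = ["good", "edit"]          # tokens of the previous revision
current_words = ["good", "edit", "bad"]    # tokens of the current revision

previous_mean = mean_embedding(previous_words, toy_vectors, dim=2)
current_mean = mean_embedding(current_words, toy_vectors, dim=2)

# Delta between the two revisions' mean embeddings.
delta = [c - p for c, p in zip(current_mean, previous_mean)]

# Mean embedding of only the words added by the edit (naive set difference).
added_words = [w for w in current_words if w not in previous_words]
added_mean = mean_embedding(added_words, toy_vectors, dim=2)
```

Out-of-vocabulary words are skipped here, and an empty match falls back to a zero vector; both are assumptions of this sketch rather than decisions made in this task.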

Make sure that the embedding data is only loaded into memory once.
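
One possible way to get load-once behaviour within a process is to memoize the loader, sketched below with functools.lru_cache from the standard library. The function load_embeddings, the path "word2vec.kv", and the placeholder return value are hypothetical; this is not how vectorizers.word2vec.load_kv works today, only an illustration of the idea.

```python
from functools import lru_cache

load_count = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=None)
def load_embeddings(path):
    """Hypothetical loader: the body runs once per distinct path."""
    global load_count
    load_count += 1
    # Placeholder for the expensive step of reading the vectors from disk.
    return {"example": (0.1, 0.2)}


# Two models asking for the same file share a single in-memory object.
editquality_vectors = load_embeddings("word2vec.kv")
draftquality_vectors = load_embeddings("word2vec.kv")
```

The cache is per process, so this alone would not share the data across separate worker processes; that would need a different mechanism.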

Event Timeline

awight created this task. Jun 12 2018, 3:02 PM

Restricted Application added a subscriber: Aklapper. Jun 12 2018, 3:02 PM

vectorizers.word2vec.load_kv is currently initialized in the drafttopic feature list. I think we would end up loading multiple copies of the data if we use this approach. Maybe we need to push this down to revscoring?

@awight: Could you please associate a project tag? Thanks!

Framawiki subscribed. Jun 12 2018, 7:09 PM

Aklapper added projects: artificial-intelligence, revscoring, Machine-Learning-Team. Jun 13 2018, 8:36 AM

@Aklapper thanks!

awight edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team. Jun 13 2018, 11:29 PM

awight updated the task description.

Halfak edited projects, added Machine-Learning-Team; removed Machine-Learning-Team (Active Tasks). Jun 18 2018, 1:48 PM

awight moved this task from Unsorted to New development on the Machine-Learning-Team board. Jun 20 2018, 2:48 PM

awight moved this task from New development to Research & analysis on the Machine-Learning-Team board.

• Vvjjkkii renamed this task from Take advantage of word2vec signal in all models to w6aaaaaaaa. Jul 1 2018, 1:04 AM

• Vvjjkkii triaged this task as High priority.

• Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

• Vvjjkkii updated the task description.

• Vvjjkkii removed a subscriber: Aklapper.

CommunityTechBot renamed this task from w6aaaaaaaa to Take advantage of word2vec signal in all models. Jul 2 2018, 2:05 PM

CommunityTechBot raised the priority of this task from High to Needs Triage.

CommunityTechBot updated the task description.

CommunityTechBot removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

CommunityTechBot added a subscriber: Aklapper.

awight unsubscribed. Mar 21 2019, 4:03 PM

Halfak edited projects, added Machine-Learning-Team (Research); removed Machine-Learning-Team. Apr 2 2019, 9:33 PM

Restricted Application edited projects, added Machine-Learning-Team; removed Machine-Learning-Team (Research). Apr 2 2019, 9:33 PM

Harej moved this task from Research & analysis to New development on the Machine-Learning-Team board. Apr 3 2019, 1:55 AM

Harej triaged this task as Medium priority. Apr 3 2019, 5:11 AM

He7d3r subscribed. Aug 18 2020, 4:38 PM

• ACraze moved this task from New development to Backlog/ORES on the Machine-Learning-Team board. Jan 19 2021, 10:40 PM
