Implement word2vec featurevector in revscoring
Closed, Resolved (Public)

Description

The API should look something like this:

from revscoring.features import wikitext
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators

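# Load the pre-trained keyed vectors once; they are shared by the datasource below.
google_news_kvs = vectorizers.word2vec.load_kv("google_news.bin")

# One vector per word in the revision text.
revision_text_vectors = vectorizers.word2vec(
  wikitext.revision.datasources.words,
  google_news_kvs,
  name="revision.text.google_news_vectors")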

# Collapse the per-word vectors into a single mean vector for the revision.
mean_revision_text_vector = aggregators.mean(
  revision_text_vectors,
  # We'll need to add `vector`. I think all of our aggregators could work on
  # vectors. Maybe they could auto-detect that a vector is involved.
  vector=True,
  name="revision.text.google_news_vector")
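
To make the proposed `vector=True` behaviour concrete, here is a minimal pure-Python sketch of what a vector-aware mean might compute. It does not use revscoring: `TOY_KVS`, `vectorize_words`, and `mean_vector` are hypothetical names for illustration only, the embeddings are made up, and both skipping out-of-vocabulary words and returning a zero vector for empty input are assumptions rather than decided behaviour.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional embeddings standing in for a real word2vec model (made up).
TOY_KVS: Dict[str, List[float]] = {
    "cat": [1.0, 0.0, 2.0],
    "dog": [3.0, 2.0, 0.0],
}


def vectorize_words(words: Sequence[str],
                    kvs: Dict[str, List[float]]) -> List[List[float]]:
    """Look up each word's vector, skipping out-of-vocabulary words (assumption)."""
    return [kvs[word] for word in words if word in kvs]


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean of equal-length vectors.

    Returns a zero vector of length `dim` when there are no vectors, so the
    feature always has a fixed width (assumption).
    """
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


vectors = vectorize_words(["cat", "dog", "zebra"], TOY_KVS)
print(mean_vector(vectors, dim=3))  # [2.0, 1.0, 1.0]
```

The result has a fixed length regardless of how many words the revision contains, which is what lets it be used as a feature vector.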