The word2vec data is a 1.5GB file that we'll need to deploy to all ORES compute nodes. The timeline for our use of files this size is:
* We want the word2vec data ASAP, since it's blocking a production model.
* We might never need to update that file, but we'll probably need it installed for years to come.
* I expect that we'll have our own "embeddings" files of similar size within a year.
Options for deployment:
# Deploy as a .deb.
** A patch is prepared as https://phabricator.wikimedia.org/source/word2vec/
** @akosiaris has recommended against this in T187217#4005891, because it would slow down or break machine provisioning.
# Deploy in the ORES git repo
** Absolutely not. This would make our repo unusable.
# Deploy as its own git-lfs repo
** Scap might be ready to handle git-lfs.
** @awight is not looking forward to being the rocket dog (the first adopter) for git-lfs in scap. We don't want to block deployment waiting on this kind of infrastructure work if it's not mature.