I'm investigating memory usage. I'm working from a Python terminal on my dev laptop; essentially, I'm tracking VSZ and RSS while running commands.
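For reference, here's a minimal sketch of how this kind of tracking can be done from inside the interpreter. It assumes the third-party `psutil` package and a hypothetical `report` helper; the actual session may have used `ps` or `/proc` directly.

```python
import os

import psutil  # third-party; an assumption here, the session may have used ps instead

proc = psutil.Process(os.getpid())

def report(label):
    """Print the current process's VSZ and RSS in KB."""
    mem = proc.memory_info()  # .vms and .rss are reported in bytes
    print(f"{label}: VSZ={mem.vms // 1024} RSS={mem.rss // 1024}")

report("before loading anything")
```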
Before loading anything:
- VSZ: 35600
- RSS: 9340
After `from revscoring import Model`:
- VSZ: 495752
- RSS: 76216
After `enwiki = Model.load(open("models/enwiki.articletopic.gradient_boosting.model"))`:
- VSZ: 1010852
- RSS: 567348
After `arwiki = Model.load(open("models/arwiki.articletopic.gradient_boosting.model"))`:
- VSZ: 1385732
- RSS: 941856
After `enwiki2 = Model.load(open("models/enwiki.articletopic.gradient_boosting.model"))`:
- VSZ: 1464596
- RSS: 1020768
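To make the increments easier to read, here's a quick back-of-the-envelope pass over the numbers above (assuming the figures are in KB):

```python
# Per-step RSS deltas (KB) taken directly from the measurements above
steps = {
    "import revscoring.Model": 76216 - 9340,
    "load enwiki": 567348 - 76216,
    "load arwiki": 941856 - 567348,
    "load enwiki again": 1020768 - 941856,
}
for label, delta_kb in steps.items():
    print(f"{label}: +{delta_kb / 1024:.0f} MB")
# import revscoring.Model: +65 MB
# load enwiki: +480 MB
# load arwiki: +366 MB
# load enwiki again: +77 MB
```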
This is higher memory usage than I think we are really prepared for; after loading all of the models, we end up using about 3x as much memory as we needed before. As the final load shows, the memory does get shared relatively straightforwardly (re-loading enwiki adds only ~77 MB rather than another ~480 MB), but it is still too much.
I wonder if we can use gensim's memory-map mode to get around this. Alternatively, we could reduce the dimensionality of our vectors or shrink the vocabulary.
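A minimal sketch of what the memory-map idea would look like, assuming the word2vec vectors backing these models are (or could be) saved in gensim's native `KeyedVectors` format. The file paths are hypothetical:

```python
from gensim.models import KeyedVectors

# One-time conversion: saving in gensim's native format puts the big numpy
# arrays in separate .npy files that can be memory-mapped later.
# vectors = KeyedVectors.load_word2vec_format("word2vec/enwiki.bin", binary=True)
# vectors.save("word2vec/enwiki.kv")

# mmap="r" maps the vector matrix from disk read-only instead of copying it
# into the process heap, so multiple processes share one set of physical pages
# and pages are only faulted in as they're touched.
vectors = KeyedVectors.load("word2vec/enwiki.kv", mmap="r")
```

If that works, the per-model RSS cost should drop to roughly the non-embedding parts of each model, and separate uwsgi/celery workers would share the mapped vector files for free.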