Event Timeline
Change 566595 had a related patch set uploaded (by Halfak; owner: Halfak):
[mediawiki/services/ores/deploy@master] Adds topic models for ar, cs, ko, and vi.
Change 566595 merged by Accraze:
[mediawiki/services/ores/deploy@master] Adds topic models for ar, cs, ko, and vi.
The deploy to beta failed. It looks like memory usage is far too high. I'm investigating.
I'm investigating memory usage. I'm working from a Python terminal on my dev laptop. Essentially, I'm tracking VSZ (virtual memory size) and RSS (resident set size) of the process while running commands.
Before loading anything:
- VSZ: 35600
- RSS: 9340
After `from revscoring import Model`:
- VSZ: 495752
- RSS: 76216
After `enwiki = Model.load(open("models/enwiki.articletopic.gradient_boosting.model"))`:
- VSZ: 1010852
- RSS: 567348
After `arwiki = Model.load(open("models/arwiki.articletopic.gradient_boosting.model"))`:
- VSZ: 1385732
- RSS: 941856
After `enwiki2 = Model.load(open("models/enwiki.articletopic.gradient_boosting.model"))`:
- VSZ: 1464596
- RSS: 1020768
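For anyone who wants to repeat these measurements, here is a minimal sketch of one way to read VSZ and RSS from inside the Python session. It is an assumption, not necessarily how the numbers above were collected: it is Linux-only, reads `/proc/<pid>/status`, and `memory_kb` is a hypothetical helper name.

```python
def memory_kb(pid="self"):
    """Return (VSZ, RSS) in kB for a process, read from /proc (Linux only)."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                # Lines look like "VmRSS:     76216 kB"; keep the number only.
                fields[key] = int(value.split()[0])
    return fields["VmSize"], fields["VmRSS"]


# Call this before and after each import or model load and compare.
vsz, rss = memory_kb()
print(f"- VSZ: {vsz}")
print(f"- RSS: {rss}")
```

Calling `memory_kb()` after each statement in the terminal gives one VSZ/RSS pair per step, in the same shape as the lists above.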
This is higher memory usage than I think we are really prepared for. After loading all of the models, it ends up being about 3x as much memory as we needed before. As we can see from the final load, that memory gets shared relatively straightforwardly: loading enwiki a second time added only about 79,000 to RSS, versus about 491,000 for the first load. But it is still too much.
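To make the per-step cost explicit, the RSS increase at each step can be computed directly from the figures reported above. This is only arithmetic on those numbers; the step labels and the `rss_deltas` helper are made up for illustration, and the unit is assumed to be KiB (as printed by tools like `ps`), which the comment does not state.

```python
# (step, VSZ, RSS) copied from the measurements above.
# Units are assumed to be KiB; the original comment does not say.
measurements = [
    ("baseline", 35600, 9340),
    ("import revscoring", 495752, 76216),
    ("load enwiki", 1010852, 567348),
    ("load arwiki", 1385732, 941856),
    ("load enwiki again", 1464596, 1020768),
]


def rss_deltas(rows):
    """Return (step, RSS increase over the previous step) pairs."""
    return [
        (step, rss - prev_rss)
        for (_, _, prev_rss), (step, _, rss) in zip(rows, rows[1:])
    ]


for step, delta in rss_deltas(measurements):
    print(f"{step:>18}: +{delta}")
```

The second enwiki load costs roughly a sixth of the first one, which is the sharing effect mentioned above, while each distinct wiki still adds several hundred thousand on its own.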
I wonder if we can use gensim's memory-map mode to get around this. Alternatively, we can reduce the dimensions of our vectors or reduce the size of the vocabulary.
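As a rough illustration of why memory-mapping could help, here is a minimal, standard-library-only sketch of the idea. It is not gensim's implementation or API. In gensim itself the relevant option is, as far as I know, the `mmap` argument of its load methods, e.g. `KeyedVectors.load(path, mmap='r')`. Everything below (`MappedVectors`, the file layout, the toy vocabulary, the 50-dimension width) is hypothetical.

```python
import mmap
import os
import struct
import tempfile

DIMS = 50                           # hypothetical vector width
VOCAB = ["alpha", "beta", "gamma"]  # hypothetical toy vocabulary
ROW = struct.Struct(f"<{DIMS}f")    # one float32 vector per row

# Write a toy vector file: row i is filled with the value float(i).
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
with open(path, "wb") as f:
    for i, _word in enumerate(VOCAB):
        f.write(ROW.pack(*([float(i)] * DIMS)))


class MappedVectors:
    """Look up vectors from a read-only memory map instead of a heap copy."""

    def __init__(self, path, vocab):
        self._file = open(path, "rb")
        # Pages are loaded lazily by the OS; processes that map the same
        # file read-only can share them through the page cache.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._index = {word: i for i, word in enumerate(vocab)}

    def __getitem__(self, word):
        offset = self._index[word] * ROW.size
        return ROW.unpack_from(self._map, offset)

    def close(self):
        self._map.close()
        self._file.close()


vectors = MappedVectors(path, VOCAB)
print(vectors["beta"][:3])
vectors.close()
```

With this layout only the rows that are actually looked up need to be resident, and workers mapping the same file do not each hold a private copy. Whether that is enough here would need to be measured against the alternatives above (fewer dimensions or a smaller vocabulary).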
Change 567120 had a related patch set uploaded (by Halfak; owner: Halfak):
[research/ores/wheels@master] Updates for revscoring 2.6.5
Change 567120 merged by Accraze:
[research/ores/wheels@master] Updates for revscoring 2.6.5
Change 567143 had a related patch set uploaded (by Halfak; owner: Halfak):
[mediawiki/services/ores/deploy@master] New draft topic models with 50d vectors.
Change 567143 merged by Halfak:
[mediawiki/services/ores/deploy@master] New draft topic models with 50d vectors.
Mentioned in SAL (#wikimedia-operations) [2020-02-03T21:01:24Z] <halfak@deploy1001> Started deploy [ores/deploy@50a101a]: T243451
Mentioned in SAL (#wikimedia-operations) [2020-02-03T21:14:09Z] <halfak@deploy1001> Finished deploy [ores/deploy@50a101a]: T243451 (duration: 12m 47s)