Deploy ORES -- Late Jan 2020
Closed, Resolved (Public)

Event Timeline

Change 566595 had a related patch set uploaded (by Halfak; owner: Halfak):
[mediawiki/services/ores/deploy@master] Adds topic models for ar, cs, ko, and vi.

https://gerrit.wikimedia.org/r/566595

Change 566595 merged by Accraze:
[mediawiki/services/ores/deploy@master] Adds topic models for ar, cs, ko, and vi.

https://gerrit.wikimedia.org/r/566595

The deploy to beta failed. Memory usage looks far too high; I'm investigating.

I'm investigating memory usage from a Python terminal on my dev laptop, tracking the process's VSZ (virtual memory size) and RSS (resident set size) after each command.

Before loading anything:

  • VSZ: 35600
  • RSS: 9340

After from revscoring import Model:

  • VSZ: 495752
  • RSS: 76216

After enwiki = Model.load(open("models/enwiki.articletopic.gradient_boosting.model")):

  • VSZ: 1010852
  • RSS: 567348

After arwiki = Model.load(open("models/arwiki.articletopic.gradient_boosting.model")):

  • VSZ: 1385732
  • RSS: 941856

After enwiki2 = Model.load(open("models/enwiki.articletopic.gradient_boosting.model")):

  • VSZ: 1464596
  • RSS: 1020768
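
A minimal sketch of how readings like the ones above could be taken from inside the Python session itself. It assumes a Linux host where /proc/self/status is available; the helper names parse_status and memory_usage are hypothetical, and the numbers above may well have come from a different tool such as ps.

```python
def parse_status(text):
    """Extract VmSize (VSZ) and VmRSS (RSS), in kB, from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  495752 kB"; keep only the integer part.
            fields[key] = int(value.split()[0])
    return {"VSZ": fields.get("VmSize"), "RSS": fields.get("VmRSS")}


def memory_usage(pid="self"):
    """Read the current VSZ and RSS for a process (Linux only, an assumption)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


# Intended use in the terminal session (not run here):
#   before = memory_usage()
#   enwiki = Model.load(open("models/enwiki.articletopic.gradient_boosting.model"))
#   after = memory_usage()
#   print(after["RSS"] - before["RSS"])
```

Calling memory_usage() before and after each Model.load gives the per-step growth directly, without switching to a second terminal.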

This is higher memory usage than I think we are really prepared for. After loading all of the models, it ends up being about 3x as much memory as we needed before. The final load shows that memory does get shared fairly readily: loading the enwiki model a second time added only 78,912 to RSS, compared with 491,132 for the first load and 374,508 for arwiki. Even so, the total is still too much.

I wonder if we can use gensim's memory-map mode to get around this. Alternatively, we can reduce the dimensions of our vectors or reduce the size of the vocabulary.
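
As a rough illustration of why the last two options help, the size of a dense word-vector matrix scales with vocabulary size times dimensions. The sketch below assumes 4-byte float32 values; the vocabulary sizes and dimensions are hypothetical and are not taken from the ORES models.

```python
def embedding_bytes(vocab_size, dimensions, bytes_per_value=4):
    """Approximate size of a dense vocab_size x dimensions matrix (float32 by default)."""
    return vocab_size * dimensions * bytes_per_value


# Hypothetical numbers for illustration only; not the real model sizes.
baseline = embedding_bytes(vocab_size=100_000, dimensions=100)
fewer_dims = embedding_bytes(vocab_size=100_000, dimensions=50)
smaller_vocab = embedding_bytes(vocab_size=50_000, dimensions=100)

for label, size in [
    ("baseline", baseline),
    ("half the dimensions", fewer_dims),
    ("half the vocabulary", smaller_vocab),
]:
    print(f"{label}: {size / 1_000_000:.0f} MB")
```

Halving either factor halves the matrix. Memory-mapping would not shrink the data but could let several processes share one on-disk copy. gensim's load methods accept an mmap='r' argument for this, though whether revscoring's model loading can pass it through is not established here.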

Change 567120 had a related patch set uploaded (by Halfak; owner: Halfak):
[research/ores/wheels@master] Updates for revscoring 2.6.5

https://gerrit.wikimedia.org/r/567120

Change 567120 merged by Accraze:
[research/ores/wheels@master] Updates for revscoring 2.6.5

https://gerrit.wikimedia.org/r/567120

Change 567143 had a related patch set uploaded (by Halfak; owner: Halfak):
[mediawiki/services/ores/deploy@master] New draft topic models with 50d vectors.

https://gerrit.wikimedia.org/r/567143

Change 567143 merged by Halfak:
[mediawiki/services/ores/deploy@master] New draft topic models with 50d vectors.

https://gerrit.wikimedia.org/r/567143

Mentioned in SAL (#wikimedia-operations) [2020-02-03T21:01:24Z] <halfak@deploy1001> Started deploy [ores/deploy@50a101a]: T243451

Mentioned in SAL (#wikimedia-operations) [2020-02-03T21:14:09Z] <halfak@deploy1001> Finished deploy [ores/deploy@50a101a]: T243451 (duration: 12m 47s)