
Investigate word2vec memory issues with multiprocessing
Closed, Resolved (Public)

Description

The word2vec binary, when loaded in revscoring and used for feature extraction, blows up memory during multiprocessing.

Event Timeline

Test code for benchmarking vectorizers with a global keyed_vector in the vectorizers file (https://gist.github.com/codez266/bde0d2384ef1cda0e105b8f59d25524a#file-vectors_only_once-py-L21):

from multiprocessing import Pool, cpu_count
import functools
import logging
from revscoring.dependencies import solve
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators
from revscoring.languages import english
#from revscoring.languages.english_vectors import google_news_kvs
from revscoring.datasources import revision_oriented

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s:%(name)s -- %(message)s'
)

google_news_kvs = vectorizers.word2vec.load_kv(
    path="~/ai/GoogleNews-vectors-negative300.bin", limit=150000)
revision_text_vectors = vectorizers.word2vec(
  english.stopwords.revision.datasources.non_stopwords,
  google_news_kvs,
  name="revision.text.google_news_vectors")

w2v = aggregators.mean(
    revision_text_vectors,
    vector=True,
    name="revision.text.google_news_vector_mean"
)
observations = [0]*93000
def extract(w2v, obs):
    # no-op: isolates the cost of shipping w2v to the workers
    return 0

extractor_pool = Pool(processes=8)
partial_ext = functools.partial(extract, w2v)
for ob in extractor_pool.imap(partial_ext, observations):
    continue

Time taken: a minute

mprof --include-children:

MEM 0.671875 1520664541.5992
MEM 13.964844 1520664541.7175
MEM 24.906250 1520664541.8284
MEM 30.460938 1520664541.9451
MEM 32.031250 1520664542.0573
MEM 41.250000 1520664542.1671
MEM 53.296875 1520664542.2793
MEM 55.300781 1520664542.3903
MEM 62.148438 1520664542.5011
MEM 68.832031 1520664542.6157
MEM 75.375000 1520664542.7270
MEM 83.457031 1520664542.8391
MEM 93.613281 1520664542.9516
MEM 97.308594 1520664543.0618
MEM 105.589844 1520664543.1797
MEM 119.140625 1520664543.2919
MEM 131.234375 1520664543.4045
MEM 144.152344 1520664543.5159
MEM 156.621094 1520664543.6289
MEM 162.933594 1520664543.7414
MEM 177.074219 1520664543.8492
MEM 188.281250 1520664543.9601
MEM 201.695312 1520664544.0676
MEM 213.851562 1520664544.1774
MEM 226.156250 1520664544.2878
MEM 238.304688 1520664544.3973
MEM 241.937500 1520664544.5094
MEM 256.722656 1520664544.6160
MEM 267.246094 1520664544.7292
MEM 275.457031 1520664544.8402
MEM 288.554688 1520664544.9512
MEM 301.195312 1520664545.0634
MEM 313.609375 1520664545.1732
MEM 318.574219 1520664545.2851
MEM 329.503906 1520664545.3958
MEM 342.898438 1520664545.5066
MEM 2953.574219 1520664545.6208
...
MEM 2962.207031 1520664553.7411
MEM 2962.207031 1520664553.8525
MEM 2962.207031 1520664553.9697
MEM 348.371094 1520664564.0210
MEM 348.382812 1520664564.1369
MEM 348.453125 1520664564.2461
MEM 347.597656 1520664564.3561
MEM 347.597656 1520664564.4680

Memory caps at ~3 GB.
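The lower cap here is consistent with fork-style copy-on-write sharing: a module-level object loaded before the Pool starts is inherited by the workers rather than copied. A minimal sketch of that pattern (the `big` list is a hypothetical stand-in for the keyed vectors, not revscoring code):

```python
from multiprocessing import get_context

big = [0.0] * 1_000_000  # ~8 MB, allocated before the pool starts

def worker(_):
    # Under the 'fork' start method (the Linux default), children read
    # the parent's `big` through copy-on-write pages; the object is
    # never pickled into the task payload.
    return len(big)

if __name__ == "__main__":
    with get_context("fork").Pool(2) as pool:
        results = pool.map(worker, range(4))
    print(results)  # [1000000, 1000000, 1000000, 1000000]
```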

Test code for benchmarking with the keyed vectors loaded in an external module, english_vectors:

from multiprocessing import Pool, cpu_count
import functools
import logging
from revscoring.dependencies import solve
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators
from revscoring.languages import english
from revscoring.languages.english_vectors import google_news_kvs
from revscoring.datasources import revision_oriented

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s:%(name)s -- %(message)s'
)

revision_text_vectors = vectorizers.word2vec(
  english.stopwords.revision.datasources.non_stopwords,
  google_news_kvs,
  name="revision.text.google_news_vectors")

w2v = aggregators.mean(
    revision_text_vectors,
    vector=True,
    name="revision.text.google_news_vector_mean"
)
observations = [0]*93000
def extract(w2v, obs):
    # no-op: isolates the cost of shipping w2v to the workers
    return 0

extractor_pool = Pool(processes=8)
partial_ext = functools.partial(extract, w2v)
for ob in extractor_pool.imap(partial_ext, observations):
    continue

english_vectors.py

from revscoring.datasources.meta import vectorizers

google_news_kvs = vectorizers.word2vec.load_kv(
    path="~/ai/GoogleNews-vectors-negative300.bin", limit=150000)

Time taken: had to kill after several minutes.

mprof --include-children:

CMDLINE python buggy.py
MEM 0.691406 1520664298.7699
MEM 20.257812 1520664298.8795
MEM 30.968750 1520664298.9889
MEM 32.023438 1520664299.0955
MEM 45.605469 1520664299.2148
MEM 53.628906 1520664299.3247
MEM 56.203125 1520664299.4357
MEM 66.789062 1520664299.5455
MEM 72.941406 1520664299.6550
MEM 81.031250 1520664299.7642
MEM 94.562500 1520664299.8761
MEM 98.843750 1520664299.9880
MEM 113.078125 1520664300.0975
MEM 127.144531 1520664300.2077
MEM 141.558594 1520664300.3175
MEM 155.289062 1520664300.4290
MEM 162.789062 1520664300.5408
MEM 176.656250 1520664300.6504
MEM 188.574219 1520664300.7610
MEM 202.207031 1520664300.8709
MEM 215.433594 1520664300.9779
MEM 227.468750 1520664301.0935
MEM 239.464844 1520664301.2127
MEM 245.457031 1520664301.3217
MEM 257.070312 1520664301.4297
MEM 268.863281 1520664301.5375
MEM 278.773438 1520664301.6472
MEM 289.964844 1520664301.7644
MEM 302.089844 1520664301.8724
MEM 313.738281 1520664301.9812
MEM 318.425781 1520664302.0925
MEM 325.191406 1520664302.2028
MEM 337.183594 1520664302.3125
MEM 672.968750 1520664302.4197
MEM 2987.976562 1520664302.5339
MEM 3107.085938 1520664302.6463
MEM 3225.644531 1520664302.7572
...
MEM 7812.152344 1520664349.4247
MEM 7813.441406 1520664349.5319
MEM 7814.472656 1520664349.6468
MEM 7644.578125 1520664349.7577
MEM 7763.824219 1520664349.8672
MEM 7510.433594 1520664349.9777
MEM 7616.394531 1520664350.0890
MEM 7427.886719 1520664350.1973
MEM 7500.371094 1520664350.3056
MEM 7743.484375 1520664350.4149
MEM 7806.648438 1520664350.5271

Memory caps at ~8 GB, more than double the previous case.
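One plausible contributor to the gap, sketched here as an assumption rather than a confirmed diagnosis: binding the vectors into the callable via functools.partial drags the whole object into every task payload that Pool.imap pickles for the workers, while a bare callable pickles by reference. pickle is used directly below just to make the payload sizes visible:

```python
import functools
import pickle

# Hypothetical stand-in for a large keyed-vectors object (~8 MB).
big = [0.0] * 1_000_000

# Binding the object into the callable carries it inside every task
# payload the pool serializes for its workers:
heavy = pickle.dumps(functools.partial(len, big))

# A bare callable pickles by reference and stays tiny:
light = pickle.dumps(len)

print(len(heavy) > 1_000_000 > len(light))  # True
```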

Wow! So strange. Nice work on the test. Let's figure out a clean way to use a global. :)

See also T182350: Profile ORES code memory use, optimize pre-fork allocation. I think you're onto something important; we can learn from this and use it to fix other places where we fail to use copy-on-write. @Sumit please link to the code changes you're making that seem to improve memory sharing.


Refer to the gist in the first comment for the code changes that make it multiprocessing friendly.

I made a demo of this problem to try to see if I could reproduce it in isolation. See https://github.com/halfak/demo_shared_memory

TL;DR: it didn't work. I get the exact same output for both strategies!


@Halfak you aren't quite replicating the setup. import_processor is still using the global demo_kv, since the individual scripts refer to the same global demo_kv.
I've added changes in https://github.com/halfak/demo_shared_memory/pull/1 that make import_processor take demo_kv as an argument, similar to our setup of passing vectors into feature objects as self.kv = vectors.
You'll see the magic ;)
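A hedged sketch of the contrast the PR exposes: if workers resolve the vectors through a module-level name instead of receiving them as a bound argument (as with self.kv = vectors), the per-task payload shrinks to just a key. VECTOR_REGISTRY and extract_by_name are illustrative names, not revscoring API:

```python
import pickle

# Illustrative module-level registry, populated once before forking.
VECTOR_REGISTRY = {"google_news": [0.0] * 1_000_000}

def extract_by_name(kv_name, obs):
    # The large object is looked up inside the worker; only the key
    # travels through the task queue.
    kv = VECTOR_REGISTRY[kv_name]
    return len(kv)

payload = pickle.dumps(("google_news", 0))  # what a task would carry
print(len(payload))  # a few dozen bytes
```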

awight mentioned this in Unknown Object (Phame Post). May 2 2018, 6:41 PM