word2vec binary when loaded in revscoring and used for feature extraction blows up memory during multiprocessing.
Description
Related Objects
- Mentioned In
- Blog Post: Status Update (May 2, 2018)
- Mentioned Here
- T182350: Profile ORES code memory use, optimize pre-fork allocation
Event Timeline
Test code for benchmarking vectorizers with a global keyed_vector in the vectorizers file (https://gist.github.com/codez266/bde0d2384ef1cda0e105b8f59d25524a#file-vectors_only_once-py-L21):
import functools
import logging
from multiprocessing import Pool, cpu_count

from revscoring.dependencies import solve
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators
from revscoring.languages import english
# from revscoring.languages.english_vectors import google_news_kvs
from revscoring.datasources import revision_oriented

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s:%(name)s -- %(message)s'
)

google_news_kvs = \
    vectorizers.word2vec.load_kv(
        path="~/ai/GoogleNews-vectors-negative300.bin", limit=150000)

revision_text_vectors = vectorizers.word2vec(
    english.stopwords.revision.datasources.non_stopwords,
    google_news_kvs,
    name="revision.text.google_news_vectors")

w2v = aggregators.mean(
    revision_text_vectors,
    vector=True,
    name="revision.text.google_news_vector_mean"
)

observations = [0] * 93000


def extract(w2v, obs):
    return 0


extractor_pool = Pool(processes=8)
partial_ext = functools.partial(extract, w2v)
for ob in extractor_pool.imap(partial_ext, observations):
    continue
Time taken: a minute
mprof --include-children:
MEM 0.671875 1520664541.5992
MEM 13.964844 1520664541.7175
MEM 24.906250 1520664541.8284
MEM 30.460938 1520664541.9451
MEM 32.031250 1520664542.0573
MEM 41.250000 1520664542.1671
MEM 53.296875 1520664542.2793
MEM 55.300781 1520664542.3903
MEM 62.148438 1520664542.5011
MEM 68.832031 1520664542.6157
MEM 75.375000 1520664542.7270
MEM 83.457031 1520664542.8391
MEM 93.613281 1520664542.9516
MEM 97.308594 1520664543.0618
MEM 105.589844 1520664543.1797
MEM 119.140625 1520664543.2919
MEM 131.234375 1520664543.4045
MEM 144.152344 1520664543.5159
MEM 156.621094 1520664543.6289
MEM 162.933594 1520664543.7414
MEM 177.074219 1520664543.8492
MEM 188.281250 1520664543.9601
MEM 201.695312 1520664544.0676
MEM 213.851562 1520664544.1774
MEM 226.156250 1520664544.2878
MEM 238.304688 1520664544.3973
MEM 241.937500 1520664544.5094
MEM 256.722656 1520664544.6160
MEM 267.246094 1520664544.7292
MEM 275.457031 1520664544.8402
MEM 288.554688 1520664544.9512
MEM 301.195312 1520664545.0634
MEM 313.609375 1520664545.1732
MEM 318.574219 1520664545.2851
MEM 329.503906 1520664545.3958
MEM 342.898438 1520664545.5066
MEM 2953.574219 1520664545.6208
...
MEM 2962.207031 1520664553.7411
MEM 2962.207031 1520664553.8525
MEM 2962.207031 1520664553.9697
MEM 348.371094 1520664564.0210
MEM 348.382812 1520664564.1369
MEM 348.453125 1520664564.2461
MEM 347.597656 1520664564.3561
MEM 347.597656 1520664564.4680
Caps at ~3 GB.
Test code for benchmarking with word2vec loaded in an external module, english_vectors:
import functools
import logging
from multiprocessing import Pool, cpu_count

from revscoring.dependencies import solve
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators
from revscoring.languages import english
from revscoring.languages.english_vectors import google_news_kvs
from revscoring.datasources import revision_oriented

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s:%(name)s -- %(message)s'
)

revision_text_vectors = vectorizers.word2vec(
    english.stopwords.revision.datasources.non_stopwords,
    google_news_kvs,
    name="revision.text.google_news_vectors")

w2v = aggregators.mean(
    revision_text_vectors,
    vector=True,
    name="revision.text.google_news_vector_mean"
)

observations = [0] * 93000


def extract(w2v, obs):
    return 0


extractor_pool = Pool(processes=8)
partial_ext = functools.partial(extract, w2v)
for ob in extractor_pool.imap(partial_ext, observations):
    continue
english_vectors.py
from revscoring.datasources.meta import vectorizers

google_news_kvs = \
    vectorizers.word2vec.load_kv(
        path="~/ai/GoogleNews-vectors-negative300.bin", limit=150000)
Time taken: had to kill after several minutes.
mprof --include-children:
CMDLINE python buggy.py
MEM 0.691406 1520664298.7699
MEM 20.257812 1520664298.8795
MEM 30.968750 1520664298.9889
MEM 32.023438 1520664299.0955
MEM 45.605469 1520664299.2148
MEM 53.628906 1520664299.3247
MEM 56.203125 1520664299.4357
MEM 66.789062 1520664299.5455
MEM 72.941406 1520664299.6550
MEM 81.031250 1520664299.7642
MEM 94.562500 1520664299.8761
MEM 98.843750 1520664299.9880
MEM 113.078125 1520664300.0975
MEM 127.144531 1520664300.2077
MEM 141.558594 1520664300.3175
MEM 155.289062 1520664300.4290
MEM 162.789062 1520664300.5408
MEM 176.656250 1520664300.6504
MEM 188.574219 1520664300.7610
MEM 202.207031 1520664300.8709
MEM 215.433594 1520664300.9779
MEM 227.468750 1520664301.0935
MEM 239.464844 1520664301.2127
MEM 245.457031 1520664301.3217
MEM 257.070312 1520664301.4297
MEM 268.863281 1520664301.5375
MEM 278.773438 1520664301.6472
MEM 289.964844 1520664301.7644
MEM 302.089844 1520664301.8724
MEM 313.738281 1520664301.9812
MEM 318.425781 1520664302.0925
MEM 325.191406 1520664302.2028
MEM 337.183594 1520664302.3125
MEM 672.968750 1520664302.4197
MEM 2987.976562 1520664302.5339
MEM 3107.085938 1520664302.6463
MEM 3225.644531 1520664302.7572
...
MEM 7812.152344 1520664349.4247
MEM 7813.441406 1520664349.5319
MEM 7814.472656 1520664349.6468
MEM 7644.578125 1520664349.7577
MEM 7763.824219 1520664349.8672
MEM 7510.433594 1520664349.9777
MEM 7616.394531 1520664350.0890
MEM 7427.886719 1520664350.1973
MEM 7500.371094 1520664350.3056
MEM 7743.484375 1520664350.4149
MEM 7806.648438 1520664350.5271
Caps at ~8 GB, more than double the previous case.
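The mechanism behind the difference can be shown in isolation. This is a minimal sketch (not the revscoring code; the dict is a hypothetical stand-in for the multi-GB KeyedVectors object) assuming a Unix fork start method: a module-level global is inherited by forked workers at the same address, while an object bound into the worker callable is pickled and rebuilt as a private copy in each worker.

```python
import functools
import multiprocessing as mp

# Hypothetical stand-in for the large word2vec KeyedVectors object.
BIG = {i: float(i) for i in range(100_000)}


def check_global(_):
    # Workers read the module-level global inherited through fork():
    # same virtual address as the parent, pages shared copy-on-write.
    return id(BIG)


def check_arg(obj, _):
    # Workers receive `obj` via pickling: each task rebuilds a private copy.
    return id(obj)


ctx = mp.get_context("fork")  # copy-on-write sharing needs fork, not spawn
parent_id = id(BIG)
with ctx.Pool(2) as pool:
    global_ids = pool.map(check_global, range(4))
    arg_ids = pool.map(functools.partial(check_arg, BIG), range(4))

shared = all(i == parent_id for i in global_ids)  # global: same address
copied = all(i != parent_id for i in arg_ids)     # argument: fresh copies
```

The second pattern matches the 8 GB run: every worker carries its own copy of the vectors instead of sharing the parent's pages.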
wow! So strange. Nice work on the test. Let's figure out a clean way to use a global. :)
See also T182350: Profile ORES code memory use, optimize pre-fork allocation, I think you're onto something important and we can learn from this, and use it to fix other places where we fail to use copy-on-write. @Sumit please link to the code changes you're making that seem to improve memory sharing.
Refer to the gist in the first comment for the code changes that make it multiprocessing friendly.
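For context on why passing `w2v` into the pool is expensive: `Pool.imap` pickles the task callable, and a `functools.partial` drags its bound argument into every payload. A small sketch with stand-in data (the dict below is hypothetical, not the real vectors):

```python
import functools
import pickle


def extract(kv, obs):
    # Trivial stand-in for the real feature-extraction function.
    return 0


# Hypothetical stand-in for the multi-GB keyed vectors.
big = {"w%d" % i: float(i) for i in range(1000)}

# The partial's pickled payload embeds the bound object in full...
payload = pickle.dumps(functools.partial(extract, big))
# ...while a plain module-level function pickles by reference: tiny.
bare = pickle.dumps(extract)
```

With the real ~3 GB vectors object, that per-task payload is what blows up worker memory.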
I made a demo of this problem to try to see if I could reproduce it in isolation. See https://github.com/halfak/demo_shared_memory
TL;DR: it didn't work. I get the exact same output for both strategies!
@Halfak you aren't quite replicating the setup: import_processor is still using the global demo_kv, since the individual scripts all refer to the same global demo_kv.
I've added changes in https://github.com/halfak/demo_shared_memory/pull/1 that make import_processor use demo_kv as an argument similar to our setup of passing vectors in features objects as self.kv=vectors.
You'll see the magic ;)
Final resolution: use a wrapper function - https://github.com/wiki-ai/revscoring/pull/394
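As a rough illustration of the wrapper-function idea (all names here are invented for the sketch; see the PR for the actual change): instead of holding the multi-GB vectors object on the datasource, where it gets pickled with every task, hold a cheap loader that caches the vectors at most once per process.

```python
# Module-level cache, populated lazily inside each process.
_KV_CACHE = {}
LOAD_CALLS = 0


def expensive_load(path):
    # Stand-in for loading the word2vec binary (hypothetical).
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"path": path, "dim": 300}


def load_kv_cached(path):
    # Wrapper: callers hold this cheap callable (or just the path)
    # rather than the vectors object itself, so nothing big is pickled.
    if path not in _KV_CACHE:
        _KV_CACHE[path] = expensive_load(path)
    return _KV_CACHE[path]


kv1 = load_kv_cached("GoogleNews-vectors-negative300.bin")
kv2 = load_kv_cached("GoogleNews-vectors-negative300.bin")
```

Repeated calls within a process return the same cached object, so the expensive load happens once.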