Looks like celery has shut down on all of the workers. I'm looking into it now.
I think we might be too close to the memory ceiling and an OOM is what's killing them.
That said, when I restart celery, available memory never drops below ~5GB (out of 8GB), so it doesn't look like we're *really* running out of memory. Could there be another reason we see:
MemoryError: [Errno 12] Cannot allocate memory
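For anyone who wants to watch the same number while celery starts, here is a rough sketch of how available memory could be read on a worker. It assumes a Linux host whose /proc/meminfo has a MemAvailable field (newer kernels only). The helper names are made up for illustration and are not part of ORES.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Field: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gb(path="/proc/meminfo"):
    """Return MemAvailable in GB (hypothetical helper; assumes the field exists)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / (1024 * 1024)
```

Calling `available_gb()` in a loop during a restart would show whether available memory ever gets close to zero before the MemoryError shows up.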
Something really strange is going on. I cut our celery workers in half and we're still not able to start celery at all, because we get a MemoryError during the startup process. We haven't done a deployment here in a while. What could have changed?
Looks like the OOM error might have been old. Here's what I have now:
$ sudo -u www-data ../venv/bin/python ores_celery.py
/srv/ores/venv/lib/python3.5/site-packages/smart_open/smart_open_lib.py:398: UserWarning: This function is deprecated, use smart_open.open instead. See the migration notes for details: https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst#migrating-to-the-new-open-function
  'See the migration notes for details: %s' % _MIGRATION_NOTES_URL
Hspell: can't open /usr/share/hspell/hebrew.wgz.sizes.
Hspell: can't open /usr/share/hspell/hebrew.wgz.sizes.
Traceback (most recent call last):
  File "ores_celery.py", line 6, in <module>
    application = celery.build()
  File "/srv/ores/config/ores/applications/celery.py", line 41, in build
    config, config['ores']['scoring_system'])
  File "/srv/ores/config/ores/scoring_systems/celery_queue.py", line 232, in from_config
    config, name, section_key=section_key)
  File "/srv/ores/config/ores/scoring_systems/scoring_system.py", line 308, in _kwargs_from_config
    config, section['metrics_collector'])
  File "/srv/ores/config/ores/metrics_collectors/metrics_collector.py", line 62, in from_config
    return Class.from_config(config, name)
  File "/srv/ores/config/ores/metrics_collectors/statsd.py", line 151, in from_config
    return cls.from_parameters(**kwargs)
  File "/srv/ores/config/ores/metrics_collectors/statsd.py", line 131, in from_parameters
    statsd_client = statsd.StatsClient(*args, **kwargs)
  File "/srv/ores/venv/lib/python3.5/site-packages/statsd/client.py", line 146, in __init__
    host, port, fam, socket.SOCK_DGRAM)
  File "/usr/lib/python3.5/socket.py", line 733, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Looks like it is failing because the statsd hostname no longer resolves, so there's nothing for the metrics collector to connect to anymore.
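A quick way to confirm this from a worker is to repeat the lookup that fails at the bottom of the traceback. This is only a sketch: `can_resolve` is a made-up helper, and the hostname and port in the usage note are placeholders, not the values from our config.

```python
import socket


def can_resolve(host, port):
    """Return True if host resolves for UDP, mirroring the getaddrinfo call in the traceback."""
    try:
        socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM)
    except socket.gaierror:
        # Same exception class as "[Errno -2] Name or service not known"
        return False
    return True
```

For example, `can_resolve("statsd.example.invalid", 8125)` should come back False. Swapping in the real statsd host and port from the ORES config would tell us whether the name is gone from DNS or something else is wrong.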