Initial alert:
<icinga-wm> PROBLEM - Check systemd state on ms-fe2010 is CRITICAL: CRITICAL - degraded: The following units failed: swift-proxy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
Apparently, the service had recently been restarted, and on startup it was failing with a Python exception:
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]: Traceback (most recent call last):
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/bin/swift-proxy-server", line 23, in <module>
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     sys.exit(run_wsgi(conf_file, 'proxy-server', **options))
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 908, in run_wsgi
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     loadapp(conf_path, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 389, in loadapp
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     ctx = loadcontext(loadwsgi.APP, conf_file, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 373, in loadcontext
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 296, in loadconte
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 320, in _loadconf
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     return loader.get_context(object_type, name, global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 66, in get_context
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     object_type, name=name, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 450, in get_conte
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     global_additions=global_additions)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 562, in _pipeline
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     for name in pipeline[:-1]]
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 66, in get_context
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     object_type, name=name, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 458, in get_conte
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     section)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 517, in _context_
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     value = import_string(found_expr)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 22, in import_str
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     return pkg_resources.EntryPoint.parse("x=" + s).load(False)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2291, in load
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     return self.resolve()
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2297, in resolve
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     module = __import__(self.module_name, fromlist=['__name__'], level=0)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/local/lib/python2.7/dist-packages/wmf/rewrite.py", line 10, in <module>
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     import monotonic
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]: ImportError: No module named monotonic
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Main process exited, code=exited, status=1/FAILURE
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Unit entered failed state.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Failed with result 'exit-code'.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Service hold-off time over, scheduling restart.
Nov 25 06:43:37 ms-fe2010 systemd[1]: Stopped OpenStack Swift proxy server.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Start request repeated too quickly.
Nov 25 06:43:37 ms-fe2010 systemd[1]: Failed to start OpenStack Swift proxy server.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Unit entered failed state.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Failed with result 'exit-code'.
I ran "apt install python-monotonic" (the Python 2 package, since that is the interpreter the process was complaining under; the Python 3 version was already installed) and then "systemctl restart swift-proxy" (systemd had stopped retrying after too many failed restarts). My worry is: if more proxy servers had restarted (and failed), would that have brought the service down?
After that was done:
<icinga-wm> RECOVERY - Check systemd state on ms-fe2010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
<icinga-wm> RECOVERY - Check systemd state on ms-fe2011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
<icinga-wm> RECOVERY - Check systemd state on ms-fe2012 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
In any case, please check that this is the right fix, and work out why it failed (a missing dependency in the package or in puppet?) and whether something else triggered it, so that we avoid further issues. Also, is the module missing on other proxies?
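To help answer the last question, here is a minimal sketch of a check that could be run on each proxy. It asks a given interpreter to import each module in a child process and reports the ones that fail. The function name missing_modules is hypothetical, and the interpreter name python2.7 is an assumption taken from the paths in the traceback. The script itself needs Python 3, which the host appears to have.

```python
import subprocess


def missing_modules(interpreter, modules):
    """Return the modules that `interpreter` cannot import.

    `interpreter` is the path or name of a Python binary. Assumption: on
    the proxies this would be python2.7, as the traceback paths suggest.
    """
    missing = []
    for module in modules:
        # Run the import in a child process so the check uses the target
        # interpreter's own module search path, not this script's.
        result = subprocess.run(
            [interpreter, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # A non-zero exit status means the import raised an exception.
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling missing_modules("python2.7", ["monotonic"]) on a proxy should return an empty list where the Python 2 module is present, and ["monotonic"] where it is not. Note that a missing interpreter binary raises FileNotFoundError rather than being reported as a missing module.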