swift-proxy.service failed on ms-fe2010, ms-fe2011 and ms-fe2012
Closed, Duplicate | Public | BUG REPORT

Description

Initial alert:

<icinga-wm> PROBLEM - Check systemd state on ms-fe2010 is CRITICAL: CRITICAL - degraded: The following units failed: swift-proxy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state

The service had apparently been restarted recently, and on startup it failed with the following Python exception:

Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]: Traceback (most recent call last):
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/bin/swift-proxy-server", line 23, in <module>
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     sys.exit(run_wsgi(conf_file, 'proxy-server', **options))
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 908, in run_wsgi
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     loadapp(conf_path, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 389, in loadapp
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     ctx = loadcontext(loadwsgi.APP, conf_file, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 373, in loadcontext
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 296, in loadconte
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 320, in _loadconf
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     return loader.get_context(object_type, name, global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 66, in get_context
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     object_type, name=name, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 450, in get_conte
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     global_additions=global_additions)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 562, in _pipeline
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     for name in pipeline[:-1]]
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/swift/common/wsgi.py", line 66, in get_context
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     object_type, name=name, global_conf=global_conf)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 458, in get_conte
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     section)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 517, in _context_
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     value = import_string(found_expr)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 22, in import_str
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     return pkg_resources.EntryPoint.parse("x=" + s).load(False)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2291, in load
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     return self.resolve()
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2297, in resolve
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     module = __import__(self.module_name, fromlist=['__name__'], level=0)
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:   File "/usr/local/lib/python2.7/dist-packages/wmf/rewrite.py", line 10, in <module>
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     import monotonic
Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]: ImportError: No module named monotonic
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Main process exited, code=exited, status=1/FAILURE
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Unit entered failed state.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Failed with result 'exit-code'.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Service hold-off time over, scheduling restart.
Nov 25 06:43:37 ms-fe2010 systemd[1]: Stopped OpenStack Swift proxy server.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Start request repeated too quickly.
Nov 25 06:43:37 ms-fe2010 systemd[1]: Failed to start OpenStack Swift proxy server.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Unit entered failed state.
Nov 25 06:43:37 ms-fe2010 systemd[1]: swift-proxy.service: Failed with result 'exit-code'.
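
For triage, the name of the missing module can be pulled out of journal lines like the ones above. This is only a minimal sketch: the helper name `missing_modules` and the `sample` list are illustrative, and the pattern assumes the standard Python ImportError wording seen in this log (plus the quoted form Python 3 uses).

```python
import re

# Matches the Python 2 wording ("No module named monotonic") and the
# Python 3 wording ("No module named 'monotonic'").
MISSING_MODULE = re.compile(
    r"(?:ImportError|ModuleNotFoundError): No module named '?([\w.]+)'?"
)


def missing_modules(log_lines):
    """Return the sorted, de-duplicated module names reported as missing."""
    found = set()
    for line in log_lines:
        match = MISSING_MODULE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


# Two lines copied from the journal output in this report.
sample = [
    "Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]:     import monotonic",
    "Nov 25 06:43:37 ms-fe2010 swift-proxy-server[23071]: ImportError: No module named monotonic",
]
print(missing_modules(sample))  # prints ['monotonic']
```

In practice the input would be the lines of the unit's journal rather than a hard-coded list.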

I ran "apt install python-monotonic" (the Python 2 package, since that is the interpreter the process was complaining about; the Python 3 version was already installed) and then "systemctl restart swift-proxy" (systemd had stopped retrying after too many failed restarts). My worry: if more proxy servers had restarted (and failed), would that have brought the service down?

After that was done:

<icinga-wm> RECOVERY - Check systemd state on ms-fe2010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
<icinga-wm> RECOVERY - Check systemd state on ms-fe2011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
<icinga-wm> RECOVERY - Check systemd state on ms-fe2012 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state

In any case, please confirm that this is the right fix, and work out why it failed (a missing dependency in the package or in Puppet?) and whether something else triggered it, so that we avoid further issues. Also, is the module missing on other proxies?
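
To check the other proxies, something like the following could be run on each host. It is a sketch, not an existing tool: the function name `unimportable` is made up for illustration, and the module list passed to it is an assumption (only `monotonic` is known to be required, from the traceback). It is written to run unchanged under Python 2.7 and Python 3, and it must be run with the same interpreter that swift-proxy-server uses (Python 2.7 here), because packages are installed per interpreter.

```python
import importlib


def unimportable(module_names):
    """Return the names from module_names that this interpreter cannot import."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Python 3's ModuleNotFoundError is a subclass of ImportError,
            # so this branch covers both interpreters.
            missing.append(name)
    return missing


# "monotonic" is the module named in the traceback; extend the list as needed.
result = unimportable(["monotonic"])
if result:
    print("missing modules: " + ", ".join(result))
else:
    print("all modules importable")
```

A non-empty result on a host means a restart of swift-proxy there would hit the same ImportError.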