
CloudVPS: horizon giving http/500 intermittently
Open, High · Public

Description

Today I hit a couple of HTTP 500 errors while working with Horizon. The errors seem transient.

I don't have more information right now, but I'm opening this task so I don't forget to check the logs and investigate further.

Related Objects

Status | Subtype | Assigned | Task
Invalid | | None |
Open | | None |

Event Timeline

Restricted Application added a subscriber: Aklapper. · Dec 16 2019, 12:14 PM
aborrero renamed this task from "CloudVPS: horizon giving http/500 from intermittently" to "CloudVPS: horizon giving http/500 intermittently". · Dec 16 2019, 12:14 PM
aborrero triaged this task as High priority.
aborrero moved this task from Inbox to Important on the cloud-services-team (Kanban) board.

Possibly related, from the Horizon error log:

2019-12-16 12:38:00.187475 Traceback (most recent call last):
2019-12-16 12:38:00.187527   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/openstack_dashboard/wsgi/django.wsgi", line 14, in <module>
2019-12-16 12:38:00.187536     application = get_wsgi_application()
2019-12-16 12:38:00.187548   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/django/core/wsgi.py", line 14, in get_wsgi_application
2019-12-16 12:38:00.187554     django.setup()
2019-12-16 12:38:00.187565   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/django/__init__.py", line 18, in setup
2019-12-16 12:38:00.187571     apps.populate(settings.INSTALLED_APPS)
2019-12-16 12:38:00.187581   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/django/apps/registry.py", line 78, in populate
2019-12-16 12:38:00.187587     raise RuntimeError("populate() isn't reentrant")
2019-12-16 12:38:00.187611 RuntimeError: populate() isn't reentrant
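
For context on the traceback above: as far as I understand it, this `RuntimeError` is raised by a guard in the app registry when `populate()` is called again after an earlier call started but never completed. Under mod_wsgi this often means an earlier exception during the first load in the same process is the real failure, and later requests only see the guard. The sketch below is a toy model of that behaviour, not Django's code; `MiniRegistry`, `broken_loader` and `try_populate` are hypothetical names used only for illustration.

```python
import threading


class MiniRegistry:
    """Toy stand-in for an app registry with a reentrancy guard (not Django's code)."""

    def __init__(self):
        self.ready = False
        self.loading = False
        self._lock = threading.RLock()

    def populate(self, loaders):
        if self.ready:
            return
        with self._lock:
            if self.ready:
                return
            if self.loading:
                # An earlier call started and never finished.
                raise RuntimeError("populate() isn't reentrant")
            self.loading = True
            for load in loaders:
                load()  # an exception here leaves loading=True
            self.ready = True


def broken_loader():
    # Hypothetical first-load failure (assumption, not taken from the logs).
    raise ImportError("hypothetical failure while importing an app")


def try_populate(registry, loaders):
    """Return 'ExceptionType: message' if populate() raises, else None."""
    try:
        registry.populate(loaders)
    except Exception as exc:
        return "{}: {}".format(type(exc).__name__, exc)
    return None


registry = MiniRegistry()
first = try_populate(registry, [broken_loader])   # the real failure
second = try_populate(registry, [broken_loader])  # only the guard is visible
print(first)
print(second)
```

If the same thing is happening here, the interesting entry would be the first exception logged by each mod_wsgi process, not the repeated reentrancy errors that follow it.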

I reloaded Apache on both labweb boxes; we'll see if that helps.

aborrero closed this task as Resolved. · Dec 18 2019, 4:31 PM
aborrero claimed this task.

I've been working with Horizon for a couple of days and haven't seen this again. Closing the task now.

elukey reopened this task as Open. · Jan 13 2020, 12:40 PM
elukey added a subscriber: elukey.

It just happened to me while trying to create a VM in the analytics project. I see intermittent 500s with the generic error message "The server encountered an internal error or misconfiguration and was unable to complete your request", and the drop-down lists in the VM creation form fail to populate because of various errors.

aborrero removed aborrero as the assignee of this task. · Jan 14 2020, 5:09 PM

@Andrew and I are going to pair up on this soon, in case that helps.

I see many errors like this today on both labweb servers:

2020-01-21 10:21:58.000296 mod_wsgi (pid=15346): Target WSGI script '/srv/deployment/horizon/venv/lib/python3.5/site-packages/openstack_dashboard/wsgi.py' cannot be loaded as Python module.
2020-01-21 10:21:58.000484 mod_wsgi (pid=15346): Exception occurred processing WSGI script '/srv/deployment/horizon/venv/lib/python3.5/site-packages/openstack_dashboard/wsgi.py'.
2020-01-21 10:21:58.000677 Traceback (most recent call last):
2020-01-21 10:21:58.000741   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/openstack_dashboard/wsgi.py", line 29, in <module>
2020-01-21 10:21:58.000754     application = get_wsgi_application()
2020-01-21 10:21:58.000768   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/django/core/wsgi.py", line 12, in get_wsgi_application
2020-01-21 10:21:58.000777     django.setup(set_prefix=False)
2020-01-21 10:21:58.000790   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/django/__init__.py", line 24, in setup
2020-01-21 10:21:58.000798     apps.populate(settings.INSTALLED_APPS)
2020-01-21 10:21:58.000811   File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/django/apps/registry.py", line 81, in populate
2020-01-21 10:21:58.000819     raise RuntimeError("populate() isn't reentrant")
2020-01-21 10:21:58.000845 RuntimeError: populate() isn't reentrant
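
To put a number on "many", a small script can tally these errors per minute. This is a minimal sketch that assumes log lines are shaped like the excerpt above (timestamp, then the exception text); `count_reentrant_per_minute` is a hypothetical helper, and the sample lines are copied from the tracebacks in this task.

```python
import re
from collections import Counter

# Matches only the final exception line, for example:
# 2020-01-21 10:21:58.000845 RuntimeError: populate() isn't reentrant
REENTRANT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ "
    r"RuntimeError: populate\(\) isn't reentrant"
)


def count_reentrant_per_minute(lines):
    """Count reentrancy errors, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = REENTRANT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "2020-01-21 10:21:58.000819     raise RuntimeError(\"populate() isn't reentrant\")",
    "2020-01-21 10:21:58.000845 RuntimeError: populate() isn't reentrant",
    "2019-12-16 12:38:00.187611 RuntimeError: populate() isn't reentrant",
]
counts = count_reentrant_per_minute(sample)
for minute, hits in sorted(counts.items()):
    print(minute, hits)
```

Against a real log, the same function can be fed an open file handle, for example `/var/log/apache2/horizon_error.log` on a labweb host. The `raise` line inside the traceback is deliberately not counted, so each failure is tallied once.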

Mentioned in SAL (#wikimedia-cloud) [2020-01-21T10:24:02Z] <arturo> running sudo systemctl restart apache2.service in both labweb servers to try mitigating T240852

On Apache restart, I see many of these:

aborrero@labweb1001:~ 12s $ sudo tail -f /var/log/apache2/horizon_error.log
  File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/horizon/utils/memoized.py", line 90, in remove
NameError: name 'KeyError' is not defined
Exception ignored in: <function memoized.<locals>.decorate.<locals>.wrapped.<locals>.remove at 0x7f1b0423a620>
Traceback (most recent call last):
  File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/horizon/utils/memoized.py", line 90, in remove
NameError: name 'KeyError' is not defined
Exception ignored in: <function memoized.<locals>.decorate.<locals>.wrapped.<locals>.remove at 0x7f1b0423aae8>
Traceback (most recent call last):
  File "/srv/deployment/horizon/venv/lib/python3.5/site-packages/horizon/utils/memoized.py", line 90, in remove
NameError: name 'KeyError' is not defined
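
A guess about these messages: `Exception ignored in` together with a `NameError` for a builtin such as `KeyError` usually points at a cleanup callback running while the interpreter is shutting down, after builtins have already been torn down. If so, they would be noise from the restart, not the cause of the 500s. The sketch below shows the general pattern of a cache entry removed by a weak-reference callback. It assumes, without having checked `memoized.py`, that the `remove` function in the log works roughly this way; `Token` and `remember` are hypothetical names.

```python
import gc
import weakref


class Token:
    """Hypothetical cache key object; plain instances support weak references."""


cache = {}


def remember(obj, value):
    key = id(obj)

    def remove(ref):
        # Runs when obj is garbage collected; drop the stale entry.
        try:
            del cache[key]
        except KeyError:
            pass

    # Store the weak reference with the value so the callback stays registered.
    cache[key] = (weakref.ref(obj, remove), value)


token = Token()
remember(token, "cached result")
size_before = len(cache)

del token     # the last strong reference goes away
gc.collect()  # only matters on interpreters without reference counting
size_after = len(cache)

print(size_before, size_after)
```

In normal operation the callback quietly removes the entry. At shutdown, the same callback can fire after names like `KeyError` are no longer resolvable, which would produce messages like the ones above.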