
OpenStack API response time gets slower over time
Closed, Resolved (Public)

Description

Since we started measuring the API response times, they seem to increase over time, either linearly or exponentially. Restarting the service brings the response time back to its initial value.

Some related links:

Screenshot 2023-08-28 at 16.28.13.png (1×2 px, 580 KB)

Event Timeline

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:01:47Z] <wm-bot2> Restarting openstack services on cloudvirt1025: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:01:51Z] <wm-bot2> Restarting openstack services on cloudvirt1029: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:01:56Z] <wm-bot2> Restarting openstack services on cloudvirt1026: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:00Z] <wm-bot2> Restarting openstack services on cloudvirt1027: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:04Z] <wm-bot2> Restarting openstack services on cloudvirt1030: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:08Z] <wm-bot2> Restarting openstack services on cloudvirt1028: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:12Z] <wm-bot2> Restarting openstack services on cloudcontrol1005: ['nova-conductor', 'nova-scheduler', 'nova-api', 'nova-api-metadata', 'cinder-volume', 'cinder-scheduler', 'neutron-api', 'neutron-rpc-server', 'trove-api', 'trove-conductor', 'trove-taskmanager', 'keystone', 'keystone-admin', 'glance-api', 'magnum-api', 'magnum-conductor', 'heat-api', 'heat-api-cfn', 'heat-engine'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:26Z] <wm-bot2> Restarting openstack services on cloudvirt1032: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:30Z] <wm-bot2> Restarting openstack services on cloudvirt1031: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:35Z] <wm-bot2> Restarting openstack services on cloudvirt1033: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:39Z] <wm-bot2> Restarting openstack services on cloudvirt1035: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:43Z] <wm-bot2> Restarting openstack services on cloudvirt1037: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:47Z] <wm-bot2> Restarting openstack services on cloudvirt1039: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:51Z] <wm-bot2> Restarting openstack services on cloudvirt1034: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:54Z] <wm-bot2> Restarting openstack services on cloudvirt1036: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:02:58Z] <wm-bot2> Restarting openstack services on cloudvirt1040: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:03Z] <wm-bot2> Restarting openstack services on cloudvirt1045: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:07Z] <wm-bot2> Restarting openstack services on cloudvirt1043: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:11Z] <wm-bot2> Restarting openstack services on cloudvirt1046: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:15Z] <wm-bot2> Restarting openstack services on cloudvirt1041: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:18Z] <wm-bot2> Restarting openstack services on cloudvirt1044: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:23Z] <wm-bot2> Restarting openstack services on cloudvirt1042: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:27Z] <wm-bot2> Restarting openstack services on cloudvirt1038: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:31Z] <wm-bot2> Restarting openstack services on cloudvirt1047: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:35Z] <wm-bot2> Restarting openstack services on cloudvirt-wdqs1001: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:40Z] <wm-bot2> Restarting openstack services on cloudvirt-wdqs1002: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:44Z] <wm-bot2> Restarting openstack services on cloudvirt-wdqs1003: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:03:48Z] <wm-bot2> Restarting openstack services on cloudcontrol1006: ['nova-conductor', 'nova-scheduler', 'nova-api', 'nova-api-metadata', 'cinder-volume', 'cinder-scheduler', 'neutron-api', 'neutron-rpc-server', 'trove-api', 'trove-conductor', 'trove-taskmanager', 'keystone', 'keystone-admin', 'glance-api', 'magnum-api', 'magnum-conductor', 'heat-api', 'heat-api-cfn', 'heat-engine'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:12Z] <wm-bot2> Restarting openstack services on cloudcontrol1007: ['nova-conductor', 'nova-scheduler', 'nova-api', 'nova-api-metadata', 'cinder-volume', 'cinder-scheduler', 'neutron-api', 'neutron-rpc-server', 'trove-api', 'trove-conductor', 'trove-taskmanager', 'keystone', 'keystone-admin', 'glance-api', 'magnum-api', 'magnum-conductor', 'heat-api', 'heat-api-cfn', 'heat-engine'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:26Z] <wm-bot2> Restarting openstack services on cloudvirt1048: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:30Z] <wm-bot2> Restarting openstack services on cloudvirt1052: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:34Z] <wm-bot2> Restarting openstack services on cloudvirt1053: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:38Z] <wm-bot2> Restarting openstack services on cloudvirt1049: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:42Z] <wm-bot2> Restarting openstack services on cloudvirt1050: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:47Z] <wm-bot2> Restarting openstack services on cloudvirt1051: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:51Z] <wm-bot2> Restarting openstack services on cloudvirt1056: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:55Z] <wm-bot2> Restarting openstack services on cloudvirt1057: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:04:58Z] <wm-bot2> Restarting openstack services on cloudvirt1061: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:03Z] <wm-bot2> Restarting openstack services on cloudvirt1059: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:06Z] <wm-bot2> Restarting openstack services on cloudvirt1058: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:10Z] <wm-bot2> Restarting openstack services on cloudvirt1060: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:13Z] <wm-bot2> Restarting openstack services on cloudvirt1055: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:18Z] <wm-bot2> Restarting openstack services on cloudvirt1054: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:22Z] <wm-bot2> Restarting openstack services on cloudvirtlocal1001: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:25Z] <wm-bot2> Restarting openstack services on cloudvirtlocal1002: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:30Z] <wm-bot2> Restarting openstack services on cloudvirtlocal1003: ['nova-compute', 'neutron-linuxbridge-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:34Z] <wm-bot2> Restarting openstack services on cloudbackup2002: ['cinder-backup'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:37Z] <wm-bot2> Restarting openstack services on cloudbackup2001: ['cinder-backup'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:44Z] <wm-bot2> Restarting openstack services on cloudnet1005: ['neutron-linuxbridge-agent', 'neutron-dhcp-agent', 'neutron-metadata-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:47Z] <wm-bot2> Restarting openstack services on cloudnet1006: ['neutron-linuxbridge-agent', 'neutron-metadata-agent', 'neutron-dhcp-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:50Z] <wm-bot2> Restarting openstack services on cloudservices1004: ['designate-worker', 'designate-api', 'designate-mdns', 'designate-producer', 'designate-central', 'designate-sink', 'designate-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-08-28T15:05:53Z] <wm-bot2> Restarting openstack services on cloudservices1005: ['designate-producer', 'designate-sink', 'designate-worker', 'designate-central', 'designate-mdns', 'designate-agent'] (T345084) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-12-08T09:31:32Z] <wm-bot2> dcaro@urcuchillay START - Cookbook wmcs.openstack.restart_openstack (T345084)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-12-08T09:32:03Z] <wm-bot2> dcaro@urcuchillay END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) (T345084)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-12-08T09:32:07Z] <wm-bot2> dcaro@urcuchillay START - Cookbook wmcs.openstack.restart_openstack (T345084)

Mentioned in SAL (#wikimedia-cloud) [2023-12-08T09:32:12Z] <dcaro> restarting nova and keystone as they are getting too slow (T345084)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-12-08T09:38:39Z] <wm-bot2> dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) (T345084)

I modified the limits a bit (trove -> 5s, nova-api -> 3s, others -> 1.5s) and updated the dashboard too, but the checks still fail from time to time.

There's an issue with trove related to request volume: from cloudlb1002 we get almost no requests, so the 12h mean gets very skewed.

Will have to think about how to fix it (maybe by requiring a minimum number of requests, or similar).
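A minimal sketch of the minimum-request idea, in plain Python rather than as an actual alert rule. The per-service limits are the ones mentioned above; the `MIN_REQUESTS` value and the function names are hypothetical, picked only for illustration, and this is not how the real alert is implemented.

```python
# Response time limits in seconds, as listed in the comment above.
LIMITS_S = {"trove": 5.0, "nova-api": 3.0}
DEFAULT_LIMIT_S = 1.5  # "others"

# Hypothetical floor on the number of requests in the averaging window.
# The real value would need tuning; it is not taken from the task.
MIN_REQUESTS = 100


def mean_response_time(total_seconds, request_count, min_requests=MIN_REQUESTS):
    """Return the mean response time, or None if the sample is too small."""
    # max(..., 1) also guards against dividing by zero requests.
    if request_count < max(min_requests, 1):
        return None
    return total_seconds / request_count


def is_too_slow(service, total_seconds, request_count, min_requests=MIN_REQUESTS):
    """True only when there are enough requests and the mean exceeds the limit."""
    mean = mean_response_time(total_seconds, request_count, min_requests)
    if mean is None:
        # Too few requests for the mean to be meaningful: do not alert.
        return False
    return mean > LIMITS_S.get(service, DEFAULT_LIMIT_S)
```

With this guard, a backend that served only a handful of slow requests in the window (as with trove on cloudlb1002) is skipped instead of skewing the result, while a backend with enough traffic is still compared against its limit.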

The alert triggered again yesterday. This time it was caused by a spike in response time for nova-api_backend, which has already ended without any intervention (as far as I know). Attaching a Prometheus graph showing only the affected metrics.

Screenshot 2024-03-08 at 15.35.52.png (1×2 px, 615 KB)

dcaro claimed this task.

I think we can close this for now: we tweaked a few of the API response time limits, and the alert has not triggered in a while.

Will open a new one if we start seeing alerts again.