tools puppetmaster is badly overloaded
Closed, Resolved, Public

Description

Puppet runs on tools nodes are taking much longer than they should. I think this is a result of growth in tools (adding the Stretch grid) with no accompanying growth in puppetmaster size. tools-puppetmaster-01 is an m1.medium with only 2 CPUs. At any given time it seems to be handling about 10 concurrent puppet runs.

We can ignore this and assume it will go away when we pare down the Trusty grid, or we can build a new, bigger puppetmaster, or we can figure out how to properly scale this and have multiple puppetmasters. (I'm not especially in favor of the last option, since it would require keeping local hacks in sync.)

Event Timeline

I think building a bigger instance would solve the issue in the mid term, i.e., having 4 to 8 vCPUs instead of 2.

Note that shinken should be restarted once this is resolved.

Resized tools-puppetmaster-01 to m1.large.

root@tools-puppetmaster-01:~# free -m
             total       used       free     shared    buffers     cached
Mem:          7987        752       7234          8         32        136
-/+ buffers/cache:        583       7404
Swap:          510          0        510

root@tools-puppetmaster-01:~# grep ^processor /proc/cpuinfo 
processor	: 0
processor	: 1
processor	: 2
processor	: 3

There was a small difference from the documented resize procedure: nova rescheduled the VM onto a different labvirt (labvirt1005), although the VM was originally running on labvirt1003. The troubleshooting steps used to fix this are detailed below:

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server list | grep puppetmaster
| 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d | tools-puppetmaster-01       | ACTIVE  | public=10.68.20.74                  |


root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server show tools-puppetmaster-01
+--------------------------------------+----------------------------------------------------------------------------------+
| Field                                | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                                             |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | labvirt1003                                                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | labvirt1003.eqiad.wmnet                                                          |
| OS-EXT-SRV-ATTR:instance_name        | i-00090b57                                                                       |
| OS-EXT-STS:power_state               | 1                                                                                |
| OS-EXT-STS:task_state                | None                                                                             |
| OS-EXT-STS:vm_state                  | active                                                                           |
| OS-SRV-USG:launched_at               | 2017-06-27T21:11:52.000000                                                       |
| OS-SRV-USG:terminated_at             | None                                                                             |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| addresses                            | public=10.68.20.74                                                               |
| config_drive                         |                                                                                  |
| created                              | 2017-06-27T21:11:15Z                                                             |
| flavor                               | m1.medium (3)                                                                    |
| hostId                               | ca0907d63e4d5fb8483d7e946662d2c800fc24d0599d03940163175e                         |
| id                                   | 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d                                             |
| image                                | debian-8.7-jessie (deprecated 2017-07-19) (42545d14-bc05-4586-bd4e-07239bf00b72) |
| key_name                             | None                                                                             |
| name                                 | tools-puppetmaster-01                                                            |
| os-extended-volumes:volumes_attached | []                                                                               |
| progress                             | 0                                                                                |
| project_id                           | tools                                                                            |
| properties                           |                                                                                  |
| security_groups                      | [{u'name': u'default'}, {u'name': u'puppetmaster'}]                              |
| status                               | ACTIVE                                                                           |
| updated                              | 2018-11-15T23:10:07Z                                                             |
| user_id                              | andrew                                                                           |
+--------------------------------------+----------------------------------------------------------------------------------+


root@tools-puppetmaster-01:~# shutdown -h now

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server show tools-puppetmaster-01
+--------------------------------------+----------------------------------------------------------------------------------+
| Field                                | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                                             |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | labvirt1003                                                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | labvirt1003.eqiad.wmnet                                                          |
| OS-EXT-SRV-ATTR:instance_name        | i-00090b57                                                                       |
| OS-EXT-STS:power_state               | 4                                                                                |
| OS-EXT-STS:task_state                | None                                                                             |
| OS-EXT-STS:vm_state                  | stopped                                                                          |
| OS-SRV-USG:launched_at               | 2017-06-27T21:11:52.000000                                                       |
| OS-SRV-USG:terminated_at             | None                                                                             |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| addresses                            | public=10.68.20.74                                                               |
| config_drive                         |                                                                                  |
| created                              | 2017-06-27T21:11:15Z                                                             |
| flavor                               | m1.medium (3)                                                                    |
| hostId                               | ca0907d63e4d5fb8483d7e946662d2c800fc24d0599d03940163175e                         |
| id                                   | 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d                                             |
| image                                | debian-8.7-jessie (deprecated 2017-07-19) (42545d14-bc05-4586-bd4e-07239bf00b72) |
| key_name                             | None                                                                             |
| name                                 | tools-puppetmaster-01                                                            |
| os-extended-volumes:volumes_attached | []                                                                               |
| project_id                           | tools                                                                            |
| properties                           |                                                                                  |
| security_groups                      | [{u'name': u'default'}, {u'name': u'puppetmaster'}]                              |
| status                               | SHUTOFF                                                                          |
| updated                              | 2019-02-22T15:13:58Z                                                             |
| user_id                              | andrew                                                                           |
+--------------------------------------+----------------------------------------------------------------------------------+
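For readers unfamiliar with the numeric OS-EXT-STS:power_state values in these tables (1 before the shutdown, 4 after), here is a minimal lookup sketch. The function name power_state_name is hypothetical, and the mapping reflects my understanding of nova's power state constants, so treat it as an assumption and check it against your nova version.

```shell
# Hypothetical helper: translate a nova power_state number into a name.
# The values are assumed from nova's power state constants.
power_state_name() {
    case "$1" in
        0) echo "NOSTATE" ;;
        1) echo "RUNNING" ;;
        3) echo "PAUSED" ;;
        4) echo "SHUTDOWN" ;;
        6) echo "CRASHED" ;;
        7) echo "SUSPENDED" ;;
        *) echo "UNKNOWN" ;;
    esac
}
```

With this, `power_state_name 4` would correspond to the SHUTOFF status shown in the table above.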

root@labvirt1003:/var/lib/nova/instances# cp -Rp 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d.20190222


root@labvirt1003:/var/lib/nova/instances# md5sum 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d/disk 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d.20190222/disk
cb4ebdee76cee4abf1023873aaaf68c3  1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d/disk
cb4ebdee76cee4abf1023873aaaf68c3  1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d.20190222/disk
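The manual md5sum comparison above could be wrapped in a small check so that a mismatch stops the procedure before the resize. This is only a sketch: the function name same_checksum is hypothetical, and it assumes md5sum and cut are available on the host.

```shell
# Hypothetical helper: succeed only if both files are readable and their
# MD5 checksums are identical.
same_checksum() {
    [ -r "$1" ] && [ -r "$2" ] || return 1
    # Read via stdin so the output carries no file name to strip.
    orig_sum=$(md5sum < "$1" | cut -d' ' -f1)
    copy_sum=$(md5sum < "$2" | cut -d' ' -f1)
    # Guard against an empty checksum so two failures never compare equal.
    [ -n "$orig_sum" ] && [ "$orig_sum" = "$copy_sum" ]
}

# Example use (paths as in the transcript above):
#   same_checksum 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d/disk \
#                 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d.20190222/disk \
#       || echo "backup does not match, aborting"
```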

root@labvirt1003:/var/lib/nova/instances# virsh dumpxml i-00090b57 > /root/tools-puppetmaster-01.xml

root@cloudcontrol1003:~# openstack image set --activate 42545d14-bc05-4586-bd4e-07239bf00b72

root@cloudcontrol1003:~# OS_PROJECT_ID=tools nova --os-region-name=eqiad resize tools-puppetmaster-01 4 --poll

Server resizing... 100% complete
Finished

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server show tools-puppetmaster-01
+--------------------------------------+----------------------------------------------------------------------------------+
| Field                                | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                                             |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | labvirt1005 <<<<<<< VM got scheduled in a different virt                         |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | labvirt1005.eqiad.wmnet                                                          |
| OS-EXT-SRV-ATTR:instance_name        | i-00090b57                                                                       |
| OS-EXT-STS:power_state               | 4                                                                                |
| OS-EXT-STS:task_state                | None                                                                             |
| OS-EXT-STS:vm_state                  | resized                                                                          |
| OS-SRV-USG:launched_at               | 2019-02-22T15:27:18.000000                                                       |
| OS-SRV-USG:terminated_at             | None                                                                             |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| addresses                            | public=10.68.20.74                                                               |
| config_drive                         |                                                                                  |
| created                              | 2017-06-27T21:11:15Z                                                             |
| flavor                               | m1.large (4)                                                                     |
| hostId                               | d0540caeb602d5a89bdfb2a008fee2a30603f08cd2c17fb5abcaa29d                         |
| id                                   | 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d                                             |
| image                                | debian-8.7-jessie (deprecated 2017-07-19) (42545d14-bc05-4586-bd4e-07239bf00b72) |
| key_name                             | None                                                                             |
| name                                 | tools-puppetmaster-01                                                            |
| os-extended-volumes:volumes_attached | []                                                                               |
| progress                             | 0                                                                                |
| project_id                           | tools                                                                            |
| properties                           |                                                                                  |
| security_groups                      | [{u'name': u'default'}, {u'name': u'puppetmaster'}]                              |
| status                               | VERIFY_RESIZE                                                                    |
| updated                              | 2019-02-22T15:27:19Z                                                             |
| user_id                              | andrew                                                                           |
+--------------------------------------+----------------------------------------------------------------------------------+
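The host change seen above was spotted by reading the table by eye. A rough sketch of pulling a single field out of saved `server show` table output, so the host before and after a resize can be compared, might look like this. The function name show_field is hypothetical, and it assumes the plain two-column table layout shown in this task (without inline annotations in the Value column).

```shell
# Hypothetical helper: print the Value column for one Field from a
# two-column "| Field | Value |" table read on stdin.
show_field() {
    awk -F'|' -v want="$1" '
        {
            key = $2; val = $3
            # Trim the padding around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == want) { print val; exit }
        }'
}

# Example use, assuming the table was saved to before.txt and after.txt:
#   before=$(show_field 'OS-EXT-SRV-ATTR:host' < before.txt)
#   after=$(show_field 'OS-EXT-SRV-ATTR:host' < after.txt)
#   [ "$before" = "$after" ] || echo "VM moved from $before to $after"
```

Where the installed client supports it, asking the openstack CLI for a single column directly (its `-f value -c <field>` output options) would avoid parsing the table at all.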

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server stop tools-puppetmaster-01
Cannot 'stop' instance 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d while it is in vm_state resized (HTTP 409) (Request-ID: req-72a78b45-fc85-47de-8a8c-8cd2d8a69219)

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server resize --confirm tools-puppetmaster-01

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server show tools-puppetmaster-01
+--------------------------------------+----------------------------------------------------------------------------------+
| Field                                | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                                             |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | labvirt1005                                                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | labvirt1005.eqiad.wmnet                                                          |
| OS-EXT-SRV-ATTR:instance_name        | i-00090b57                                                                       |
| OS-EXT-STS:power_state               | 4                                                                                |
| OS-EXT-STS:task_state                | None                                                                             |
| OS-EXT-STS:vm_state                  | resized                                                                          |
| OS-SRV-USG:launched_at               | 2019-02-22T15:27:18.000000                                                       |
| OS-SRV-USG:terminated_at             | None                                                                             |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| addresses                            | public=10.68.20.74                                                               |
| config_drive                         |                                                                                  |
| created                              | 2017-06-27T21:11:15Z                                                             |
| flavor                               | m1.large (4)                                                                     |
| hostId                               | d0540caeb602d5a89bdfb2a008fee2a30603f08cd2c17fb5abcaa29d                         |
| id                                   | 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d                                             |
| image                                | debian-8.7-jessie (deprecated 2017-07-19) (42545d14-bc05-4586-bd4e-07239bf00b72) |
| key_name                             | None                                                                             |
| name                                 | tools-puppetmaster-01                                                            |
| os-extended-volumes:volumes_attached | []                                                                               |
| progress                             | 0                                                                                |
| project_id                           | tools                                                                            |
| properties                           |                                                                                  |
| security_groups                      | [{u'name': u'default'}, {u'name': u'puppetmaster'}]                              |
| status                               | VERIFY_RESIZE                                                                    |
| updated                              | 2019-02-22T15:31:04Z                                                             |
| user_id                              | andrew                                                                           |
+--------------------------------------+----------------------------------------------------------------------------------+

mysql:nova@m5-master.eqiad.wmnet [nova_eqiad1]> use nova
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql:nova@m5-master.eqiad.wmnet [nova]> select * from instances where uuid="1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d";
+---------------------+---------------------+------------+--------+-------------+---------+------------+--------------------------------------+-----------+------------+--------------+----------+----------+-------------+----------+-----------+-------+-----------------------+-------------+-----------+----------------+--------------+---------------------+---------------+-----------------------+-----------------------+-------------------+--------+---------+-------------+------------------+---------+--------------------------------------+--------------+------------------+--------------+--------------+--------------+------------+--------------------------+---------------------+----------+------------------+--------------------+-------------------+---------+--------------+-----------+-------------------------+---------+-----------+---------+--------------------+
| created_at          | updated_at          | deleted_at | id     | internal_id | user_id | project_id | image_ref                            | kernel_id | ramdisk_id | launch_index | key_name | key_data | power_state | vm_state | memory_mb | vcpus | hostname              | host        | user_data | reservation_id | scheduled_at | launched_at         | terminated_at | display_name          | display_description   | availability_zone | locked | os_type | launched_on | instance_type_id | vm_mode | uuid                                 | architecture | root_device_name | access_ip_v4 | access_ip_v6 | config_drive | task_state | default_ephemeral_device | default_swap_device | progress | auto_disk_config | shutdown_terminate | disable_terminate | root_gb | ephemeral_gb | cell_name | node                    | deleted | locked_by | cleaned | ephemeral_key_uuid |
+---------------------+---------------------+------------+--------+-------------+---------+------------+--------------------------------------+-----------+------------+--------------+----------+----------+-------------+----------+-----------+-------+-----------------------+-------------+-----------+----------------+--------------+---------------------+---------------+-----------------------+-----------------------+-------------------+--------+---------+-------------+------------------+---------+--------------------------------------+--------------+------------------+--------------+--------------+--------------+------------+--------------------------+---------------------+----------+------------------+--------------------+-------------------+---------+--------------+-----------+-------------------------+---------+-----------+---------+--------------------+
| 2017-06-27 21:11:15 | 2019-02-22 15:44:43 | NULL       | 592727 |        NULL | andrew  | tools      | 42545d14-bc05-4586-bd4e-07239bf00b72 |           |            |            0 | NULL     | NULL     |           4 | stopped  |      8192 |     4 | tools-puppetmaster-01 | labvirt1005 | NULL      | r-6nmjp7l6     | NULL         | 2019-02-22 15:27:18 | NULL          | tools-puppetmaster-01 | tools-puppetmaster-01 | nova              |      0 | NULL    | labvirt1010 |                3 | NULL    | 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d | NULL         | /dev/vda         | NULL         | NULL         |              | NULL       | NULL                     | NULL                |        0 |                1 |                  0 |                 0 |      80 |            0 | NULL      | labvirt1005.eqiad.wmnet |       0 | NULL      |       0 | NULL               |
+---------------------+---------------------+------------+--------+-------------+---------+------------+--------------------------------------+-----------+------------+--------------+----------+----------+-------------+----------+-----------+-------+-----------------------+-------------+-----------+----------------+--------------+---------------------+---------------+-----------------------+-----------------------+-------------------+--------+---------+-------------+------------------+---------+--------------------------------------+--------------+------------------+--------------+--------------+--------------+------------+--------------------------+---------------------+----------+------------------+--------------------+-------------------+---------+--------------+-----------+-------------------------+---------+-----------+---------+--------------------+
1 row in set (0.00 sec)

mysql:nova@m5-master.eqiad.wmnet [nova]> update instances set host="labvirt1003",node="labvirt1003.eqiad.wmnet" where uuid = "1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d";
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0


root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server show tools-puppetmaster-01
+--------------------------------------+----------------------------------------------------------------------------------+
| Field                                | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                                             |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | labvirt1003                                                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | labvirt1003.eqiad.wmnet                                                          |
| OS-EXT-SRV-ATTR:instance_name        | i-00090b57                                                                       |
| OS-EXT-STS:power_state               | 4                                                                                |
| OS-EXT-STS:task_state                | None                                                                             |
| OS-EXT-STS:vm_state                  | stopped                                                                          |
| OS-SRV-USG:launched_at               | 2019-02-22T15:27:18.000000                                                       |
| OS-SRV-USG:terminated_at             | None                                                                             |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| addresses                            | public=10.68.20.74                                                               |
| config_drive                         |                                                                                  |
| created                              | 2017-06-27T21:11:15Z                                                             |
| flavor                               | m1.large (4)                                                                     |
| hostId                               | ca0907d63e4d5fb8483d7e946662d2c800fc24d0599d03940163175e                         |
| id                                   | 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d                                             |
| image                                | debian-8.7-jessie (deprecated 2017-07-19) (42545d14-bc05-4586-bd4e-07239bf00b72) |
| key_name                             | None                                                                             |
| name                                 | tools-puppetmaster-01                                                            |
| os-extended-volumes:volumes_attached | []                                                                               |
| project_id                           | tools                                                                            |
| properties                           | OS-EXT-SRV-ATTR:host='labvirt1003'                                               |
| security_groups                      | [{u'name': u'default'}, {u'name': u'puppetmaster'}]                              |
| status                               | SHUTOFF                                                                          |
| updated                              | 2019-02-22T15:44:43Z                                                             |
| user_id                              | andrew                                                                           |
+--------------------------------------+----------------------------------------------------------------------------------+

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server start tools-puppetmaster-01

root@labvirt1003:~# virsh list --all | grep i-00090b57

root@labvirt1003:/var/lib/nova/instances/1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d# ls -la
total 12
drwxr-xr-x  2 nova nova   52 Feb 22 15:50 .
drwxr-xr-x 33 nova nova 4096 Feb 22 15:31 ..
-rw-r--r--  1 nova nova   79 Feb 22 15:31 disk.info
-rw-r--r--  1 nova nova 2696 Feb 22 15:50 libvirt.xml

root@labvirt1003:/var/lib/nova/instances/1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d# cp -Rp ../1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d.20190222/* .
root@labvirt1003:/var/lib/nova/instances/1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d# ls -la
total 19096956
drwxr-xr-x  2 nova nova          89 Feb 22 15:52 .
drwxr-xr-x 33 nova nova        4096 Feb 22 15:31 ..
-rw-------  1 root root       37724 Feb 22 15:13 console.log
-rw-r--r--  1 root root 19555352576 Feb 22 15:13 disk
-rw-r--r--  1 nova nova          79 Jun 27  2017 disk.info
-rw-r--r--  1 nova nova        2697 Nov 15 23:09 libvirt.xml


root@labvirt1003:/var/lib/nova/instances/1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d# virsh define --file ./libvirt.xml 
Domain i-00090b57 defined from ./libvirt.xml

root@labvirt1003:/var/lib/nova/instances/1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d# virsh list --all | grep i-00090b57
 -     i-00090b57                     shut off

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server start tools-puppetmaster-01

root@labvirt1003:/var/lib/nova/instances/1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d# virsh list --all | grep i-00090b57
 4190  i-00090b57                     running

root@cloudcontrol1003:~# OS_PROJECT_ID=tools openstack --os-region=eqiad server show tools-puppetmaster-01
+--------------------------------------+----------------------------------------------------------------------------------+
| Field                                | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                                             |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | labvirt1003                                                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | labvirt1003.eqiad.wmnet                                                          |
| OS-EXT-SRV-ATTR:instance_name        | i-00090b57                                                                       |
| OS-EXT-STS:power_state               | 1                                                                                |
| OS-EXT-STS:task_state                | None                                                                             |
| OS-EXT-STS:vm_state                  | active                                                                           |
| OS-SRV-USG:launched_at               | 2019-02-22T15:27:18.000000                                                       |
| OS-SRV-USG:terminated_at             | None                                                                             |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| addresses                            | public=10.68.20.74                                                               |
| config_drive                         |                                                                                  |
| created                              | 2017-06-27T21:11:15Z                                                             |
| flavor                               | m1.large (4)                                                                     |
| hostId                               | ca0907d63e4d5fb8483d7e946662d2c800fc24d0599d03940163175e                         |
| id                                   | 1d88b6f9-4ed7-4abf-a3ba-5cb17751bb4d                                             |
| image                                | debian-8.7-jessie (deprecated 2017-07-19) (42545d14-bc05-4586-bd4e-07239bf00b72) |
| key_name                             | None                                                                             |
| name                                 | tools-puppetmaster-01                                                            |
| os-extended-volumes:volumes_attached | []                                                                               |
| progress                             | 0                                                                                |
| project_id                           | tools                                                                            |
| properties                           | OS-EXT-SRV-ATTR:host='labvirt1003'                                               |
| security_groups                      | [{u'name': u'default'}, {u'name': u'puppetmaster'}]                              |
| status                               | ACTIVE                                                                           |
| updated                              | 2019-02-22T15:55:52Z                                                             |
| user_id                              | andrew                                                                           |
+--------------------------------------+----------------------------------------------------------------------------------+

Andrew claimed this task.

This seems better! We still need to rebuild the puppetmaster eventually, but the overload issue is resolved.