I got curious on why provisioning a new VM takes so long, looking at the fullstackd graphs: https://grafana.wikimedia.org/goto/Jrb386iDR?orgId=1
Asking cloud-init first on a newly-created VM:
# cloud-init analyze blame | head -10
-- Boot Record 01 --
29.77100s (init-local/search-OpenStackLocal)
25.00000s (init-network/config-bootcmd)
01.75600s (modules-config/config-apt-configure)
00.67000s (init-network/config-ssh)
00.46500s (modules-final/config-package-update-upgrade-install)OpenStackLocal taking 30s was surprising to me since that is when metadata about the cloud is gathered.
Digging further, the metadata service takes ~1s to answer, and cloud-init does a bunch of calls, which explains the ~30s total:
$ curl -w '\nTotal: %{time_total}s\nDNS: %{time_namelookup}s\nConnect: %{time_connect}s\nStart Transfer: %{time_starttransfer}s\n' \ -o /dev/null -s http://169.254.169.254/ Total: 0.000000s DNS: 0.000000s Connect: 0.000000s Start Transfer: 0.000000s 1.0 2007-01-19 2007-03-01 2007-08-29 2007-10-10 2007-12-15 2008-02-01 2008-09-01 2009-04-04 latest Total: 1.026512s DNS: 0.000045s Connect: 0.001972s Start Transfer: 1.022772s
Digging a little further, neutron-metadata-agent is the process answering those calls and then talking to nova-api-metadata. The service is leaking file descriptors as per T395742: Neutron metadata service failing for all VMs and has kept doing so, becoming quite slow
root@cloudnet1006:~# lsof -p $(systemctl show -P MainPID neutron-metadata-agent) | grep -c eventpoll 193376
Even though after a restart of neutron-metadata-agent responses are still slow:
curl -w '\nTotal: %{time_total}s\nDNS: %{time_namelookup}s\nConne
ct: %{time_connect}s\nStart Transfer: %{time_starttransfer}s\n' \ -o /dev/null -s http://169.254.169.254/And communication with nova metadata doesn't seem slow, so it seems the issue is somewhere between neutron-metadata-agent and the VM
cloudnet1006:~# tshark -i ens3f0np0 -td 'port 8775'
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ens3f0np0'
** (tshark:871501) 08:20:11.747456 [Main MESSAGE] -- Capture started.
** (tshark:871501) 08:20:11.747575 [Main MESSAGE] -- File: "/tmp/wireshark_ens3f0np0XRKWG3.pcapng"
1 0.000000000 2a02:ec80:a000:202::3 → 2a02:ec80:a000:4000::1 TCP 98 46584 → 8775 [SYN] Seq=0 Win=35760 Len=0 MSS=8940 SACK_PERM TSval=2106012432 TSecr=0 WS=512
2 0.000063029 2a02:ec80:a000:4000::1 → 2a02:ec80:a000:202::3 TCP 98 8775 → 46584 [SYN, ACK] Seq=0 Ack=1 Win=35712 Len=0 MSS=8940 SACK_PERM TSval=1319812654 TSecr=2106012432 WS=512
3 0.000036772 2a02:ec80:a000:202::3 → 2a02:ec80:a000:4000::1 TCP 90 46584 → 8775 [ACK] Seq=1 Ack=1 Win=35840 Len=0 TSval=2106012432 TSecr=1319812654
4 0.000162656 2a02:ec80:a000:202::3 → 2a02:ec80:a000:4000::1 HTTP 466 GET / HTTP/1.1
5 0.001770702 2a02:ec80:a000:4000::1 → 2a02:ec80:a000:202::3 TCP 267 HTTP/1.1 200 OK [TCP segment of a reassembled PDU]
6 0.000021594 2a02:ec80:a000:202::3 → 2a02:ec80:a000:4000::1 TCP 90 46584 → 8775 [ACK] Seq=377 Ack=178 Win=35840 Len=0 TSval=2106012434 TSecr=1319812656
7 0.000014612 2a02:ec80:a000:4000::1 → 2a02:ec80:a000:202::3 HTTP 90 HTTP/1.1 200 OK (text/plain)
8 0.001011807 2a02:ec80:a000:202::3 → 2a02:ec80:a000:4000::1 TCP 90 46584 → 8775 [FIN, ACK] Seq=377 Ack=179 Win=35840 Len=0 TSval=2106012435 TSecr=1319812656
9 0.000040456 2a02:ec80:a000:4000::1 → 2a02:ec80:a000:202::3 TCP 90 8775 → 46584 [ACK] Seq=179 Ack=378 Win=35840 Len=0 TSval=1319812657 TSecr=2106012435