To test changes to the poolcounter code, we need a poolcounter instance in deployment-prep
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T53494 Use Beta cluster as a true canary for code deployments (epic)
Open | | None | T87220 Minimize infrastructure differences between Beta Cluster and production
Resolved | | AlexMonk-WMF | T38891 Setup poolcounter daemon in Beta Cluster
Resolved | | Joe | T105378 Stop a poolcounter server fail from being a SPOF for the service and the api (and the site)
Resolved | | Joe | T112501 Create a poolcounter instance in deployment-prep
Resolved | | Andrew | T112200 Update remaining virt nodes to kilo
Event Timeline
I can't get the new instance to communicate with the deployment-prep puppetmaster. @Andrew, any help would be appreciated.
```
root@deployment-poolcounter01:/var/lib/puppet# ping deployment-puppetmaster
PING deployment-puppetmaster.deployment-prep.eqiad.wmflabs (10.68.16.63) 56(84) bytes of data.
64 bytes from deployment-puppetmaster.deployment-prep.eqiad.wmflabs (10.68.16.63): icmp_req=1 ttl=64 time=0.353 ms
64 bytes from deployment-puppetmaster.deployment-prep.eqiad.wmflabs (10.68.16.63): icmp_req=2 ttl=64 time=0.369 ms
64 bytes from deployment-puppetmaster.deployment-prep.eqiad.wmflabs (10.68.16.63): icmp_req=3 ttl=64 time=1.68 ms
64 bytes from deployment-puppetmaster.deployment-prep.eqiad.wmflabs (10.68.16.63): icmp_req=4 ttl=64 time=0.516 ms
64 bytes from deployment-puppetmaster.deployment-prep.eqiad.wmflabs (10.68.16.63): icmp_req=5 ttl=64 time=0.309 ms
^C
--- deployment-puppetmaster.deployment-prep.eqiad.wmflabs ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.309/0.646/1.686/0.525 ms
root@deployment-poolcounter01:/var/lib/puppet# telnet deployment-puppetmaster 8140
Trying 10.68.16.63...
telnet: Unable to connect to remote host: Connection timed out
```
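The transcript shows ICMP reaching the puppetmaster while a TCP connection to port 8140 times out. A timeout means no reply at all came back (the packets were dropped somewhere), whereas "connection refused" would mean the host answered but nothing was listening. A minimal Python sketch of that distinction follows; `probe_tcp` is a hypothetical helper, not part of any Wikimedia tooling, and it assumes Python 3.10+, where `socket.timeout` is an alias of `TimeoutError`.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connect and classify the outcome.

    "open"     -> the TCP handshake completed
    "refused"  -> the peer answered with a reset: reachable, nothing listening
    "timeout"  -> no answer within `timeout` seconds: traffic silently dropped
    "error: …" -> anything else, e.g. a DNS lookup failure
    """
    try:
        # create_connection resolves the name and connects with the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        # Must come before OSError, since socket.timeout subclasses it.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run from the new instance as `probe_tcp("deployment-puppetmaster", 8140)`, this would presumably have returned `"timeout"` before the fix, matching the telnet result.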
I see this problem and can reproduce it on another instance. No idea as to the cause yet.
This appears to be yet another issue with the nova rolling-upgrade process.
The new instance, deployment-poolcounter01, was running on labvirt1004, one of the nodes I had upgraded to Kilo. The puppetmaster was on labvirt1007, which was still running Juno. I have just upgraded labvirt1007 to Kilo, and the telnet command now works.
The network controller is also running Kilo.
So presumably something in the handshake between nova-network on Kilo and nova-compute on Juno is buggy. I'll upgrade the remaining virt nodes shortly, after which this issue should stop appearing.
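The hypothesis above can be restated as a check over an inventory: which compute nodes lag the release of the nova-network controller, and which instances sit on them. A minimal Python sketch, assuming a hand-built snapshot of the hosts, releases and instances named in this thread; the dictionaries and function names are hypothetical and do not come from any OpenStack API.

```python
# Hypothetical snapshot built from the comments in this task.
NETWORK_RELEASE = "kilo"  # release running on the nova-network controller
COMPUTE_RELEASE = {"labvirt1004": "kilo", "labvirt1007": "juno"}
INSTANCE_HOST = {
    "deployment-poolcounter01": "labvirt1004",
    "deployment-puppetmaster": "labvirt1007",
}


def mismatched_hosts(network_release, compute_release):
    """Compute nodes whose nova release differs from nova-network's."""
    return sorted(h for h, r in compute_release.items() if r != network_release)


def instances_at_risk(network_release, compute_release, instance_host):
    """Instances hosted on compute nodes that lag the network controller."""
    lagging = set(mismatched_hosts(network_release, compute_release))
    return sorted(i for i, h in instance_host.items() if h in lagging)
```

With the snapshot above, `mismatched_hosts` returns `["labvirt1007"]` and `instances_at_risk` returns `["deployment-puppetmaster"]`; once every compute node reports `"kilo"`, both lists are empty, which is the state the remaining upgrades aim for.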
Signed the instance's certificate, and puppet ran successfully on deployment-poolcounter01.deployment-prep.eqiad.wmflabs.