Noticed this when first provisioning a new machine: salt-call would try to talk to tin over IPv6 on port 6379, hang until the connection timed out, then fall back to IPv4 and succeed, making every invocation unnecessarily slow. I don't see the salt master's redis bound on an IPv6 address; perhaps we could simply do that?
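For reference, the slowdown matches the usual sequential fallback: a client walks the addresses getaddrinfo() returns (typically IPv6 first on a dual-stack host) and only tries the next one after the current connect() fails. If the IPv6 address drops the SYN instead of refusing it, each attempt waits for the full timeout. A minimal Python sketch of that loop, assuming a plain TCP client; connect_first is a hypothetical name, not salt or trebuchet code:

```python
import socket


def connect_first(host, port, timeout):
    """Try every address getaddrinfo returns for host:port, in order.

    Returns the first connected socket. An address that silently drops
    the SYN makes its attempt block for up to `timeout` seconds before
    the next address is tried.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, socket.AF_UNSPEC, socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        # Bound each connect() attempt; without a timeout the kernel's
        # SYN retry schedule decides how long we wait.
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:  # includes socket.timeout and refusals
            last_error = exc
            sock.close()
    if last_error is None:
        raise OSError("getaddrinfo returned no addresses for %s" % host)
    raise last_error
```

With a 2 s timeout, for example, a host whose IPv6 address never answers costs roughly 2 s per connection before the IPv4 address is even tried.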
more context
root@restbase2001:~# ps fwaux | grep -i salt-call
root 2317 1.3 0.0 334692 50056 ? Ssl 11:11 0:00 \_ /usr/bin/python /usr/bin/salt-call --log-level=quiet --out=json deploy.fetch cassandra/logstash-logback-encoder
root 2473 0.0 0.0 12720 2188 pts/1 S+ 11:12 0:00 \_ grep -i salt-call
root@restbase2001:~# strace -f -p 2317
Process 2317 attached with 3 threads
[pid 2317] connect(12, {sa_family=AF_INET6, sin6_port=htons(6379), inet_pton(AF_INET6, "2620:0:861:101:10:64:0:196", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28 <unfinished ...>
[pid 2383] epoll_wait(9, <unfinished ...>
[pid 2382] epoll_wait(7,
file descriptors
salt-call 2317 root mem REG 9,0 140928 915730 /lib/x86_64-linux-gnu/ld-2.19.so
salt-call 2317 root 0r CHR 1,3 0t0 1028 /dev/null
salt-call 2317 root 1u REG 9,0 0 392464 /tmp/puppet20151111-1073-1hqxoc1
salt-call 2317 root 2u REG 9,0 0 392464 /tmp/puppet20151111-1073-1hqxoc1
salt-call 2317 root 3w REG 253,0 3300 62128166 /var/log/salt/minion
salt-call 2317 root 4u 0000 0,10 0 7967 anon_inode
salt-call 2317 root 5r CHR 1,9 0t0 1033 /dev/urandom
salt-call 2317 root 6u 0000 0,10 0 7967 anon_inode
salt-call 2317 root 7u 0000 0,10 0 7967 anon_inode
salt-call 2317 root 8u 0000 0,10 0 7967 anon_inode
salt-call 2317 root 9u 0000 0,10 0 7967 anon_inode
salt-call 2317 root 10u 0000 0,10 0 7967 anon_inode
salt-call 2317 root 11u IPv4 67911 0t0 TCP restbase2001.codfw.wmnet:38302->palladium.eqiad.wmnet:4506 (ESTABLISHED)
salt-call 2317 root 12u IPv6 69099 0t0 TCP [2620:0:860:102:3ea8:2aff:fe0a:eca0]:55278->tin.eqiad.wmnet:6379 (SYN_SENT)
Looking a bit more into this: redis on tin is 2:2.6.13-1+wmf1, but IPv6 support only landed in 2.8, as per https://github.com/antirez/redis/pull/61
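The version check above boils down to comparing the upstream part of the Debian version string against 2.8. A minimal Python sketch, assuming the 2.8 threshold stated above; upstream_version and supports_ipv6 are hypothetical helpers, and they ignore full Debian ordering rules (dpkg --compare-versions is the authoritative tool for that):

```python
import re


def upstream_version(debian_version):
    """Return the upstream version of a Debian package version as a tuple.

    Debian versions look like [epoch:]upstream[-revision], so
    "2:2.6.13-1+wmf1" has epoch 2, upstream 2.6.13 and revision 1+wmf1.
    """
    # Drop the epoch (everything up to the first ':'), if present.
    rest = debian_version.split(":", 1)[-1]
    # Drop the Debian revision (everything after the last '-'), if present.
    upstream = rest.rsplit("-", 1)[0]
    match = re.match(r"\d+(?:\.\d+)*", upstream)
    if match is None:
        raise ValueError("no numeric upstream version in %r" % debian_version)
    return tuple(int(part) for part in match.group(0).split("."))


def supports_ipv6(debian_version, minimum=(2, 8)):
    """True if the redis package's upstream version is at least `minimum`."""
    return upstream_version(debian_version) >= minimum
```

For tin's package, upstream_version("2:2.6.13-1+wmf1") gives (2, 6, 13), which compares below (2, 8), so supports_ipv6 returns False.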
I proposed an ad-hoc fix in trebuchet instead: https://github.com/trebuchet-deploy/trebuchet/pull/17
Change 254128 had a related patch set uploaded (by Filippo Giunchedi):
deployment: add redis socket_connect_timeout
Change 254128 merged by Filippo Giunchedi:
deployment: add redis socket_connect_timeout
Change 255090 had a related patch set uploaded (by Filippo Giunchedi):
deployment: set socket_connect_timeout to 2s
Change 255090 merged by Filippo Giunchedi:
deployment: set socket_connect_timeout to 2s
Change 255092 had a related patch set uploaded (by Filippo Giunchedi):
deployment: fix pyredis timeout argument and timeout to 5s
Change 255092 merged by Filippo Giunchedi:
deployment: fix pyredis timeout argument and timeout to 5s
"fixed" as in the socket_connect_timeout option wasn't introduced until pyredis 2.10 (that means jessie) so we are passing socket_timeout to set a timeout on the socket as a whole (not just connect). Eventually when salt masters are upgraded to jessie (if ever) we can move again to just the connect timeout, thus "stalled" so we don't forget one way or another
Change 256403 had a related patch set uploaded (by Filippo Giunchedi):
deployment: fix socket_connect_timeout argument
Change 256403 merged by Filippo Giunchedi:
deployment: fix socket_connect_timeout argument