Since we're restarting the LDAP servers often now, we need to make sure failover and failure handling work properly.
Related Objects
- Mentioned In
  - rOPUP9c975c2ac6fa: labs: Set timeout for ldap3 using scripts
  - rOPUP8da8e973b55e: labs: Set timeout for ldap3 using scripts
  - rOPUP58f9692bf689: tools: Use LDAP servers in HA manner for maintain-kubeusers
  - rOPUPf0165d4fe568: tools: Use LDAP servers in HA manner for maintain-kubeusers
  - rOPUPa8fcc4150e8f: tools: Use LDAP servers in HA manner for maintain-kubeusers
  - rOPUPf5da83fbac28: tools: Use LDAP servers in HA manner for maintain-kubeusers
  - rOPUPce745524d6fb: labstore: Configure LDAP failover + timeout for create-dbuser
  - rOPUP17a6aae6f7bd: labstore: Configure LDAP failover + timeout for create-dbuser
  - rOPUP98ea146cbd00: labstore: Configure LDAP failover + timeout for create-dbuser
  - rOPUP9881605b05d6: labstore: Configure LDAP failover + timeout for create-dbuser
  - rOPUP507c920faf1b: labstore: Configure LDAP failover + timeout for create-dbuser
  - rOPUPb2e517de8c5e: labstore: Configure LDAP failover + timeout for create-dbuser
  - rOPUP825397be2ed9: labstore: Configure LDAP failover + timeout for create-dbuser
Event Timeline
Change 303565 had a related patch set uploaded (by Yuvipanda):
labstore: Configure LDAP failover + timeout for create-dbuser
Change 303565 merged by Yuvipanda:
labstore: Configure LDAP failover + timeout for create-dbuser
Change 303607 had a related patch set uploaded (by Yuvipanda):
tools: Use LDAP servers in HA manner for maintain-kubeusers
Change 303607 merged by Yuvipanda:
tools: Use LDAP servers in HA manner for maintain-kubeusers
maintain-kubeusers was stuck connecting to seaborgium today for minutes, despite there being a 1s connection timeout :(
maintain- 3414 root 4u IPv4 10668954 0t0 TCP tools-k8s-master-01.tools.eqiad.wmflabs:33368->seaborgium.wikimedia.org:ldap (CLOSE_WAIT)
maintain- 3414 root 5u IPv4 10670197 0t0 TCP tools-k8s-master-01.tools.eqiad.wmflabs:33432->seaborgium.wikimedia.org:ldap (CLOSE_WAIT)
maintain- 3414 root 7u IPv4 10668828 0t0 TCP tools-k8s-master-01.tools.eqiad.wmflabs:33302->seaborgium.wikimedia.org:ldap (CLOSE_WAIT)
Adding a time_limit=N to conn.search in get_tools_from_ldap might help. The socket was connected and in CLOSE_WAIT according to what @yuvipanda said on irc, so the 1s connect limit wouldn't apply.
> time_limit: number of seconds allowed for the search (defaults to None). If None the search can take an unlimited amount of time, unless the server has a more restrictive rule.
-- http://ldap3.readthedocs.io/searches.html
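To illustrate why a connect timeout alone would not bound a call on an already-connected socket, here is a minimal sketch using plain Python sockets rather than ldap3. The names `fetch_with_timeouts` and `make_silent_server` are hypothetical and exist only for this illustration; the "silent server" is an assumed stand-in for a server that completes the TCP handshake and then never answers.

```python
import socket
import time


def fetch_with_timeouts(addr, connect_timeout=1.0, read_timeout=2.0):
    """Connect to addr, then wait for a reply under a separate read timeout.

    Returns the bytes received, or None if the peer stayed silent for
    read_timeout seconds after the connection was established.
    """
    # The connect timeout only bounds the TCP handshake.
    sock = socket.create_connection(addr, timeout=connect_timeout)
    try:
        # Bound later recv() calls too. With settimeout(None) the recv()
        # below would block indefinitely on a silent peer.
        sock.settimeout(read_timeout)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None
    finally:
        sock.close()


def make_silent_server():
    """Hypothetical hung server: the kernel accepts connections, nothing replies."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv


if __name__ == "__main__":
    srv = make_silent_server()
    start = time.monotonic()
    reply = fetch_with_timeouts(srv.getsockname(), read_timeout=0.5)
    print(reply, round(time.monotonic() - start, 1))
    srv.close()
```

The connect succeeds immediately, so only the read timeout stops the call from hanging. For the ldap3 scripts, the analogous settings would be `time_limit` on `conn.search` (quoted above) and, I believe, a `receive_timeout` passed to `ldap3.Connection` for the client-side socket; I have not checked which of these the patch uses.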
Change 305616 had a related patch set uploaded (by Yuvipanda):
labs: Set timeout for ldap3 using scripts