Python 3's eventlet.green getaddrinfo timeout in Bullseye
Closed, Resolved (Public)

Description

Found today while debugging Swift on Bullseye in Cloud VPS. It looks like getaddrinfo from eventlet.green times out (e.g. on ms-be-01.swift.eqiad1.wikimedia.cloud):

$ python3 -c 'from eventlet.green import socket ; print(socket.getaddrinfo("ms-fe-02.swift.eqiad1.wikimedia.cloud", 11211, socket.AF_UNSPEC, socket.SOCK_STREAM))'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 435, in resolve
    return _proxy.query(name, rdtype, raise_on_no_answer=raises,
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 391, in query
    return end()
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 370, in end
    raise result[1]
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 351, in step
    a = fun(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/dns/resolver.py", line 1089, in query
    return self.resolve(qname, rdtype, rdclass, tcp, source,
  File "/usr/lib/python3/dist-packages/dns/resolver.py", line 1043, in resolve
    timeout = self._compute_timeout(start, lifetime)
  File "/usr/lib/python3/dist-packages/dns/resolver.py", line 950, in _compute_timeout
    raise Timeout(timeout=duration)
dns.exception.Timeout: The DNS operation timed out after 5.107318878173828 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 528, in getaddrinfo
    qname, addrs = _getaddrinfo_lookup(host, family, flags)
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 501, in _getaddrinfo_lookup
    raise err
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 490, in _getaddrinfo_lookup
    answer = resolve(host, qfamily, False, use_network=use_network)
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 443, in resolve
    raise EAI_EAGAIN_ERROR
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 490, in _getaddrinfo_lookup
    answer = resolve(host, qfamily, False, use_network=use_network)
  File "/usr/lib/python3/dist-packages/eventlet/support/greendns.py", line 443, in resolve
    raise EAI_EAGAIN_ERROR
socket.gaierror: [Errno -3] Lookup timed out

Whereas the non-eventlet (standard library) getaddrinfo works as expected:

$ python3 -c 'import socket ; print(socket.getaddrinfo("ms-fe-02.swift.eqiad1.wikimedia.cloud", 11211, socket.AF_UNSPEC, socket.SOCK_STREAM))'
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.16.3.119', 11211))]
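The two one-liners above can be folded into a single script that runs both resolvers side by side. This is only a sketch: the helper name try_getaddrinfo is made up for illustration, and the script assumes nothing beyond the same getaddrinfo call used in the description. It reports the error instead of raising, so both results are visible in one run.

```python
import socket


def try_getaddrinfo(getaddrinfo, host, port):
    """Call a getaddrinfo implementation and report success or failure.

    Returns ("ok", sorted unique addresses) or ("error", message).
    try_getaddrinfo is a hypothetical helper, not part of eventlet.
    """
    try:
        results = getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("error", str(exc))
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the address.
    return ("ok", sorted({sockaddr[0] for *_, sockaddr in results}))


if __name__ == "__main__":
    import sys

    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print("stdlib:  ", try_getaddrinfo(socket.getaddrinfo, host, 11211))
    try:
        from eventlet.green import socket as green_socket
    except ImportError:
        print("eventlet: not installed")
    else:
        print("eventlet:", try_getaddrinfo(green_socket.getaddrinfo, host, 11211))
```

On an affected host, the eventlet line would be expected to show the "Lookup timed out" error from the traceback above while the stdlib line shows the address.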

Event Timeline

Relevant Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=971530, apparently an incompatibility between dnspython >= 2.0 and eventlet; eventlet upstream issue: https://github.com/eventlet/eventlet/issues/619
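A quick way to see whether a host has the combination described in the bug is to check the installed dnspython version. The sketch below uses importlib.metadata from the standard library; the helper names are hypothetical, and the check only tells you that dnspython is 2.0 or later, not whether the installed eventlet already carries a fix.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '2.0.0'."""
    return int(version_string.split(".")[0])


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def may_be_affected(dnspython_version):
    """True when dnspython is installed and is version 2.0 or later."""
    return dnspython_version is not None and major_version(dnspython_version) >= 2


if __name__ == "__main__":
    dns_version = installed_version("dnspython")
    print("dnspython:", dns_version, "eventlet:", installed_version("eventlet"))
    print("possibly affected:", may_be_affected(dns_version))
```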

+SRE for visibility as this will be true in production too

fgiunchedi renamed this task from Python 3's eventlet.green getaddrinfo timeout in Cloud VPS + Bullseye to Python 3's eventlet.green getaddrinfo timeout in Bullseye. May 26 2021, 2:53 PM
Marostegui triaged this task as Medium priority. May 26 2021, 3:40 PM

Change 704777 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] swift: use addresses for memcached

https://gerrit.wikimedia.org/r/704777

Change 704777 merged by Filippo Giunchedi:

[operations/puppet@production] swift: use addresses for memcached

https://gerrit.wikimedia.org/r/704777
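The idea behind "use addresses for memcached" is to avoid the hostname lookup at runtime by configuring IP addresses directly. The contents of the Puppet change are not shown here, so the following is only an illustration of that idea, not the patch itself: it resolves each entry once with the standard library resolver (which works, per the description) and rewrites a comma-separated "host:port" list. The function names and the list format are assumptions made for the example.

```python
import socket


def resolve_ipv4(host):
    """Resolve a hostname to its first IPv4 address with the stdlib resolver."""
    results = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    return results[0][4][0]


def servers_to_addresses(servers, resolve=resolve_ipv4):
    """Turn 'host:port,host:port' into 'ip:port,ip:port'.

    Hypothetical helper; `resolve` can be swapped out for testing.
    """
    converted = []
    for entry in servers.split(","):
        # Split on the last colon so the port is separated from the host.
        host, _, port = entry.strip().rpartition(":")
        converted.append(f"{resolve(host)}:{port}")
    return ",".join(converted)
```

For instance, given the lookup result shown in the description, servers_to_addresses("ms-fe-02.swift.eqiad1.wikimedia.cloud:11211") would be expected to produce 172.16.3.119:11211. The trade-off is that the configuration has to be regenerated whenever an address changes.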

Mentioned in SAL (#wikimedia-operations) [2021-08-24T08:01:07Z] <godog> temp fix thanos-swift.discovery.wmnet in /etc/hosts to get swift-dispersion-stats to work - T283714

Mentioned in SAL (#wikimedia-operations) [2021-08-24T12:33:33Z] <godog> test patched python3-eventlet on thanos-fe1003 - T283714

I was able to get a working python3-eventlet package by integrating the upstream PR. The easy solution for now, IMHO, is to upload the package internally for Bullseye.

Change 715199 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/debs/python-eventlet@debian/bullseye] Fix dnspython 2 compat

https://gerrit.wikimedia.org/r/715199

Change 715199 merged by Filippo Giunchedi:

[operations/debs/python-eventlet@debian/bullseye] Fix dnspython 2 compat

https://gerrit.wikimedia.org/r/715199

fgiunchedi claimed this task.

Package uploaded and upgraded on thanos-fe hosts; resolving.

The fix used for the patched python3-eventlet package has now been proposed for a Bullseye point release: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994064