ERROR: Timeout connecting to https://toolhub.toolforge.org/toolinfo.json
Traceback (most recent call last):
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/connection.py", line 170, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/connection.py", line 353, in connect
    conn = self._new_conn()
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/connection.py", line 177, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fb389ddc5c0>, 'Connection to toolhub.toolforge.org timed out. (connect timeout=5)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='toolhub.toolforge.org', port=443): Max retries exceeded with url: /toolinfo.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fb389ddc5c0>, 'Connection to toolhub.toolforge.org timed out. (connect timeout=5)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/app/toolhub/apps/crawler/tasks.py", line 178, in fetch_content
    timeout=(5, 13),
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/requests/adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='toolhub.toolforge.org', port=443): Max retries exceeded with url: /toolinfo.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fb389ddc5c0>, 'Connection to toolhub.toolforge.org timed out. (connect timeout=5)'))
ERROR: Failed to fetch https://toolhub.toolforge.org/toolinfo.json: Connect Timeout
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T292861 Find a better solution than `concurrencyPolicy: Replace` for sidecars in CronJob
Resolved | BUG REPORT | bd808 | T292027 Crawler unable to reach https://toolhub.toolforge.org/toolinfo.json from eqiad k8s cluster
Event Timeline
The *.toolforge.org ingress is not behind the text-lb CDN edge, so this should be attempting to route through the url-downloader proxy. Having an environment with matching network restrictions to test things from (T290357: Maintenance environment needed for running one-off commands) would be helpful for working out what is really going wrong here.
This really looks like a failure to use http://url-downloader.eqiad.wikimedia.org:8080 as a proxy.
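For illustration, explicit proxy configuration with the requests library could look roughly like the sketch below. The proxy URL and the `(5, 13)` connect/read timeout come from this task; the helper names `build_proxies` and `fetch_via_proxy` are hypothetical and are not taken from the Toolhub source.

```python
# Proxy named in this task.
PROXY_URL = "http://url-downloader.eqiad.wikimedia.org:8080"


def build_proxies(proxy_url=PROXY_URL):
    """Return a requests-style proxies mapping (hypothetical helper).

    requests selects the entry by the scheme of the target URL, so both
    plain HTTP and TLS requests are sent through the same proxy here.
    """
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, proxies=None):
    """Fetch url through the proxy (hypothetical helper, not Toolhub code)."""
    # Third-party dependency, imported here so build_proxies() has none.
    import requests

    return requests.get(
        url,
        timeout=(5, 13),  # (connect, read) seconds, as seen in the traceback
        proxies=proxies or build_proxies(),
    )
```

Passing `proxies` explicitly makes the request independent of whether the proxy environment variables reached the container.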
Change 724851 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[wikimedia/toolhub@main] crawler: set explicit proxy configuration
Change 724851 merged by jenkins-bot:
[wikimedia/toolhub@main] crawler: set explicit proxy configuration
Change 724859 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: Bump container version to 2021-09-29-223524-production
Change 724859 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: Bump container version to 2021-09-29-223524-production
Change 725060 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: Do not force cronjob envvars to uppercase
Change 725060 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: Do not force cronjob envvars to uppercase
Change 725180 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: set https_proxy envvar
Change 725181 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: Bump container version to 021-10-01-024845-production
Change 725180 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: set https_proxy envvar
Change 725181 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: Bump container version to 021-10-01-024845-production
Change 725376 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: Add "localhost" to no_proxy envvar
Change 725376 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: Add "localhost" to no_proxy envvar
Change 725384 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: Add envoy and mcrouter sidecars to cronjob
I now have the eqiad deployment configured to crawl four different URLs to get a better picture of what works and what fails:
- https://toolhub.toolforge.org/toolinfo.json -- TLS, Cloud VPS hosted, expected to use url-downloader
- https://gist.githubusercontent.com/Krinkle/081662d4d2fb390a9716/raw/toolinfo.json -- TLS, open internet, expected to use url-downloader
- http://wnews.ist.hokudai.ac.jp/wc3/toolinfo.json -- non TLS, open internet, expected to use url-downloader
- https://wikitech.wikimedia.org/w/index.php?title=User:Magnus_Manske/hay_directory&action=raw -- TLS, content wiki, expected to route directly to text-lb
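The routing expectations in the list above can be written down as a small check. This is only a sketch: the `NO_PROXY` value is invented for illustration (the task confirms only that `localhost` is on the exception list), the suffix matching is a simplified version of what HTTP clients do, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

PROXY = "http://url-downloader.eqiad.wikimedia.org:8080"  # from this task
# Hypothetical exception list; only "localhost" is confirmed by the task.
NO_PROXY = "localhost,.wikimedia.org"

URLS = [
    "https://toolhub.toolforge.org/toolinfo.json",
    "https://gist.githubusercontent.com/Krinkle/081662d4d2fb390a9716/raw/toolinfo.json",
    "http://wnews.ist.hokudai.ac.jp/wc3/toolinfo.json",
    "https://wikitech.wikimedia.org/w/index.php?title=User:Magnus_Manske/hay_directory&action=raw",
]


def bypasses_proxy(url, no_proxy):
    """Return True if the URL's host matches an entry in no_proxy."""
    host = (urlsplit(url).hostname or "").lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        # Match the host itself or any subdomain of the entry.
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False


def expected_route(url, no_proxy=NO_PROXY, proxy=PROXY):
    """Return "direct" for exempt hosts, otherwise the proxy URL."""
    return "direct" if bypasses_proxy(url, no_proxy) else proxy


if __name__ == "__main__":
    for url in URLS:
        print(expected_route(url), url)
```

Under these assumptions the first three URLs resolve to the url-downloader proxy and the wikitech URL resolves to a direct connection, matching the expectations listed above.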
Runs are still not completing, but I do at least keep getting slightly different errors as I continue to try and find the root problem.
Change 725384 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: Add envoy and mcrouter sidecars to cronjob
Change 725428 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: Set CronJob's backoffLimit back to 1
Change 725428 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: Set CronJob's backoffLimit back to 1
Change 725430 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/deployment-charts@master] toolhub: Set concurrencyPolicy=Replace for CronJob
Change 725430 merged by jenkins-bot:
[operations/deployment-charts@master] toolhub: Set concurrencyPolicy=Replace for CronJob
Lots of things were wrong from the start of this task. We needed to set https_proxy in the environment, add the envoy and mcrouter sidecars, add 'localhost' to the no_proxy exception list, and tell Kubernetes that it was OK to replace the prior job's pod with a new one when the schedule trips (a workaround for the sidecars not knowing to terminate with the main container).
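As a rough summary, the CronJob settings touched by these patches can be laid out as a Python dict mirroring the Kubernetes CronJob `spec` fields. This is a sketch, not the actual deployment-charts template: the `crawler` container name is hypothetical, images and other required fields are omitted, and the real no_proxy list presumably contains entries beyond `localhost` that this task does not show.

```python
import json

PROXY_URL = "http://url-downloader.eqiad.wikimedia.org:8080"  # from this task

cronjob_spec = {
    # Let a newly scheduled run replace a previous job whose sidecars
    # never exited after the main container finished.
    "concurrencyPolicy": "Replace",
    "jobTemplate": {
        "spec": {
            "backoffLimit": 1,
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "crawler",  # hypothetical name
                            "env": [
                                # Lowercase names, per the envvar-case patch.
                                {"name": "https_proxy", "value": PROXY_URL},
                                # Only "localhost" is confirmed by the task.
                                {"name": "no_proxy", "value": "localhost"},
                            ],
                        },
                        # Sidecars added by change 725384.
                        {"name": "envoy"},
                        {"name": "mcrouter"},
                    ],
                },
            },
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(cronjob_spec, indent=2))
```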