Hello - question about Newpyter with stacked conda envs. The snippet below fetches an image from Swift a couple of times, over http and https, with and without certificate verification. After starting a new server with a new conda env, the snippet prints three <Response [200]> when executed with a PySpark - Local kernel. With the default Python 3 kernel, however, the last https call fails with an SSL certificate verification error. What added to my confusion is that in a notebook (i.e. on a stat machine) a plain http call to Swift succeeds, while on a YARN worker the http call times out and you need to use https (with verify=False). Does anybody have an intuition why that is?
import requests

http_swift = "http://ms-fe.svc.eqiad.wmnet"
https_swift = "https://ms-fe.svc.eqiad.wmnet"
image_path = "/wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/100px-Tour_Eiffel_Wikimedia_Commons.jpg"

print(requests.get(http_swift + image_path))
print(requests.get(https_swift + image_path, verify=False))
print(requests.get(https_swift + image_path, verify=True))
SSLError: HTTPSConnectionPool(host='ms-fe.svc.eqiad.wmnet', port=443): Max retries exceeded with url: /wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/100px-Tour_Eiffel_Wikimedia_Commons.jpg (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)')))
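For what it's worth, one thing I've been comparing between the two kernels is which CA bundle requests ends up using, since a stacked conda env may ship its own certifi bundle that doesn't include the internal CA. A rough diagnostic sketch (the /etc/ssl/certs/ca-certificates.crt path is just the usual Debian location and an assumption on my part, not something I've confirmed is the right bundle):

# Compare the CA bundles each kernel resolves.
import ssl
import certifi
import requests

print("certifi bundle:   ", certifi.where())
print("requests bundle:  ", requests.certs.where())
print("openssl defaults: ", ssl.get_default_verify_paths())

# Assumption: the system bundle lives at the usual Debian path and contains
# the internal CA that signed the Swift certificate.
system_bundle = "/etc/ssl/certs/ca-certificates.crt"
image_path = "/wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/100px-Tour_Eiffel_Wikimedia_Commons.jpg"
print(requests.get("https://ms-fe.svc.eqiad.wmnet" + image_path, verify=system_bundle))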
It might have to do with how the PySpark kernels are started, i.e. changing PYTHONPATH and creating the Spark session at startup. For example, packages that are pip-installed into a stacked conda env are not available in a PySpark kernel. As Newpyter is still a work in progress, is the intention to have people create Spark sessions manually in a notebook (roughly as sketched below), or will the PySpark kernels need to be updated to work with Newpyter?
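As a workaround I've been creating the session by hand in the notebook, roughly like this (a minimal sketch with plain pyspark; the app name, master and config values are placeholders I picked, not recommended settings):

# Minimal sketch of creating a Spark session manually inside the notebook,
# instead of relying on the PySpark kernel to do it at startup.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("newpyter-test")             # placeholder name
    .master("yarn")                       # or "local[*]" for a local session
    .config("spark.executor.memory", "2g")  # placeholder value
    .getOrCreate()
)

print(spark.version)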