I just noticed that running pip install on the airflow-dags repository sometimes takes forever. One of the problems is that pip's dependency resolver apparently has to backtrack to find a version of a dependency that is compatible with the other requirements, so it downloads pretty much all of them. See the pyspark example below, where it downloads a few dozen files, totaling over 7 GB of useless downloads.
Collecting pyspark
  Downloading pyspark-3.2.0.tar.gz (281.3 MB)
     |████████████████████████████████| 281.3 MB 189 kB/s
Collecting py4j==0.10.9.2
  Downloading py4j-0.10.9.2-py2.py3-none-any.whl (198 kB)
     |████████████████████████████████| 198 kB 15.9 MB/s
Collecting pyspark
  Downloading pyspark-3.1.3.tar.gz (214.0 MB)
     |████████████████████████████████| 214.0 MB 16.4 MB/s
Collecting py4j==0.10.9
  Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB)
     |████████████████████████████████| 198 kB 12.5 MB/s
Collecting pyspark
  Downloading pyspark-3.1.2.tar.gz (212.4 MB)
     |████████████████████████████████| 212.4 MB 28.7 MB/s
  Downloading pyspark-3.1.1.tar.gz (212.3 MB)
     |████████████████████████████████| 212.3 MB 76 kB/s
  Downloading pyspark-3.0.3.tar.gz (209.1 MB)
     |████████████████████████████████| 209.1 MB 24.0 MB/s
  Downloading pyspark-3.0.2.tar.gz (204.8 MB)
     |████████████████████████████████| 204.8 MB 36.6 MB/s
  Downloading pyspark-3.0.1.tar.gz (204.2 MB)
     |████████████████████████████████| 204.2 MB 134.0 MB/s
INFO: pip is looking at multiple versions of py4j to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of pyspark to determine which version is compatible with other requirements. This could take a while.
  Downloading pyspark-3.0.0.tar.gz (204.7 MB)
     |████████████████████████████████| 204.7 MB 18.4 MB/s
  Downloading pyspark-2.4.8.tar.gz (220.5 MB)
     |████████████████████████████████| 220.5 MB 6.7 MB/s
Collecting py4j==0.10.7
  Downloading py4j-0.10.7-py2.py3-none-any.whl (197 kB)
     |████████████████████████████████| 197 kB 6.1 MB/s
Collecting pyspark
  Downloading pyspark-2.4.7.tar.gz (217.9 MB)
     |████████████████████████████████| 217.9 MB 20.4 MB/s
  Downloading pyspark-2.4.6.tar.gz (218.4 MB)
     |████████████████████████████████| 218.4 MB 13.0 MB/s
  Downloading pyspark-2.4.5.tar.gz (217.8 MB)
     |████████████████████████████████| 217.8 MB 10.8 MB/s
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  Downloading pyspark-2.4.4.tar.gz (215.7 MB)
     |████████████████████████████████| 215.7 MB 9.8 MB/s
  Downloading pyspark-2.4.3.tar.gz (215.6 MB)
     |████████████████████████████████| 215.6 MB 27.8 MB/s
  Downloading pyspark-2.4.2.tar.gz (193.9 MB)
     |████████████████████████████████| 193.9 MB 21.3 MB/s
  Downloading pyspark-2.4.1.tar.gz (215.7 MB)
     |████████████████████████████████| 215.7 MB 189 kB/s
  Downloading pyspark-2.4.0.tar.gz (213.4 MB)
     |████████████████████████████████| 213.4 MB 815 kB/s
  Downloading pyspark-2.3.4.tar.gz (212.3 MB)
     |████████████████████████████████| 212.3 MB 10.9 MB/s
  Downloading pyspark-2.3.3.tar.gz (211.9 MB)
     |████████████████████████████████| 211.9 MB 40.2 MB/s
  Downloading pyspark-2.3.2.tar.gz (211.9 MB)
     |████████████████████████████████| 211.9 MB 79 kB/s
  Downloading pyspark-2.3.1.tar.gz (211.9 MB)
     |████████████████████████████████| 211.9 MB 255 kB/s
  Downloading pyspark-2.3.0.tar.gz (211.9 MB)
     |████████████████████████████████| 211.9 MB 12.6 MB/s
Collecting py4j==0.10.6
  Downloading py4j-0.10.6-py2.py3-none-any.whl (189 kB)
     |████████████████████████████████| 189 kB 17.7 MB/s
Collecting pyspark
  Downloading pyspark-2.2.3.tar.gz (188.5 MB)
     |████████████████████████████████| 188.5 MB 120 kB/s
  Downloading pyspark-2.2.2.tar.gz (188.0 MB)
     |████████████████████████████████| 188.0 MB 6.1 MB/s
  Downloading pyspark-2.2.1.tar.gz (188.2 MB)
     |████████████████████████████████| 188.2 MB 56 kB/s
Collecting py4j==0.10.4
  Downloading py4j-0.10.4-py2.py3-none-any.whl (186 kB)
     |████████████████████████████████| 186 kB 14.5 MB/s
Collecting pyspark
  Downloading pyspark-2.2.0.post0.tar.gz (188.3 MB)
     |████████████████████████████████| 188.3 MB 14.8 MB/s
WARNING: Discarding https://files.pythonhosted.org/packages/f6/fe/4a1420f1c8c4df40cc8ac1dab6c833a3fe1986abf859135712d762100fde/pyspark-2.2.0.post0.tar.gz#sha256=9dc994118608ce12939d86dec27ce8a545cc6e6a4d76bca785a37322daa33a3c (from https://pypi.org/simple/pyspark/). Requested pyspark from https://files.pythonhosted.org/packages/f6/fe/4a1420f1c8c4df40cc8ac1dab6c833a3fe1986abf859135712d762100fde/pyspark-2.2.0.post0.tar.gz#sha256=9dc994118608ce12939d86dec27ce8a545cc6e6a4d76bca785a37322daa33a3c (from apache-airflow-providers-apache-spark->wmf-airflow-dags==0.1.0) has inconsistent version: filename has '2.2.0.post0', but metadata has '2.2.0'
  Downloading pyspark-2.1.3.tar.gz (181.3 MB)
     |████████████████████████████████| 181.3 MB 22.8 MB/s
  Downloading pyspark-2.1.2.tar.gz (181.3 MB)
     |████████████████████████████████| 181.3 MB 115 kB/s
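To put a number on how much a run like this downloads, here is a rough sketch that sums the sizes pip prints on its "Downloading" lines. The function name total_download_bytes and the short sample are made up for illustration, and it assumes pip reports sizes in decimal units (1 MB = 10**6 bytes). Files reported in plain bytes are ignored.

```python
import re

# Matches pip lines such as "Downloading pyspark-3.2.0.tar.gz (281.3 MB)".
DOWNLOAD_RE = re.compile(r"Downloading (\S+) \(([\d.]+) (kB|MB|GB)\)")

# Assumption: pip prints decimal units, so 1 MB is counted as 10**6 bytes.
UNIT_BYTES = {"kB": 10**3, "MB": 10**6, "GB": 10**9}


def total_download_bytes(log_text: str) -> float:
    """Sum the sizes of all files that pip reported downloading."""
    return sum(
        float(size) * UNIT_BYTES[unit]
        for _name, size, unit in DOWNLOAD_RE.findall(log_text)
    )


# Hypothetical three-download excerpt; paste a full pip log here instead.
sample = """
Collecting pyspark
  Downloading pyspark-3.2.0.tar.gz (281.3 MB)
Collecting py4j==0.10.9.2
  Downloading py4j-0.10.9.2-py2.py3-none-any.whl (198 kB)
  Downloading pyspark-3.1.3.tar.gz (214.0 MB)
"""

print(f"{total_download_bytes(sample) / 1e9:.2f} GB downloaded")
```

Since the pattern is applied with findall over the whole text, it should also work on a log whose line breaks were lost when pasting.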