
Improve speed of GitLab CI
Open, Needs Triage, Public

Description

The problem:
It takes too long to get feedback from the GitLab CI pipelines. Developers wait for CI confirmation before merging or deploying. Also, not everybody has set up a development environment to run the test suite locally, so they rely on the CI.

Example: data-engineering/airflow-dags is a Python project, yet running the test suite (including linting) takes around 8 minutes. https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/pipelines

The CI is also critical for repositories responsible for building artifacts (e.g. https://gitlab.wikimedia.org/repos/data-engineering/conda-base-env ), where fast iteration through the CI is key.

Some proposals

  • Use custom Docker images (a sketch follows this list). Currently, we are using WMF Debian images. The ideal image may have:
    • apt packages and conda already installed
    • a conda environment already set up (especially useful for big pip packages like PySpark)
  • Tweak the pytest configuration? (the test suite runs in 4s locally, but in 1.5min in CI)
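For illustration, a minimal sketch of what such an image could pre-bake at build time instead of redoing in every job; the package list, conda path, and environment name are all assumptions:

# Run once while building the CI image, not in every job:
$ apt-get update && apt-get install -y --no-install-recommends build-essential git wget
$ wget -qO /tmp/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash /tmp/miniconda.sh -b -p /opt/conda
$ /opt/conda/bin/conda create -y -n ci python=3.9
$ /opt/conda/envs/ci/bin/pip install pyspark   # big wheel, baked into the image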

Event Timeline

the test suite runs in 4s locally, but in 1.5min in CI

how is that possible? 😣

hashar added a subscriber: hashar.

+ GitLab, since there are surely caching optimizations that would need to be added. It looks like building the image takes a while.
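As a sketch, assuming the jobs install their dependencies with pip and tox, a cache stanza along these lines in .gitlab-ci.yml would let GitLab reuse those directories between pipelines (the key and paths are assumptions; shown as a shell heredoc only for illustration):

$ cat >> .gitlab-ci.yml <<'EOF'
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - .cache/pip
    - .tox/
EOF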

Probably related: last week we had GitLab runners with full disks, and in my inspection I found out GitLab keeps caching volumes between builds. The breakdown at T310593#8008684 shows caching volumes taking 1G to 1.8G. They were for the following repositories:

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags
https://gitlab.wikimedia.org/repos/data-engineering/conda-base-env
https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines
https://gitlab.wikimedia.org/repos/research/knowledge-gaps
https://gitlab.wikimedia.org/repos/research/research-common

I have nuked the caching volumes from some of the hosts, which I guess is the reason builds are now taking longer.
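For reference, a few commands one could run on a runner host to inspect and prune those volumes (the name filter is an assumption about how the runner names its cache volumes):

$ sudo docker system df -v                      # per-volume disk usage
$ sudo docker volume ls --filter name=cache     # list the cache volumes
$ sudo docker volume prune                      # remove volumes unused by any container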

GitLab unfortunately doesn't give a timestamp per output line. Some ideas:

$ sudo apt install moreutils
...
$ echo "hello" | ts "%Y-%m-%d %H:%M:%.S |"
2022-06-22 10:25:42.756864 | hello
$
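To spot the slow steps, the CI commands could be piped through ts, e.g. (the tox environment name is hypothetical):

$ tox -e py39 2>&1 | ts "%Y-%m-%d %H:%M:%.S |"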
  • You might get a faster installation by installing Python dependencies from Debian packages (sudo apt install python3-numpy) and then asking tox to use the OS-provided packages with tox --sitepackages, potentially even skipping dependency installation entirely and relying solely on Debian packages (tox --sitepackages --skip-pkg-install). That ties you to the versions provided by Debian, though, and might make the environment difficult to reproduce locally. A sketch of that workflow follows this list.
  • I don't know much about Conda, but it seems to be installing various system libraries in addition to the Python modules. Maybe that is redundant with packages that could be installed via Debian. Then again, I don't know anything about Conda :D
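For illustration, that workflow could look like this (the package names and tox environment are assumptions):

$ sudo apt install python3-numpy python3-pandas   # versions pinned by Debian
$ tox --sitepackages --skip-pkg-install -e py39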

The test suite at https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/jobs/21197 says it took almost 2 minutes rather than seconds: 56 passed, 15 warnings in 117.89s (0:01:57). I am guessing that locally pytest has some optimizations to avoid rerunning tests; one way to check is sketched below.
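pytest's built-in cache plugin keeps its state in .pytest_cache/, which a fresh CI container never has; comparing a warm and a cold run locally could confirm or rule that out:

$ pytest --lf                    # rerun only the tests that failed last time
$ pytest -p no:cacheprovider     # disable the cache to mimic a cold CI run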