
Define and Implement CI Checks
Closed, Resolved · Public · 8 Estimated Story Points


User Story
As a platform engineer, I need to implement CI checks so that when a dataset producer submits code for review, a lot of the basic checks are automated.
Success Criteria
  • CI checks implemented for Pylint and unit tests (80%+ coverage) (needs team to define and groom)
Current status

The draft PR adds a linting step to our build tooling.

IMPORTANT: Work in progress! The following needs review and decision making:
Running tests and lint

The test and lint steps can be executed manually with the following commands:

  • make test
  • make lint

By default both steps are executed inside a Docker container. Native execution, provided that an Anaconda Python distribution is available on the host, can be triggered with:

  • make test SKIP_DOCKER=true
  • make lint SKIP_DOCKER=true
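The SKIP_DOCKER switch can be sketched as a simple shell conditional. This is a hypothetical illustration: the real Makefile targets and the Docker image name are assumptions, and the command is echoed rather than executed.

```shell
# Hypothetical sketch of the SKIP_DOCKER switch; "ci-image" and the
# pytest invocation are assumptions, not the actual build tooling.
SKIP_DOCKER=${SKIP_DOCKER:-false}
if [ "$SKIP_DOCKER" = "true" ]; then
    # Native execution: assumes an Anaconda Python distribution on the host
    CMD="python -m pytest"
else
    # Default: run the same step inside a Docker container
    CMD="docker run --rm -v $PWD:/src -w /src ci-image python -m pytest"
fi
echo "would run: $CMD"
```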

Gitlab Pipelines are currently unavailable in Wikimedia's instance.

To automate CI for demo purposes, I mirrored this repo to GitHub and execute tests on a GitHub runner.
.github/workflows/build.yml implements a CI workflow that runs on every push.
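For reference, a workflow of this shape might look like the following. This is a sketch, not the repo's actual file: the Python version, action versions, and step layout are assumptions; it assumes the make targets run natively on the runner.

```yaml
# Hypothetical sketch of a workflow like .github/workflows/build.yml;
# the actual file in the mirror may differ.
name: build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.7"
      - run: make lint SKIP_DOCKER=true
      - run: make test SKIP_DOCKER=true
```

Reusing the same make targets locally and in CI keeps the two environments from drifting apart.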

Output is available at

Moving forward, I'd like us to roll back to using GitLab CI as soon as Pipelines are re-enabled (see ). This will require providing an ad hoc (python+jvm) internal docker image.

In the short term, we have a couple of options:

  • Delegate CI build and reporting to GitHub as demoed in this task. This is annoying because it breaks integration with the development flow (e.g. we can't make MRs conditional on successful builds).
  • @hnowlan suggested we could move our build step to Jenkins. AFAIK this will require providing an ad hoc (python+jvm) docker image.
  • Drop CI automation and run tests manually.

The linting step currently treats errors as warnings:
the build won't stop when failures are detected. We lint with flake8 and the following (conservative) settings:

  • McCabe complexity threshold: 10
  • maximum allowed line length: 127 (Default PEP8: 79)
  • check for syntax errors or undefined names
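The settings above map onto a two-pass flake8 invocation along these lines. The flag names are real flake8 options, but the exact configuration lives in the repo's build tooling, so treat this as a sketch of the CI lint step rather than its literal contents.

```shell
# First pass: hard-fail on syntax errors and undefined names
# (E9 = syntax/indentation errors, F63/F7/F82 = comparison/syntax/undefined-name checks).
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

# Second pass: report everything else but treat it as warnings (--exit-zero),
# with the conservative thresholds described above.
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
```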

Our tests are implemented as pytest suites. Coverage is reported with the pytest-cov plugin.
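A report like the one below can be produced with an invocation along these lines. The "spark" package name is taken from the report; the tests/ path is an assumption, and wiring in --cov-fail-under to enforce the 80% success criterion is a suggestion, not what the build currently does.

```shell
# Run the pytest suites, measure coverage of the spark package with pytest-cov,
# and (optionally) fail the build if total coverage drops below 80%.
pytest --cov=spark --cov-report=term --cov-fail-under=80 tests/
```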

---------- coverage: platform linux, python 3.7.11-final-0 -----------
Name                           Stmts   Miss  Cover
spark/                  0      0   100%
spark/      12      0   100%
spark/              21     21     0%
spark/                    7      0   100%
spark/             22     22     0%
spark/                42     17    60%
TOTAL                            104     60    42%

======================== 2 passed, 9 warnings in 9.05s =========================


Due Date
Nov 9 2021, 5:00 AM

Event Timeline

lbowmaker set Due Date to Nov 9 2021, 5:00 AM.
lbowmaker set the point value for this task to 8.

As part of this task, I would like to mirror the platforms-airflow-dags repo to GitHub and add a GitHub Actions workflow for CI.

While we currently rely on Gerrit for code review and artifact publication, the lack of publicly available runners means we currently cannot automate CI.
I don't think it's too much overhead (other projects mirror gerrit -> github), and all logic for checks (linting, pytest, mypy etc) is currently encapsulated in a Makefile.
We can hook them into GH/Gitlab (and if needed gerrit) without too much code duplication.

@gmodena: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!