==== **User Story**
> ==== As a platform engineer, I need to implement CI checks so that when a dataset producer submits code for review I can automate a lot of basic checks
==== Success Criteria
[] CI checks implemented for Pylint and unit tests (80%+ coverage) (needs team to define and groom)
==== Current status
The Draft PR at https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/8 adds a linting step to our build tooling.
IMPORTANT: Work in progress! The following needs review and decision making:
* How do we automate CI? See the *Automation* section below.
* Are the lint and test constraints provided acceptable? See the *linting* and *tests* sections below.
* Could you provide feedback on the sister work items at https://phabricator.wikimedia.org/T293382#7470865?
====== Running tests and lint
Test and linting steps can be executed manually with the following commands:
* `make test`
* `make lint`
By default, both steps are executed inside a Docker container. Native execution, which requires an [Anaconda](https://www.anaconda.com/) Python distribution on the host, can be triggered with:
* `make test SKIP_DOCKER=true`
* `make lint SKIP_DOCKER=true`
====== Automation
GitLab Pipelines are currently unavailable in Wikimedia's instance.
To automate CI for demo purposes, I mirrored this repo to GitHub and run the tests on a GitHub runner.
`.github/workflows/build.yml` implements a CI workflow that runs on every push.
Output is available at https://github.com/gmodena/wmf-platform-airflow-dags/actions/workflows/build.yml?query=branch%3AT292741-implement-ci-checks.
Moving forward, I'd like us to roll back to GitLab CI as soon as Pipelines are re-enabled. This will require providing an ad hoc (Python + JVM) internal Docker image.
In the short term, we have a few options:
* Delegate CI build and reporting to GitHub as demoed in this task. This is annoying because it breaks integration with the development flow (e.g. we can't make MRs conditional on successful builds).
* @hnowlan suggested we could move our build step to Jenkins. AFAIK this will require providing an ad hoc (Python + JVM) Docker image.
* Drop CI automation and run tests manually.
====== Linting
The linting step currently treats errors as warnings: the build won't stop when failures are detected. We lint with flake8 and the following (conservative) settings:
* McCabe complexity threshold: 10
* maximum allowed line length: 127 (PEP 8 default: 79)
* check for syntax errors or undefined names
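The settings above could map onto two flake8 invocations, sketched here as a small Python helper. This is a guess at the shape of the lint step, not the repository's actual `make lint` recipe: the helper names (`flake8_commands`, `run_lint`), the split into two passes, and the error codes chosen for "syntax errors or undefined names" (`E9,F63,F7,F82`) are all assumptions. The flags themselves are standard flake8 options.

```python
import subprocess
import sys

# Codes commonly used to flag syntax errors and undefined names.
# Assumption: the task description does not list the exact codes.
SYNTAX_AND_NAME_CODES = "E9,F63,F7,F82"


def flake8_commands(target=".", fail_on_error=False):
    """Build the two flake8 invocations as argument lists.

    With fail_on_error=False, --exit-zero makes flake8 report its findings
    but always exit with status 0, so the build keeps going.
    """
    base = [sys.executable, "-m", "flake8", target, "--count", "--statistics"]
    soft = [] if fail_on_error else ["--exit-zero"]
    # Pass 1: only syntax errors and undefined names, with source context.
    syntax_pass = base + [f"--select={SYNTAX_AND_NAME_CODES}", "--show-source"] + soft
    # Pass 2: full check with the complexity and line-length limits.
    style_pass = base + ["--max-complexity=10", "--max-line-length=127"] + soft
    return [syntax_pass, style_pass]


def run_lint(target=".", fail_on_error=False):
    """Run both passes and return the worst exit status (requires flake8)."""
    return max(subprocess.call(cmd) for cmd in flake8_commands(target, fail_on_error))


# Show the commands without executing them.
for cmd in flake8_commands():
    print(" ".join(cmd))
```

Passing `fail_on_error=True` drops `--exit-zero`, which is one way the step could later be made blocking if the team agrees on the constraints.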
====== Tests
Our tests are implemented as pytest suites. Coverage is reported with the `pytest-cov` plugin.
```
---------- coverage: platform linux, python 3.7.11-final-0 -----------
Name                           Stmts   Miss  Cover
--------------------------------------------------
spark/__init__.py                  0      0   100%
spark/instances_to_filter.py      12      0   100%
spark/raw2parquet.py              21     21     0%
spark/schema.py                    7      0   100%
spark/search_table.py             22     22     0%
spark/transform.py                42     17    60%
--------------------------------------------------
TOTAL                            104     60    42%
======================== 2 passed, 9 warnings in 9.05s =========================
```
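For reference, the arithmetic behind the report and its distance from the 80% target in the success criteria can be sketched as below. The function names are made up for illustration, and the figures are copied from the report above. If the team settles on a threshold, pytest-cov's `--cov-fail-under` option can enforce it.

```python
# (statements, missed) per module, copied from the coverage report.
REPORT = {
    "spark/__init__.py": (0, 0),
    "spark/instances_to_filter.py": (12, 0),
    "spark/raw2parquet.py": (21, 21),
    "spark/schema.py": (7, 0),
    "spark/search_table.py": (22, 22),
    "spark/transform.py": (42, 17),
}

TARGET_PERCENT = 80  # from the success criteria; still to be confirmed by the team


def coverage_percent(statements, missed):
    """Percentage of statements executed; an empty module counts as 100%."""
    if statements == 0:
        return 100.0
    return 100.0 * (statements - missed) / statements


def statements_still_needed(statements, missed, target_percent):
    """Extra statements that must be covered to reach target_percent."""
    required = -(-target_percent * statements // 100)  # integer ceiling
    return max(0, required - (statements - missed))


total_statements = sum(s for s, _ in REPORT.values())
total_missed = sum(m for _, m in REPORT.values())

print(f"total coverage: {coverage_percent(total_statements, total_missed):.0f}%")
print(f"statements still to cover for {TARGET_PERCENT}%: "
      f"{statements_still_needed(total_statements, total_missed, TARGET_PERCENT)}")
```

With the numbers above, this prints a total of 42% and shows that 40 more statements would need test coverage to reach 80%. Most of that gap sits in `spark/raw2parquet.py` and `spark/search_table.py`, which currently have no coverage.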