This is a placeholder meant to collect feedback / discussion I had with @mfossati on Slack and Gitlab.
For context; we use a cookiecutter template to scaffold python based datapipelines. Among other CI checks, it configures a set of flake8 rules to test for:
https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines/-/blob/main/datapipeline-scaffold/%7B%7Bcookiecutter.pipeline_directory%7D%7D/pyspark/setup.cfg
This is a lax set of rules @Clarakosi and I arbitrarily picked as a baseline.
There's two areas of improvement.
Add linting rules
For new projects we should be more strict and include (at least):
- E501: line too long
- E202: un-ncecessary whitespaces
- F401: unused imports
- W292: no newline at end of file
Code formatting and style enforcement
@mfossati suggests integrating isort and autoflake in CI. +1 as far as I'm concerned.
We also discussed having formatting (and maybe other checks) as a git pre-commit hook. @Marco suggested evaluating https://pre-commit.com/.