User story
As a developer of data pipelines, I need to facilitate unit tests and maximize code readability.
Tasks
Follow the proposed skeleton to:
- convert SQL strings into PySpark functions (see the sketch after this list)
- implement unit tests
- get rid of queries.py
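As a rough illustration of the conversion, a non-trivial SQL string becomes a plain PySpark function operating on DataFrames. The table and column names below are hypothetical, not taken from queries.py:

```
from pyspark.sql import DataFrame
import pyspark.sql.functions as F

def count_images_per_page(image_links: DataFrame) -> DataFrame:
    """Count image links per page (hypothetical example, not an actual pipeline query)."""
    return image_links.groupBy("page_id").agg(F.count("*").alias("n_images"))
```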
Status | Assigned | Task
---|---|---
Resolved | CBogen | T299781 [EPIC] Image suggestions backend
Resolved | mfossati | T296814 [EPIC] Article-level image suggestions data pipeline
Resolved | mfossati | T303816 [L] Refactor data pipeline queries
When estimating this ticket, we decided to convert only the non-trivial SQL strings, i.e., those that hold some computation.
Computation should be broken down into atomic functions that can be tested, while simple selections, for example, can still live as strings.
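A minimal sketch of how such an atomic function could be unit-tested, assuming pytest and a local SparkSession; fixture and test names are illustrative only:

```
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession for tests; configuration is illustrative.
    return SparkSession.builder.master("local[1]").appName("query-tests").getOrCreate()

def test_count_images_per_page(spark):
    # count_images_per_page is the hypothetical function from the sketch above.
    image_links = spark.createDataFrame(
        [(1, "a.jpg"), (1, "b.jpg"), (2, "c.jpg")],
        ["page_id", "image"],
    )
    result = count_images_per_page(image_links)
    assert {(r.page_id, r.n_images) for r in result.collect()} == {(1, 2), (2, 1)}
```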
It's also worth noting that the very first query seems impossible to convert cleanly: it would require an additional workaround, so converting it does not make sense.