
[L] Refactor data pipeline queries
Closed, Resolved (Public)

Description

User story

As a developer of data pipelines, I need the pipeline queries to be easy to unit test and the code to be as readable as possible.

Tasks

Follow the proposed skeleton to:

  • convert SQL strings into PySpark functions
  • implement unit tests
  • get rid of queries.py

Event Timeline

CBogen renamed this task from "Refactor data pipeline queries" to "[L] Refactor data pipeline queries". Mar 23 2022, 4:45 PM

When estimating this ticket, we decided to convert all non-trivial SQL strings, i.e., those that perform some computation.
That computation should be broken down into atomic, testable functions, while simple selections, for example, can stay as SQL strings.

It's also worth noting that the very first query does not seem to convert cleanly: doing so would require an additional workaround, so converting it does not make sense.

mfossati changed the task status from Open to In Progress. Apr 7 2022, 7:54 AM
mfossati claimed this task.
Cparle subscribed.

I think all the remaining refactoring work has been done, so I'm closing this task.