We have been heads down developing the ingestion pipelines for the backfill, events, and visibility events.
Now that the code has started to stabilize, we should invest in proper testing.
In this task we should:
- Figure out the current best practice for testing PySpark jobs.
- Implement tests for the mediawiki-content-dump project with significant coverage; let's target 80%.