
Add integration tests to the PySpark jobs
Closed, Resolved · Public · 5 Estimated Story Points

Description

We have been heads-down developing the ingestion pipelines for the backfill, events, and visibility events.

Now that the code has started to stabilize, we should invest in proper testing.

In this task we should:

  • Figure out the current best practices for testing PySpark jobs.
  • Implement tests for the mediawiki-content-dump project with significant coverage; let's target 80%.

Event Timeline

Perhaps @mfossati has PySpark testing framework recommendations?

Hi there! For unit tests, I can certainly suggest chispa, which was initially proposed by @MunizaA.
For integration tests, I don't have a recommendation.

  • Step 1: Figure out which tests are useful for PySpark jobs
  • Step 2: Decide on acceptable parameters for the tests
  • Step 3: Create/script the tests locally
  • Step 4: Move the tests to prod (GitLab)
JEbe-WMF set the point value for this task to 5. (Oct 12 2023, 12:25 PM)