Update DAGs and test included articles in chunks
Closed, Resolved | Public | 8 Estimated Story Points

Description

To make sure that the chunks of a snapshot together contain all the articles in that snapshot, we need to do some testing.

To Do

  • In the scheduler repo, update the protos submodule and regenerate the Python gRPC code.
  • Update the snapshots DAG so that its ExportRequest sets the enable_chunking arg to true.
  • Update the batches and structured-snapshot DAGs so that their ExportRequest sets the enable_chunking arg to false.
  • From the scheduler, run a snapshot for a couple of smaller projects. This should produce the snapshot as well as its chunks.
  • Compare the articles in the snapshot vs. the articles in all the chunks for this snapshot. They should be the same. See wikimedia-enterprise/experiments/snapshots for inspiration on snapshot testing.
  • Run a batches job and a structured-snapshot job from the scheduler. Verify that the batches and structured-snapshot outputs are created as usual and that no chunks are generated for them.
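
The snapshot-vs-chunks comparison above could be sketched roughly as follows. This is only an illustration, not the approach used in wikimedia-enterprise/experiments/snapshots: it assumes the snapshot and each chunk have already been opened as newline-delimited JSON text streams, and that every article object carries a unique "identifier" field. The function names are hypothetical.

```python
import io
import json
from typing import Dict, Iterable, Set, TextIO


def article_keys(ndjson: TextIO, key: str = "identifier") -> Set[str]:
    """Collect one key per article from a newline-delimited JSON stream.

    Assumption: one JSON object per line, each with a `key` field.
    """
    keys = set()
    for line in ndjson:
        line = line.strip()
        if line:  # ignore blank lines
            keys.add(str(json.loads(line)[key]))
    return keys


def compare_snapshot_to_chunks(
    snapshot: TextIO, chunks: Iterable[TextIO], key: str = "identifier"
) -> Dict[str, Set[str]]:
    """Report articles that differ between a snapshot and its chunks."""
    snapshot_keys = article_keys(snapshot, key)
    chunk_keys: Set[str] = set()
    duplicated: Set[str] = set()
    for chunk in chunks:
        keys = article_keys(chunk, key)
        duplicated |= chunk_keys & keys  # seen in an earlier chunk too
        chunk_keys |= keys
    return {
        "missing_from_chunks": snapshot_keys - chunk_keys,
        "extra_in_chunks": chunk_keys - snapshot_keys,
        "in_multiple_chunks": duplicated,
    }


if __name__ == "__main__":
    # Tiny in-memory stand-ins for a real snapshot and its two chunks.
    snapshot = io.StringIO('{"identifier": 1}\n{"identifier": 2}\n{"identifier": 3}\n')
    chunks = [
        io.StringIO('{"identifier": 1}\n{"identifier": 2}\n'),
        io.StringIO('{"identifier": 3}\n'),
    ]
    report = compare_snapshot_to_chunks(snapshot, chunks)
    print(report)
```

All three sets being empty corresponds to the snapshot and its chunks containing the same articles. The "in_multiple_chunks" set is an extra check that is not required by the task; it only flags articles that appear in more than one chunk.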

Acceptance criteria / QA

  • The articles in a snapshot and the articles in all of its chunks combined are the same.
  • The DAGs for batches and structured-snapshot work as usual.

Event Timeline

JArguello-WMF set the point value for this task to 5 (Aug 1 2024, 1:08 PM).
JArguello-WMF changed the point value for this task from 5 to 8 (Aug 20 2024, 1:06 PM).