Create more sophisticated monitors for the various dump types
Open, Medium, Public

Description

We are already alerted if a DAG fails, or if a single task fails after reaching its maximum number of retries.
This is the standard Airflow monitoring, which is already in place.

We have several (potentially complementary) options in terms of more sophisticated monitoring:

  • we could determine an SLA for each global dump DAG (probably differentiating full and partial dumps)
  • we could also define a DAG that runs a sensor for each wiki, making sure that various dump files made it to the clouddumps servers and are publicly visible on the internet
  • we could set up a DAG that performs an end-to-end dump of a wiki and checks daily whether the whole pipeline functions correctly.
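
As a rough illustration of the second option, the check for publicly visible dump files could look something like the sketch below. Everything specific in it is an assumption made for the example: the base URL `https://dumps.example.org`, the `<wiki>/<date>/<wiki>-<date>-<suffix>` path layout, and the function names are all hypothetical and would need to be replaced with the real clouddumps layout. The probe is injectable so the logic can be exercised without network access.

```python
from typing import Callable, Iterable, List
from urllib.request import Request, urlopen

# Hypothetical base URL, used for illustration only.
BASE_URL = "https://dumps.example.org"


def expected_urls(wiki: str, date: str, suffixes: Iterable[str],
                  base_url: str = BASE_URL) -> List[str]:
    """Build the public URLs we expect for one wiki and one dump date.

    The path layout is an assumption, not the confirmed layout.
    """
    return [f"{base_url}/{wiki}/{date}/{wiki}-{date}-{s}" for s in suffixes]


def http_head_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL gets a 2xx response."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False


def missing_dump_files(wiki: str, date: str, suffixes: Iterable[str],
                       probe: Callable[[str], bool] = http_head_ok) -> List[str]:
    """Return the expected URLs that the probe reports as not reachable."""
    return [url for url in expected_urls(wiki, date, suffixes) if not probe(url)]
```

A sensor task per wiki could call `missing_dump_files` and succeed only once the returned list is empty, so that a dump which completed but never reached the public servers still raises an alert.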

There is also considerable overlap with T343234: Explore the use of Airflow notifiers for more flexible DAG failure handling, and with T356416: [M] Consider improving e-mail alerts sent by Airflow DAGs.

Event Timeline

brouberol triaged this task as Medium priority.
BTullis renamed this task from Monitor the dumps to Create more sophiticated monitors the various dump types. Jul 2 2025, 12:32 PM
BTullis updated the task description.
BTullis renamed this task from Create more sophiticated monitors the various dump types to Create more sophisticated monitors for the various dump types. Jul 2 2025, 12:46 PM