
Run Dumps 2.0 main DAG at a daily cadence rather than hourly.
Closed, Resolved, Public

Description

Sad news:

Even though my testing showed great performance for the hourly ingest (see T375402#10239416), we were not able to reproduce those performance benefits after merging all changes into the production table.

The revision-level MERGE INTO still takes far longer than the allotted maximum of 1 hour.

At this time, I am throwing in the towel. There are more things to look into, such as why I was not able to reproduce the gains, but there is a lot of other Dumps 2.0 work that needs attention. So, in the interest of time, I think it is best to set this work aside and bite the bullet: we will have to consume at a daily cadence rather than hourly.
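To make the scheduling constraint concrete: if runs execute one at a time, an hourly schedule only keeps up when each run finishes within its hour; otherwise every run starts later than the one before it, and the delay keeps growing. The sketch below is a hypothetical model of that effect. The function name, the one-run-at-a-time assumption, and the run durations (1.5 hours hourly, 6 hours daily) are illustrative only, not measurements from this task.

```python
def start_delays(interval_hours: float, run_hours: float, num_runs: int) -> list[float]:
    """Return, for each run, how many hours it starts after its scheduled time.

    Hypothetical model: a run is scheduled every `interval_hours`, each run takes
    `run_hours`, and a run cannot start until the previous one has finished.
    """
    delays = []
    previous_finish = 0.0
    for k in range(num_runs):
        scheduled = float(k * interval_hours)
        # A run starts at its scheduled time or when the previous run ends, whichever is later.
        actual_start = max(scheduled, previous_finish)
        delays.append(actual_start - scheduled)
        previous_finish = actual_start + run_hours
    return delays


if __name__ == "__main__":
    # Hypothetical: an hourly run that takes 1.5 hours falls further behind every time.
    print(start_delays(interval_hours=1, run_hours=1.5, num_runs=5))  # [0.0, 0.5, 1.0, 1.5, 2.0]
    # Hypothetical: a daily run that finishes within its 24 hours never accumulates delay.
    print(start_delays(interval_hours=24, run_hours=6, num_runs=3))  # [0.0, 0.0, 0.0]
```

Under this model the delay grows by (run time minus interval) on every run, so no amount of waiting lets an hourly schedule catch up; a longer interval is sustainable as long as a single run fits inside it.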

Event Timeline

xcollazo changed the task status from Open to In Progress. Oct 23 2024, 6:25 PM
xcollazo moved this task from Incoming to Kanban Board on the Dumps 2.0 board.
xcollazo edited projects, added Dumps 2.0 (Kanban Board); removed Dumps 2.0.
xcollazo moved this task from Sprint Backlog to In Process on the Dumps 2.0 (Kanban Board) board.

Mentioned in SAL (#wikimedia-operations) [2024-10-24T15:45:18Z] <xcollazo@deploy2002> Started deploy [airflow-dags/analytics@325d943]: Deploy latest DAGs to analytics Airflow instance. T377999.

Mentioned in SAL (#wikimedia-analytics) [2024-10-24T15:46:38Z] <xcollazo> Deploy latest DAGs to analytics Airflow instance. T377999.

Mentioned in SAL (#wikimedia-operations) [2024-10-24T15:47:12Z] <xcollazo@deploy2002> Finished deploy [airflow-dags/analytics@325d943]: Deploy latest DAGs to analytics Airflow instance. T377999. (duration: 01m 07s)

xcollazo updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/894

Pickup 'Push down earliest rev_dt per wiki on the revision level MERGE INTO'
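As we read its title, the "Push down earliest rev_dt per wiki" change is a pruning trick: compute the earliest rev_dt per wiki among the incoming events and add it as a condition on the target table, so the engine can skip target data older than anything being merged. The sketch below is illustrative only. The column names wiki_db and revision_dt, the alias t, and both helper functions are hypothetical, and the trick is only safe if a matching target row always carries the same rev_dt as the incoming event.

```python
import re
from datetime import datetime

# Hypothetical guard: only allow simple wiki identifiers into the generated SQL.
_WIKI_RE = re.compile(r"[a-z0-9_]+")


def earliest_rev_dt_per_wiki(events):
    """Map each wiki to the earliest rev_dt seen among (wiki, rev_dt) events."""
    earliest = {}
    for wiki, rev_dt in events:
        if wiki not in earliest or rev_dt < earliest[wiki]:
            earliest[wiki] = rev_dt
    return earliest


def pushdown_predicate(earliest, target_alias="t"):
    """Build an OR of per-wiki lower bounds on the (hypothetical) target columns."""
    clauses = []
    for wiki, rev_dt in sorted(earliest.items()):
        if not _WIKI_RE.fullmatch(wiki):
            raise ValueError(f"unexpected wiki name: {wiki!r}")
        clauses.append(
            f"({target_alias}.wiki_db = '{wiki}' AND "
            f"{target_alias}.revision_dt >= TIMESTAMP '{rev_dt:%Y-%m-%d %H:%M:%S}')"
        )
    # With no incoming events, nothing in the target can match.
    return " OR ".join(clauses) if clauses else "FALSE"


if __name__ == "__main__":
    events = [
        ("enwiki", datetime(2024, 10, 23, 5)),
        ("enwiki", datetime(2024, 10, 23, 3)),
        ("dewiki", datetime(2024, 10, 23, 4)),
    ]
    # Sketch only: this predicate could be ANDed into the MERGE INTO ... ON clause.
    print(pushdown_predicate(earliest_rev_dt_per_wiki(events)))
```

A per-wiki bound, rather than one global minimum, matters when wikis lag differently: one late event on a single wiki would otherwise force a scan of older data for every wiki.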

xcollazo merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/894

xcollazo opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/906

Sync up DagProperties of dumps_merge_events_to_wikitext_raw_daily with overrides.

xcollazo merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/906
