
Run an experimental dump of 200 regular sized wikis
Closed, Resolved (Public)

Description

Now that we have organized our dumpsv1 DAG in a way that we feel comfortable with (cf. T390852), we can try to orchestrate the dumps of a larger number of wikis. We settled on the arbitrary number of 200.

NOTE: we will avoid mediawikiwiki, as it is currently failing (https://phabricator.wikimedia.org/T390839#10705423)

Related Objects

Status    Subtype  Assigned   Task
Open               None
Resolved           BTullis
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           BTullis
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           BTullis
Resolved           BTullis
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol
Resolved           brouberol

Event Timeline

brouberol triaged this task as Medium priority.

brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1212

test_k8s/dumpsv1: introduce a way to exclude certain wikis from a dag run
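For readers who have not opened the merge request, here is a rough sketch of what excluding wikis from a run could look like. It is not the code from the MR: the function name `select_wikis`, its parameters, and the sample wiki names other than mediawikiwiki are assumptions made for illustration.

```python
from typing import Iterable, List, Optional


def select_wikis(
    all_wikis: Iterable[str],
    excluded: Iterable[str],
    limit: Optional[int] = None,
) -> List[str]:
    """Return the wikis to dump, skipping excluded ones.

    Hypothetical helper: the real DAG may implement exclusion differently.
    """
    excluded_set = set(excluded)
    # Preserve the input order so that runs are reproducible.
    selected = [wiki for wiki in all_wikis if wiki not in excluded_set]
    # Optionally cap the number of wikis (e.g. 200 for an experimental run).
    return selected if limit is None else selected[:limit]


if __name__ == "__main__":
    candidates = ["aawiki", "mediawikiwiki", "abwiki"]
    print(select_wikis(candidates, excluded=["mediawikiwiki"]))
```

Keeping the exclusion list as a plain parameter would let a single failing wiki be skipped without editing the list of all wikis.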

brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1214

Increase the max active tasks from 6 to 16, to speed up the DAG execution

Change #1134985 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow: set saner performance-related configs

https://gerrit.wikimedia.org/r/1134985

Change #1134985 merged by Brouberol:

[operations/deployment-charts@master] airflow: set saner performance-related configs

https://gerrit.wikimedia.org/r/1134985

Change #1135001 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow: scrape additional metrics

https://gerrit.wikimedia.org/r/1135001

Change #1135419 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow: increase pool metrics computation frequency

https://gerrit.wikimedia.org/r/1135419

Change #1135001 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: scrape additional metrics

https://gerrit.wikimedia.org/r/1135001

Change #1135419 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: increase pool metrics computation frequency

https://gerrit.wikimedia.org/r/1135419

I think that we can call this done.

image.png (728×1 px, 120 KB)

We have now split our DAGs so that each one covers around 145 wikis, and two runs covering different groups completed successfully yesterday.
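As an illustration only, one way to split a list of wikis into groups of at most about 145 could look like the sketch below. The actual grouping logic used for the DAGs is not described in this task, so `split_into_groups`, its `max_per_group` parameter, and the round-robin strategy are all assumptions.

```python
import math
from typing import List, Sequence


def split_into_groups(wikis: Sequence[str], max_per_group: int = 145) -> List[List[str]]:
    """Split wikis into balanced groups of at most max_per_group each.

    Hypothetical helper: the real DAGs may group wikis by other criteria.
    """
    if max_per_group < 1:
        raise ValueError("max_per_group must be at least 1")
    if not wikis:
        return []
    n_groups = math.ceil(len(wikis) / max_per_group)
    # Round-robin assignment keeps group sizes within one of each other.
    return [list(wikis[i::n_groups]) for i in range(n_groups)]


if __name__ == "__main__":
    sample = [f"wiki{i}" for i in range(300)]  # made-up wiki names
    for index, group in enumerate(split_into_groups(sample)):
        print(index, len(group))
```

Balancing the groups rather than filling each one to the limit avoids ending up with one very small trailing DAG.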