
Provide a dbt-core development environment and production setup in the data-platform
Open, High, Public

Description

In support of "WE1.5.1 Contributor metrics dashboard", we want to enable dbt-core on the data platform for the metrics and analyst teams and for Data-Engineering.

A metrics engineer or analyst should be able to set up a development environment (e.g. on a stats machine) to develop and test new dbt-based workflows, and then publish them to run on a periodic schedule in Airflow.
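The develop-then-schedule workflow described above can be expressed as running the same dbt project against different profile targets. Below is a minimal sketch, assuming two targets named `dev` and `prod` and a model called `contributor_metrics` (all hypothetical names), of building the dbt CLI invocation for each case. How the command is wrapped in an Airflow task is an assumption and is not shown.

```python
import shlex
from typing import List, Optional

# Subcommands this hypothetical helper allows; dbt itself has many more.
ALLOWED_SUBCOMMANDS = {"run", "test", "build"}


def dbt_command(subcommand: str, target: str, select: Optional[str] = None,
                project_dir: str = ".", profiles_dir: str = ".") -> List[str]:
    """Return the argv for a dbt CLI call against the given profile target."""
    if subcommand not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"unsupported dbt subcommand: {subcommand}")
    argv = [
        "dbt", subcommand,
        "--project-dir", project_dir,    # directory containing dbt_project.yml
        "--profiles-dir", profiles_dir,  # directory containing profiles.yml
        "--target", target,              # which output in the profile to use
    ]
    if select:
        argv += ["--select", select]     # restrict the run to selected nodes
    return argv


if __name__ == "__main__":
    # Development on a stats machine: build one (hypothetical) model against dev.
    print(shlex.join(dbt_command("build", "dev", select="contributor_metrics")))
    # A scheduled Airflow task might run the whole project against prod.
    print(shlex.join(dbt_command("build", "prod")))
```

Keeping the command identical apart from `--target` means a model validated in development runs unchanged in production; only the connection settings in profiles.yml differ between the two.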

We will also want to connect it to various adapters, including dbt-spark (preferred).

This is a parent ticket to track the various elements of work that we need to make this happen.

Event Timeline

GGoncalves-WMF renamed this task from Exlore the use of dbt-core and appropriate adapters in the data-platform environment to Explore the use of dbt-core and appropriate adapters in the data-platform environment. Oct 21 2025, 1:49 PM

Pasting here the Slack comment I wrote yesterday:

Good progress made this evening: dbt runs with Spark in session mode on the test cluster, both querying and writing data. There is a lot more to learn about how to make dbt do what we expect (partition creation, Iceberg dates, etc.), but at least the basic infrastructure is working!
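The comment above reports dbt running against Spark in session mode. As a minimal sketch, assuming the `session` connection method documented for dbt-spark (installed with `pip install "dbt-spark[session]"`), this renders a profiles.yml with separate dev and prod targets. The profile name, the schema names, and the dev/prod split are hypothetical, and the partitioning and Iceberg settings still being worked out are deliberately left out.

```python
def spark_session_profile(profile: str, dev_schema: str, prod_schema: str) -> str:
    """Render profiles.yml text with dev and prod targets using dbt-spark's session method."""
    # `target: dev` makes dev the default when no --target flag is passed.
    lines = [f"{profile}:", "  target: dev", "  outputs:"]
    for target, schema in (("dev", dev_schema), ("prod", prod_schema)):
        lines += [
            f"    {target}:",
            "      type: spark",
            "      method: session",   # queries go through a PySpark session
            f"      schema: {schema}",
            "      host: NA",          # required by dbt-core, unused in session mode
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical profile and schema names.
    print(spark_session_profile("metrics", "dev_sandbox", "metrics_prod"), end="")
```

The rendered text would go in `~/.dbt/profiles.yml` (dbt's default location) or in a directory passed with `--profiles-dir`; check the keys against the installed dbt-spark version.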

Ahoelzl renamed this task from Explore the use of dbt-core and appropriate adapters in the data-platform environment to Provide dbt-core and appropriate adapters in the data-platform environment. Nov 18 2025, 11:25 PM
Ahoelzl reopened this task as Open.
Ahoelzl updated the task description.
Ahoelzl renamed this task from Provide dbt-core and appropriate adapters in the data-platform environment to Provide a dbt-core development environment and production setup in the data-platform. Nov 18 2025, 11:29 PM
Ahoelzl updated the task description.