
[Hypothesis] WE1.5.1 Contributor metrics dashboard
Closed, Resolved · Public

Description

If we implement a dashboard to explore 7 contributor metrics and standardize the calculation of at least one metric using dbt, we can enable contributor product teams to self-serve metric insights and develop a standard for storing metric calculation logic.

MVP List of metrics:

  1. Topline: Retained Editors
  2. Indicator: Constructive Activation
  3. Indicator: Constructive Edits
  4. Movement Health: Account Registrations
  5. Movement Health: Retained Newcomers
  6. Active Editors by tenure:
    1. Movement Health: Returning Active Editors
    2. Movement Health: New Active Editors
  7. Active Editors by experience level:
    1. Movement Health: Junior Active Editors
    2. Movement Health: Experienced Active Editors
    3. Movement Health: Very Experienced Active Editors

In addition to the raw numbers above, we will also include rate metrics:

  • Constructive Activation rate
  • Constructive Edit rate
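As a rough sketch, the two rate metrics could be derived from the raw counts listed above. The denominators here (account registrations for the activation rate, total edits for the edit rate) are assumptions for illustration, not the dashboard's confirmed definitions:

```python
# Hypothetical derivation of the two rate metrics from raw counts.
# Field names and denominators are illustrative assumptions, not the
# dashboard's actual schema or metric definitions.

def constructive_activation_rate(constructive_activations: int,
                                 account_registrations: int) -> float:
    """Share of newly registered accounts that became constructively active."""
    if account_registrations == 0:
        return 0.0
    return constructive_activations / account_registrations

def constructive_edit_rate(constructive_edits: int, total_edits: int) -> float:
    """Share of all edits that were constructive."""
    if total_edits == 0:
        return 0.0
    return constructive_edits / total_edits

print(constructive_activation_rate(150, 1000))  # → 0.15
```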

Related Objects

Status     Assigned
Open       Mayakp.wiki
Resolved   Mayakp.wiki
Resolved   Mayakp.wiki
Resolved   Mayakp.wiki
Open       amastilovic
Resolved   JMonton-WMF
Resolved   Ahoelzl
Resolved   BTullis
Duplicate  None
Resolved   None
Resolved   JMonton-WMF
Resolved   JMonton-WMF
Resolved   JMonton-WMF
Resolved   JMonton-WMF
Open       amastilovic
Resolved   amastilovic
Resolved   amastilovic
Resolved   JMonton-WMF
Open       None
Open       None
Resolved   Mayakp.wiki
Open       None

Event Timeline

Going forward, we will get updates from the DPE team on this task about progress on dbt, the new tool we are using to develop a standard for storing metric calculation logic. I will add these updates to my Asana updates every Friday.

Weekly update from the Data Engineering team:

  • Progress:
    • We evaluated ways of running dbt from Airflow, and we now have a clear path forward.
    • We've built a new docker image that can run dbt from Airflow.
    • We can pass Airflow parameters to dbt directly if needed.
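One way Airflow parameters can reach dbt is through the CLI's `--vars` flag, which accepts a YAML/JSON string. The sketch below is illustrative; the model and parameter names are assumptions, not taken from the actual DAGs:

```python
import json

# Sketch: render a `dbt run` command that forwards Airflow runtime
# parameters to dbt as vars. Model and parameter names are hypothetical.

def build_dbt_command(model: str, airflow_params: dict) -> list[str]:
    """Build a dbt CLI invocation passing Airflow params via --vars."""
    # dbt's --vars flag takes a YAML/JSON string; JSON is valid YAML.
    return ["dbt", "run", "--select", model,
            "--vars", json.dumps(airflow_params)]

cmd = build_dbt_command("retained_editors", {"snapshot_month": "2024-05"})
# → ['dbt', 'run', '--select', 'retained_editors',
#    '--vars', '{"snapshot_month": "2024-05"}']
```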
  • Blockers/Risk:
    • We are waiting for the Data Platform SRE team to provide a way of running Spark with dbt in Airflow. They are already working on it.

Weekly update from the Data Engineering team:

  • The dbt docker image is published and ready to be used from Airflow.
  • Great progress from the SRE team on running dbt + Spark from K8s, but it is still in progress.

Weekly update from the Data Engineering team:

The SRE team is still working on running dbt + Spark from K8s. It appears to be close to finished.

Once the SRE team finishes this, we'll be able to start testing dbt from Airflow.

Weekly update from the Data Engineering team:

The SRE team is still working on dbt + Spark from K8s. Progress has been made and Spark is now running, but they are still having issues accessing the production data.

Weekly update from the Data Engineering team:

The SRE team has been able to run Spark on K8s, but they are missing one last piece, the "Thrift server", which will allow dbt to run properly. They are working on it.

Weekly update from the Data Engineering team:

The SRE team has been working on running Spark on K8s, but issues keep appearing. While that is still being investigated, we have decided to run dbt using Skein. This is a solution the Data Engineering team can work on itself, and we believe it should work, as we have other applications running with this approach.
For now, the DE team is working on implementing dbt + Skein for Airflow.
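For context, Skein submits applications to YARN from a declarative spec. A rough sketch of what a Skein application spec for a dbt run might look like follows; the field values, resource sizes, paths, and model name are all placeholders, not the team's actual configuration:

```yaml
# Hypothetical Skein application spec for a dbt run on YARN.
# All values below are illustrative placeholders.
name: dbt-run
master:
  resources:
    vcores: 1
    memory: 2 GiB
  files:
    # The dbt-jobs repository, pre-published to HDFS, is localized
    # into the container's working directory.
    dbt-jobs: hdfs:///path/to/dbt-jobs
  script: |
    cd dbt-jobs
    dbt run --select some_model
```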

Weekly update from the Data Engineering team:

We are working on using dbt + Skein from Airflow. It is working well and should be finished soon.

For now:

  • MRs are open to copy the dbt-jobs repository to HDFS (so Airflow can use it)
  • Draft MRs that should run dbt are already open in Airflow. We need to polish them a bit.

Weekly update from the Data Engineering team:

  • dbt-jobs repository is now automatically moved to HDFS.
  • We are testing a new dbt monthly Airflow DAG in our test instances.

We still need to make some small changes and deploy it to prod, but we are making progress.


Completed - Final report here

Weekly update from the Data Engineering team:

Progress:

  • We've published the 1.0 version of the Airflow DbtSkeinOperator to our Airflow library. This is the most basic building block that allows us to run dbt models from within Airflow DAGs. In combination with the automatic publishing of the dbt-jobs repository to the HDFS cache, we now have the capability of doing dbt runs on models published in dbt-jobs.
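To make the shape of this building block concrete, here is a simplified, hypothetical stand-in for what a DbtSkeinOperator-style operator does. The class and method names are illustrative and are not the actual API of the WMF Airflow library:

```python
# Hypothetical sketch of a DbtSkeinOperator-style building block: it
# renders a dbt command for one model and hands it to a Skein-style
# submission step. Names and behavior are assumptions, not the real API.

class DbtSkeinOperatorSketch:
    """Run one dbt model on YARN, using the HDFS-cached dbt-jobs repo."""

    def __init__(self, model: str, dbt_jobs_hdfs_path: str):
        self.model = model
        self.dbt_jobs_hdfs_path = dbt_jobs_hdfs_path  # pre-published cache

    def build_script(self) -> str:
        # The container receives the cached repo as a local directory,
        # then runs only the selected model.
        return f"cd dbt-jobs && dbt run --select {self.model}"

    def execute(self, submit):
        # `submit` stands in for a Skein client submission: it ships the
        # cached repo into the container and executes the script on YARN.
        return submit(files={"dbt-jobs": self.dbt_jobs_hdfs_path},
                      script=self.build_script())
```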

Next steps:

  • The DbtSkeinOperator was tested on a fairly simple dbt model that reads from and writes to Hive tables. We want to test dbt models that use Iceberg tables as well.
  • After that we will be ready to build and deploy a production Airflow DAG that runs a dbt model of choice.