
[Data Quality] Log selected Spark metrics and visualize on dashboard
Open, Needs Triage, Public

Description

As a follow-up to https://phabricator.wikimedia.org/T297231:

Implement logging of selected Spark metrics and visualize them on a dashboard.

Event Timeline

After all the discussion on this subject, I also think that publishing Spark metrics to Kafka (and then exporting them to HDFS) is the most obvious first step.

An example of a KafkaSink is here: https://github.com/erikerlandson/spark-kafka-sink/blob/master/src/main/scala/org/apache/spark/metrics/sink/KafkaSink.scala
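
To make the "selected metrics to Kafka" idea concrete, here is a rough sketch of the filter-and-serialize step only. It is not taken from the linked sink. The prefix allow-list, the metric names, the topic name `spark-metrics`, and the message fields are all assumptions for illustration, and `send` is a stand-in for whatever Kafka producer call would actually be used.

```python
import json
import time
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical allow-list: the task does not say which metrics are "selected".
SELECTED_PREFIXES = ("executor.", "jvm.heap.")


def select_metrics(
    metrics: Dict[str, float],
    prefixes: Iterable[str] = SELECTED_PREFIXES,
) -> Dict[str, float]:
    """Keep only the metrics whose name starts with an allowed prefix."""
    allowed = tuple(prefixes)
    return {name: value for name, value in metrics.items() if name.startswith(allowed)}


def to_messages(app_id: str, metrics: Dict[str, float], timestamp_ms: int) -> List[bytes]:
    """Serialize each metric as one UTF-8 JSON record, ordered by metric name."""
    return [
        json.dumps(
            {"app_id": app_id, "metric": name, "value": value, "ts_ms": timestamp_ms},
            sort_keys=True,
        ).encode("utf-8")
        for name, value in sorted(metrics.items())
    ]


def publish(
    app_id: str,
    metrics: Dict[str, float],
    send: Callable[[str, bytes], None],
    topic: str = "spark-metrics",  # assumed topic name
    timestamp_ms: Optional[int] = None,
) -> int:
    """Filter, serialize, and hand each record to `send`; return the record count."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    messages = to_messages(app_id, select_metrics(metrics), timestamp_ms)
    for message in messages:
        send(topic, message)
    return len(messages)
```

Passing the producer in as a plain callable keeps the sketch independent of any particular Kafka client and makes the selection logic easy to check in isolation.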

Ahoelzl renamed this task from [Data Quality] Log Spark metrics and visualize on dashboard to [Data Quality] Log selected Spark metrics and visualize on dashboard. Nov 13 2023, 6:57 PM
Ahoelzl edited projects, added Data-Engineering; removed Data-Engineering (Sprint 5).
Ahoelzl updated the task description.