
📊 [Observability] Provide Data Pipeline Metrics
Closed, Resolved · Public · 2 Estimated Story Points

Description

Narratives
  • As an SRE, I want the ability to observe the ImageMatchingAlgorithm data pipeline metrics, so that I know how well it is performing compared to defined SLOs
  • As a PET data engineer, I want the ability to observe the ImageMatchingAlgorithm data pipeline metrics, so that I have a baseline against which to confirm that our pipeline is optimized.
  • As an Analytics Engineer, I want the ability to observe the ImageMatchingAlgorithm data pipeline metrics, so that I can be aware of how the data infrastructure resources are being utilized.
Acceptance Criteria
  • As a PET Data Engineer, I want the ability to generate a CSV file with the following metrics, so that I have a baseline of how the pipeline performs (see the sketch after this list):
    • in/out records
    • CPU Usage
    • Memory Usage
    • Executor Counts
    • Runtime and resource utilization for the algorithm
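A minimal sketch of how such a CSV could be produced, assuming the application's Spark monitoring REST API is reachable; the base URL, application id, and output path below are placeholders, not confirmed parts of our deployment, and CPU/record counts are approximated by the proxies Spark's ExecutorSummary exposes:

```
"""Sketch: dump per-executor metrics from Spark's REST API to a CSV file."""
import csv
import requests

BASE_URL = "http://localhost:4040/api/v1"   # assumed driver UI address
APP_ID = "app-00000000000000-0000"          # placeholder application id

def fetch_executor_metrics(base_url, app_id):
    """Return the executor summaries Spark exposes over its REST API."""
    resp = requests.get(f"{base_url}/applications/{app_id}/executors")
    resp.raise_for_status()
    return resp.json()

def write_metrics_csv(executors, path):
    # These field names exist in Spark's ExecutorSummary; in/out record
    # volumes are approximated by input bytes and shuffle totals, and
    # executor count is simply the number of rows written.
    fields = ["id", "totalCores", "memoryUsed", "totalInputBytes",
              "totalShuffleRead", "totalShuffleWrite", "totalDuration"]
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(executors)

if __name__ == "__main__":
    write_metrics_csv(fetch_executor_metrics(BASE_URL, APP_ID),
                      "pipeline_metrics.csv")
```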
Notes
  • This is the first step in collecting metrics. We will iterate toward designated stores for the collected metrics.
Event Timeline

sdkim renamed this task from [Observability] Provide Data Pipeline Metrics to 📊 [Observability] Provide Data Pipeline Metrics. (Feb 18 2021, 9:44 PM)
sdkim assigned this task to gmodena.
gmodena set the point value for this task to 2. (Feb 22 2021, 2:57 PM)

In this task we focused on enabling instrumentation for Spark, and on understanding how metrics are collected and presented on Hadoop.
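For reference, a minimal sketch of enabling Spark's built-in metrics system from the driver, using the bundled CsvSink; the sink class and property keys come from Spark's metrics configuration, while the app name, reporting period, and output directory are illustrative:

```
from pyspark.sql import SparkSession

# Sketch: route Spark's internal metrics (executor, JVM, etc.) to CSV files
# via the built-in CsvSink. Directory and period below are placeholders.
spark = (
    SparkSession.builder
    .appName("image-matching-metrics-demo")
    .config("spark.metrics.conf.*.sink.csv.class",
            "org.apache.spark.metrics.sink.CsvSink")
    .config("spark.metrics.conf.*.sink.csv.period", "10")
    .config("spark.metrics.conf.*.sink.csv.unit", "seconds")
    .config("spark.metrics.conf.*.sink.csv.directory", "/tmp/spark-metrics")
    .getOrCreate()
)
```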

There are portions of the algorithm code that run locally (outside of Spark), which we are not currently observing with fine-grained probes.
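One possible low-effort probe for those local sections, built on the standard library only; the decorated function below is hypothetical, and note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS:

```
import functools
import resource
import time

def probe(fn):
    """Sketch of a fine-grained probe for code running outside Spark:
    logs wall-clock time and peak RSS around a function call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            print(f"{fn.__name__}: {elapsed:.3f}s, peak RSS {peak_rss} KB")
    return wrapper

@probe
def local_postprocess(rows):  # hypothetical local step of the algorithm
    return [r for r in rows if r is not None]
```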