Netflow data pipeline
Closed, Resolved · Public

Description

This task is a generic parent for tracking all the work done by Infra Foundations / Analytics to ingest, process, and publish Netflow data.

The overall data flow is the following:

  • The data is sent from the routers to pmacct in each datacenter, and then forwarded over TLS to a Kafka topic in Kafka-Jumbo.
  • The netflow topic is periodically pulled onto HDFS by Camus. We call this "raw" data.
  • The data is then "refined" (a step where we can apply changes to it) into a new dataset that is also exposed via Hive.
  • The refined data is periodically indexed into Druid, where it can be queried from Turnilo, Superset, etc.
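
To make the stages above concrete, here is a minimal in-memory sketch of the same flow in Python. It is illustrative only and does not use pmacct, Kafka, Camus, Hive, or Druid: the field names (ts, as_src, bytes), the hourly partitioning, and the function names pull_raw, refine, and index are all assumptions made for the example, not the actual schema or job names.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Stand-in for messages sitting on the netflow topic.
# The field names (ts, as_src, bytes) are hypothetical.
TOPIC = [
    json.dumps({"ts": 1594289520, "as_src": 64500, "bytes": 1200}),
    json.dumps({"ts": 1594289580, "as_src": 64500, "bytes": 800}),
    json.dumps({"ts": 1594293200, "as_src": 64501, "bytes": 500}),
]


def hour_of(ts):
    """Format a Unix timestamp as a UTC hour bucket, e.g. 2020-07-09T10."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H")


def pull_raw(messages):
    """'Raw' stage: copy messages unchanged into hourly partitions (assumed layout)."""
    partitions = defaultdict(list)
    for msg in messages:
        partitions[hour_of(json.loads(msg)["ts"])].append(msg)
    return dict(partitions)


def refine(raw):
    """'Refined' stage: parse each message and apply a change (add an ISO timestamp)."""
    refined = {}
    for hour, msgs in raw.items():
        rows = []
        for msg in msgs:
            row = json.loads(msg)
            row["dt"] = datetime.fromtimestamp(row["ts"], tz=timezone.utc).isoformat()
            rows.append(row)
        refined[hour] = rows
    return refined


def index(refined):
    """Indexing stage: roll up bytes per (hour, as_src) so they can be queried."""
    totals = defaultdict(int)
    for hour, rows in refined.items():
        for row in rows:
            totals[(hour, row["as_src"])] += row["bytes"]
    return dict(totals)


if __name__ == "__main__":
    print(index(refine(pull_raw(TOPIC))))
```

Run as a script, this prints {('2020-07-09T10', 64500): 2000, ('2020-07-09T11', 64501): 500}. Keeping the raw stage a verbatim copy and doing all parsing in the refine stage mirrors the raw/refined split described above: refined data can be regenerated from raw if the refinement logic changes.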

Event Timeline

elukey triaged this task as Medium priority. Jul 9 2020, 10:12 AM
elukey created this task.
odimitrijevic renamed this task from Neflow data pipeline to Netflow data pipeline. Oct 27 2021, 10:24 PM
odimitrijevic removed a project: Analytics-Kanban.
odimitrijevic edited subscribers, added: odimitrijevic; removed: Nuria.
odimitrijevic claimed this task.