
Dashboard for monitoring product data traffic
Open, Medium, Public

Description

We would like to have one dashboard where we can view:

  • [exists?] Total event traffic
  • [exists?] Traffic to each EventGate instance
  • [exists] Traffic to Kafka
    • [exists] Top Kafka topics
    • [exists] Bottom (or dead) Kafka topics (for pruning or retiring streams/instrumentation)
  • [can't find] Total traffic to logstash
    • [can compute?] Contribution from Event traffic to logstash overlaid
  • [probably exists] Traffic to statsD / Prometheus
    • [can compute?] Contribution from Event traffic overlaid
  • [needs instrumentation] Timing
    • For an event to go from the client to EventGate
    • For an event to go from the client to Kafka
    • For an event to go from the client to HDFS
    • For an event to go from the client to logstash, etc.
  • [exists?] Errors
    • [exists?] Rejection fraction (validation failure)
    • [exists?] EventGate timeouts or non-200 statuses
    • [exists?] Kafka errors
    • [exists?] Insertion errors (into e.g. HDFS, logstash, etc.)

Nice dimensions to have would be:

  • [exists] Stream/Topic
  • [needs instrumentation] Platform
    • Desktop Web
    • Mobile web
    • iOS App
    • Android App
    • MediaWiki Server
    • Other (KaiOS, labs, etc.)
  • [needs instrumentation] Project (e.g. which wiki)

Most of this is pulling together existing numbers, except for the timing data. Each event already has a timestamp on it, so this should not be hard: all that is required is the difference between that timestamp and the datetime taken at the point of measurement.
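As a rough illustration of the datetime difference described above, here is a minimal Python sketch. It assumes the event timestamp is an ISO 8601 string and that the measurement point supplies a timezone-aware datetime; the function names `parse_event_time` and `latency_seconds` are hypothetical and not part of any existing service.

```python
from datetime import datetime, timezone


def parse_event_time(value: str) -> datetime:
    """Parse an ISO 8601 event timestamp (assumed format) into an aware datetime."""
    # A trailing "Z" (UTC) is only accepted by fromisoformat on Python 3.11+,
    # so normalize it to an explicit offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: timestamps without an offset are in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def latency_seconds(event_time: str, measured_at: datetime) -> float:
    """Seconds between the event's own timestamp and the point of measurement.

    measured_at must be timezone-aware, otherwise the subtraction raises TypeError.
    """
    return (measured_at - parse_event_time(event_time)).total_seconds()


# Example: an event stamped at midnight UTC, observed five seconds later.
observed = datetime(2020, 1, 1, 0, 0, 5, tzinfo=timezone.utc)
print(latency_seconds("2020-01-01T00:00:00Z", observed))
```

If the timestamp is set on the client, the result also includes any client clock skew, so negative or implausibly large values would likely need to be filtered or bucketed separately.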

The project dimension could be computed using the hostname field or the HTTP header. The platform dimension is harder, and we may not be able to do it.
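One way the hostname-based project dimension could be derived is sketched below in Python. The mapping rule (drop the top-level domain and any `m` or `www` label) and the function name `project_from_hostname` are assumptions for illustration only; the real rule would depend on which hostnames actually appear in the events.

```python
def project_from_hostname(hostname: str) -> str:
    """Derive a project label from a hostname (hypothetical mapping rule)."""
    labels = [label for label in hostname.lower().rstrip(".").split(".") if label]
    if len(labels) < 2:
        # Nothing to strip; return whatever was given.
        return ".".join(labels)
    # Drop the top-level domain, then the mobile and "www" labels,
    # so desktop and mobile hosts of the same wiki collapse together.
    kept = [label for label in labels[:-1] if label not in ("m", "www")]
    return ".".join(kept)


for host in ("en.wikipedia.org", "en.m.wikipedia.org", "www.wikidata.org"):
    print(host, "->", project_from_hostname(host))
```

Collapsing desktop and mobile hosts here is a design choice for the project dimension only; the mobile label is a hint for the platform dimension, though it would not distinguish the apps or server-side events.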

Event Timeline

jlinehan moved this task from Inbox to Task Backlog on the Product-Data-Infrastructure board.

Adding #Product-Infrastructure-Team-Backlog as Better Use Of Data/Product-Data-Infrastructure project tags got archived, so this open task has an active project tag and can be found.

Removing inactive assignee from this open task. (Please update assignees on open tasks after offboarding. Thanks.)