[spike] Correlate events between echo and push notifications service to extract metrics
Closed, Declined (Public)

Description

The current flow for push notifications is:

  • An event is triggered on MediaWiki
  • Echo handles the event and sends an API request to the push-notifications service
  • The push-notifications service queues the notification
  • At some point, once the flush requirements are met, the queue is flushed and the notifications are sent to the push providers

We need to find a way to correlate the events across Echo and the push-notifications service to extract timings.

Event Timeline

Ideas

  1. Distributed tracing
    • Doesn't look like we have an established technology for that
  2. Correlating logs
    • Publish start/stop events from Echo and the push service
      • On Echo job submission
        • Logstash event
          • Event type
          • Notification ID
          • Start timestamp
      • On push queue flush
        • Logstash event
          • Event type
          • Notification ID
          • End timestamp
    • Correlate events
      • Logstash scripted field (?) for elapsed time
        • TODO: Need to check if they are allowed
      • Use a visualization directly
        • Group by notification ID
        • Visualize p95/p99/average of elapsed time
    • Too much computation for a level of detail we may not actually need
    • Too many log entries
  3. Passing debug metadata
    • On Echo job submission
      • Generate and pass an initial timestamp
    • On Echo job execution
      • Calculate the current timediff and pass it in the API request metadata
    • On push notification API request handling
      • Calculate the current timediff and pass it to the notification object
    • On queue flush
      • Expose it as a Prometheus metric
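
For illustration, the "passing debug metadata" idea could be sketched roughly as below. This is only a minimal sketch under assumptions, not the Echo or push-notifications code: every name in it (`submit_echo_job`, `execute_echo_job`, `PushService`, `echo_delay_s`, `received_at`) is hypothetical, and a plain list stands in for the Prometheus metric. It also assumes one particular reading of the steps: each side only measures elapsed time on its own clock and forwards the difference, so the two hosts never compare absolute timestamps.

```python
from dataclasses import dataclass, field


def submit_echo_job(now):
    """Echo job submission: attach an initial timestamp to the job (hypothetical)."""
    return {"submitted_at": now}


def execute_echo_job(job, now):
    """Echo job execution: compute the timediff and return it as request metadata."""
    return {"echo_delay_s": now - job["submitted_at"]}


@dataclass
class Notification:
    """Hypothetical notification object held in the push service queue."""
    notification_id: str
    echo_delay_s: float  # time spent on the Echo side, taken from request metadata
    received_at: float   # push-service clock reading when the request arrived


@dataclass
class PushService:
    queue: list = field(default_factory=list)
    # Stand-in for a Prometheus histogram: only records the observed values.
    observed_latencies_s: list = field(default_factory=list)

    def handle_request(self, notification_id, metadata, now):
        """API request handling: carry the Echo-side delay on the notification."""
        self.queue.append(
            Notification(notification_id, metadata["echo_delay_s"], now)
        )

    def flush(self, now):
        """Queue flush: record Echo delay plus time spent queued, then empty the queue."""
        for n in self.queue:
            self.observed_latencies_s.append(n.echo_delay_s + (now - n.received_at))
        self.queue.clear()


# The two services use unrelated clocks here on purpose.
job = submit_echo_job(now=100.0)
metadata = execute_echo_job(job, now=102.5)   # 2.5 s spent on the Echo side
service = PushService()
service.handle_request("n1", metadata, now=50.0)
service.flush(now=54.0)                       # 4.0 s spent in the push queue
print(service.observed_latencies_s)           # [6.5]
```

Forwarding differences rather than absolute timestamps means the result does not depend on clock synchronization between hosts, at the cost of ignoring the network time between Echo and the push service.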

It turns out that implementing this is getting too complicated, and given that we already have a lot of metric orchestration in place, it doesn't look like it's a priority at the moment.