
Rework the data flow between logstash and cirrus elasticsearch cluster for ApiFeatureUsage
Open, Medium, Public

Description

In the current flow for ApiFeatureUsage, usage logs are collected by logstash, which writes them directly to the cirrus elasticsearch clusters. This causes multiple issues:

  • synchronous flow: if the cirrus cluster is down for maintenance (or has crashed), the logstash pipeline stalls (see T176335)
  • strong coupling: logstash and the cirrus cluster need to run compatible versions of logstash / elasticsearch, which can be problematic during upgrades

We should rework this data flow, probably using kafka, which would address both of these issues.
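To make the difference between the two flows concrete, here is a minimal in-process sketch, assuming a toy `Sink` class that stands in for the cirrus cluster and a `deque` that stands in for a kafka topic. All names are hypothetical and nothing here is logstash or kafka code. In the synchronous version, the first failed write stops the pipeline; in the buffered version, the producer keeps appending and a separate consumer drains the backlog once the sink is reachable again.

```python
from __future__ import annotations

from collections import deque


class Sink:
    """Hypothetical stand-in for the cirrus elasticsearch cluster."""

    def __init__(self) -> None:
        self.available = True
        self.docs: list[dict] = []

    def write(self, doc: dict) -> None:
        if not self.available:
            raise ConnectionError("cluster unavailable")
        self.docs.append(doc)


def synchronous_pipeline(events: list[dict], sink: Sink) -> int:
    """Current flow: each event is written straight to the sink.

    Returns how many events were processed before the pipeline stalled.
    """
    processed = 0
    for event in events:
        try:
            sink.write(event)
        except ConnectionError:
            break  # the whole pipeline stops at the first failed write
        processed += 1
    return processed


class BufferedPipeline:
    """Proposed flow: the producer appends to a log (kafka in the real
    system, a deque here) and a separate consumer drains it into the sink."""

    def __init__(self, sink: Sink) -> None:
        self.sink = sink
        self.log: deque[dict] = deque()

    def produce(self, event: dict) -> None:
        self.log.append(event)  # never touches the sink, so never blocks on it

    def consume(self) -> int:
        """Write as much of the backlog as the sink accepts; return the count."""
        written = 0
        while self.log:
            try:
                self.sink.write(self.log[0])
            except ConnectionError:
                break  # keep the event in the log and retry on the next call
            self.log.popleft()
            written += 1
        return written
```

With the sink marked unavailable, `synchronous_pipeline` processes nothing, while `BufferedPipeline.produce` still accepts every event; once `sink.available` is set back to `True`, a single `consume()` call writes the whole backlog. A real kafka topic adds durability and consumer offsets on top of this basic decoupling.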

Event Timeline

Gehel created this task. Mar 6 2019, 10:17 AM
Restricted Application added a subscriber: Aklapper. Mar 6 2019, 10:17 AM
Gehel updated the task description. Mar 6 2019, 10:23 AM
Anomie added a subscriber: Anomie. Mar 6 2019, 2:06 PM

A couple of random thoughts:

  • This could potentially be a part of T185233?
  • Another potentially related component could be mjolnir-bulk-daemon, which is used today to take bulk updates from kafka and apply them to the wiki indices
  • Alternatively, ApiFeatureUsage could have its own logstash instance running in ganeti. This would reduce the coupling between the logging services and the cirrus services, but would complicate puppet, which would need to handle multiple versions.
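As an illustration of the consumer side of such a design (in the spirit of mjolnir-bulk-daemon, but not its actual code), the sketch below groups consumed messages into batches and renders each batch as an elasticsearch `_bulk` request body: one action line naming the target index, then the document source, each newline-terminated. The helper names, the message fields, and any index name passed in are hypothetical; reading from kafka and sending the HTTP request are left out.

```python
from __future__ import annotations

import json
from collections.abc import Iterable, Iterator


def batches(messages: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group consumed messages into batches of `size` (the last may be shorter)."""
    batch: list[dict] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_bulk_body(docs: Iterable[dict], index: str) -> str:
    """Render documents as a newline-delimited _bulk request body.

    Each document contributes an action line and a source line; the body
    must end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""
```

Batching on the consumer side is what lets a kafka-based flow absorb cluster downtime: if a bulk request fails, the consumer can simply avoid committing its offset and retry the same batch later.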
EBernhardson triaged this task as Medium priority. Mar 7 2019, 6:18 PM

@Gehel, @EBernhardson mentioned that our new Elasticsearch cluster version doesn't have the same issue with data replication when upgrading the cluster, which means that stopping writes might be less important in the next upgrade.

I think the relevant updates are tracked in this ticket: https://phabricator.wikimedia.org/T235833

Mstyles claimed this task. Dec 18 2019, 9:29 PM

Tabling this for now, as it's not urgent.

Mstyles removed Mstyles as the assignee of this task. Aug 6 2020, 7:37 PM
Gehel added a comment. Tue, Aug 25, 7:22 PM

Adding kafka between logstash and cirrus search seems to be the simplest solution that addresses our biggest concerns.

We discussed this, and we think putting kafka between logstash and elasticsearch will remove the synchronous dependency between them, which gives us a good value:effort tradeoff.

In the future, we could look into removing the logstash dependency entirely by having the analytics cluster parse web request logs to achieve the same effect, but for now let's limit the scope of this ticket to putting kafka in between.