logstash insertions to ElasticSearch cross-DC functionality needs figuring out
Closed, Resolved (Public)

Description

There is currently a plan in the works for multi-dc oriented search. This is primarily tracked in T105703: Set up a CirrusSearch cluster in codfw (Dallas, Texas). The plan as it stands now is to use the jobqueue system to update two unrelated clusters in order to maintain parity. Roughly speaking, we plan on using our newfound redundancy to tweak various parameters in ES and to make upgrades less user-impacting. This is all related to a larger RFC as well: https://phabricator.wikimedia.org/T88666.

The apifeatureusage extension seems to bypass the normal processing pipeline for Elasticsearch events, which in this case means it cannot survive multi-dc mode. At the moment, the generally understood process during maintenance is to leave only apifeatureusage in write mode for this reason.

Event Timeline

chasemp renamed this task from apifeatureusage needs to go through the jobqueue system to apifeatureusage needs to go through the jobqueue system (or something...but probably better not to have parallel multisite mechanisms).
chasemp raised the priority of this task from to Medium.
chasemp updated the task description.
chasemp set Security to None.
chasemp added subscribers: Matanya, EBernhardson, Krenair and 3 others.

If you're talking about how the log data gets into ES, I don't think this has anything directly to do with ApiFeatureUsage as the extension itself is just an interface to read the data. The data gets inserted via logstash, in which case @bd808 might be able to help.

The apifeatureusage index is populated by the Logstash cluster using filter-apifeatureusage.conf and role::logstash::apifeatureusage. This is not CirrusSearch functionality; it is only stored in the Elasticsearch cluster used by CirrusSearch because it is content intended to be used from Wikimedia wikis. The Logstash Elasticsearch cluster contains data that should not be exposed to the wikis, so we came up with this option.

It would be great to make this work with multi-dc failover, but I don't think it should be considered a blocker for multi-dc CirrusSearch work. In the spirit of parallel builds via the job queue, Logstash could push these records to both the eqiad and codfw clusters. Logstash is only available in eqiad, so true multi-dc failover won't be achieved by this, but that in and of itself is an orthogonal problem.

Thanks @bd808! I defer to whoever uses the data here (which I'm not even sure of atm) on the priority of this. No intention of proposing this as a multi-dc blocker or anything, but atm this is a standing one-off. Can we stand to lose hours or days of this data on a regular basis? Honestly, I don't know, but I believe this will begin to happen if we move to two clusters within the next few months (hopefully) and this mechanism is left behind.

@chasemp and I chatted about this a bit on IRC. We really didn't come to any strong conclusion, but I think we both understand the problem space a bit better.

CirrusSearch is going to be updated to duplicate/replicate indexing jobs to the codfw datacenter to keep the Elasticsearch cluster there "replicated" and ready to answer queries either for master site failovers or possibly for "read-only" DC usage. In planning for this it was (re)discovered that https://en.wikipedia.org/wiki/Special:ApiFeatureUsage uses a set of indexes that are not maintained by the jobqueue. This means that this non-CirrusSearch search feature does not have a multi-dc replication strategy.

The easy fix today would be to update the Logstash puppet configuration to add another output::elasticsearch configuration to ship the sanitized log events from the eqiad Logstash cluster to the codfw Elasticsearch cluster in addition to the current same DC shipping. This is a "one of these things is not like the others" scenario and a potential source of issues when additional changes are made in the Elasticsearch cluster configurations or additional data centers are added.
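For illustration only, the dual-shipping idea can be sketched in Python as a fan-out over per-cluster sinks. This is not the Logstash configuration itself: `fan_out`, the sink callables, the in-memory stores, and the sample event fields are all hypothetical stand-ins, and the only assumption taken from the discussion is that each sanitized event goes to both the eqiad and codfw clusters.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Sink = Callable[[Event], None]


def fan_out(event: Event, sinks: Dict[str, Sink]) -> List[str]:
    """Send one event to every sink and return the names of sinks that failed."""
    failed = []
    for name, send in sinks.items():
        try:
            send(event)
        except Exception:
            # A failure in one data center should not block delivery to the other.
            failed.append(name)
    return failed


# Hypothetical in-memory stand-ins for the two Elasticsearch clusters.
eqiad_store: List[Event] = []
codfw_store: List[Event] = []

sinks = {"eqiad": eqiad_store.append, "codfw": codfw_store.append}
failed = fan_out({"feature": "example-feature", "agent": "example-agent"}, sinks)
```

The sketch also shows why this is a one-off: every new cluster or data center means another entry in `sinks` that has to be maintained separately from the jobqueue-based replication.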

Anomie renamed this task from apifeatureusage needs to go through the jobqueue system (or something...but probably better not to have parallel multisite mechanisms) to logstash insertions to ElasticSearch cross-DC functionality needs figuring out. Aug 18 2015, 5:47 PM
Anomie added a project: Wikimedia-Logstash.
Gehel claimed this task.
Gehel subscribed.

This has been fixed already in T176430. Closing.