
Closed, ResolvedPublic


We are trying to stream Push-Notification-Service logs to Elasticsearch, and initially we were able to.

But there are no logs after the 4th of Sept, even though the application is logging.

We found that Elasticsearch has been unable to index the events:

[2020-09-04T14:16:52,000][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-2020.09.04", :_type=>"fcm_send_failed", :_routing=>nil}, 2020-09-04T14:16:49.719Z kubestage1002 500: fcm_send_failed], :response=>{"index"=>{"_index"=>"logstash-2020.09.04", "_type"=>"fcm_send_failed", "_id"=>"AXRZeiZoNoG2jwpw1wTA", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"[detail] is defined as an object in mapping [fcm_send_failed] but this name is already used for a field in other types"}}}}

How can we fix it? Thank you!

Event Timeline

jijiki triaged this task as Medium priority.Sep 9 2020, 3:18 PM
jijiki created this task.
Restricted Application added a subscriber: Aklapper.

Translated into English, the message "[detail] is defined as an object in mapping [fcm_send_failed] but this name is already used for a field in other types" essentially means that detail is being sent as a different type (an object in this case) than the previously defined/seen field of the same name (i.e. detail as a string).
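To make the clash concrete, here is a minimal Python sketch of the rule described above. It is not how Elasticsearch is implemented; it only mimics the idea that a field name seen first as a plain field cannot later be sent as an object within the same index. The helper names, the type name some_other_type, and the sample values are hypothetical.

```python
def json_kind(value):
    """Classify a JSON value as an 'object' or a plain 'field' (string, number, ...)."""
    return "object" if isinstance(value, dict) else "field"


def find_mapping_conflicts(events):
    """Return every field whose kind differs from the first kind seen for that name.

    `events` is a list of (event_type, body) pairs sharing one index.
    """
    seen = {}       # field name -> (kind, event type that first used it)
    conflicts = []
    for event_type, body in events:
        for name, value in body.items():
            kind = json_kind(value)
            if name not in seen:
                seen[name] = (kind, event_type)
            elif seen[name][0] != kind:
                first_kind, first_type = seen[name]
                conflicts.append((name, event_type, kind, first_type, first_kind))
    return conflicts


# Hypothetical events: another producer logged `detail` as a string first,
# then fcm_send_failed sent `detail` as an object.
events = [
    ("some_other_type", {"detail": "connection reset"}),
    ("fcm_send_failed", {"detail": {"code": "internal-error"}}),
]
conflicts = find_mapping_conflicts(events)
for name, new_type, new_kind, old_type, old_kind in conflicts:
    print(f"[{name}] is a {new_kind} in [{new_type}] but a {old_kind} in [{old_type}]")
```

Running a check like this over a sample of log lines from each producer is one way to spot clashing field names before they reach the index.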

This is unfortunately a known issue and one of the current limitations we have with JSON logs from disparate producers when field types clash (it is one manifestation of the problem tracked in T180051). A short-term mitigation would be to drop the detail object altogether (i.e. "pull" its members up one level of nesting), to rename the field, or to push it as a string if that makes sense.
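The three mitigations could be sketched as follows, assuming the producer can rewrite a log record (here a plain Python dict) before emitting it. The function names, the detail_ prefix, the replacement field name fcm_detail, and the sample record are all hypothetical; the actual service may structure its logs differently.

```python
import json


def flatten_detail(event, key="detail"):
    """Mitigation 1: drop the object and pull its members up one level.

    Members are prefixed with the original key (an assumption of this sketch)
    so they do not overwrite existing top-level fields.
    """
    out = {k: v for k, v in event.items() if k != key}
    value = event.get(key)
    if isinstance(value, dict):
        for member, member_value in value.items():
            out[f"{key}_{member}"] = member_value
    elif key in event:
        out[key] = value  # already a plain field, keep it unchanged
    return out


def rename_detail(event, key="detail", new_key="fcm_detail"):
    """Mitigation 2: move the object to a field name no other producer uses."""
    out = dict(event)
    if isinstance(out.get(key), dict):
        out[new_key] = out.pop(key)
    return out


def stringify_detail(event, key="detail"):
    """Mitigation 3: send the object as a JSON string, matching the existing mapping."""
    out = dict(event)
    if isinstance(out.get(key), dict):
        out[key] = json.dumps(out[key], sort_keys=True)
    return out


# Hypothetical log record with `detail` as an object.
record = {"msg": "fcm_send_failed", "detail": {"code": 500, "reason": "timeout"}}
flattened = flatten_detail(record)
renamed = rename_detail(record)
stringified = stringify_detail(record)
```

Flattening keeps the members individually searchable but adds new field names that could clash in turn; stringifying keeps the existing mapping intact at the cost of searching inside a single text value.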


Is this resolved? It looks like the last occurrences of this error happened on 9/9, and logs seem to be coming through fine now.

MSantos claimed this task.
MSantos added a subscriber: MSantos.

I'm going to be bold and close this task because it looks resolved and we are successfully monitoring the service metrics. Please reopen if I'm missing something.