
illegal_argument_exception
Closed, Resolved, Public

Description

We are trying to stream Push-Notification-Service logs to Elasticsearch, and initially we were able to:

https://logstash-next.wikimedia.org/goto/02c8f00c4a4d36524e74d0f47b4796ef

But there are no logs after the 4th of Sept, even though the application is logging.

We found that Elasticsearch has been unable to index the events:

https://logstash-next.wikimedia.org/goto/7ca65191c2e4fe0089d56c4353596e9f

[2020-09-04T14:16:52,000][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-2020.09.04", :_type=>"fcm_send_failed", :_routing=>nil}, 2020-09-04T14:16:49.719Z kubestage1002 500: fcm_send_failed], :response=>{"index"=>{"_index"=>"logstash-2020.09.04", "_type"=>"fcm_send_failed", "_id"=>"AXRZeiZoNoG2jwpw1wTA", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"[detail] is defined as an object in mapping [fcm_send_failed] but this name is already used for a field in other types"}}}}

How can we fix it? Thank you!

Event Timeline

jijiki triaged this task as Medium priority.Sep 9 2020, 3:18 PM
jijiki created this task.
Restricted Application added a subscriber: Aklapper.

Translated into plain English, the message "[detail] is defined as an object in mapping [fcm_send_failed] but this name is already used for a field in other types" essentially means that detail is being sent as a different type (an object in this case) than a previously defined or seen field of the same name (i.e. detail as a string).
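To make the clash concrete, here is a minimal Python sketch (not how Elasticsearch itself checks mappings) that flags field names seen both as an object and as a scalar across a batch of events. The payloads and the function name find_type_clashes are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def find_type_clashes(events):
    """Return field names that appear as both an object and a non-object."""
    kinds = defaultdict(set)
    for event in events:
        for name, value in event.items():
            kinds[name].add("object" if isinstance(value, dict) else "scalar")
    return sorted(name for name, seen in kinds.items() if len(seen) > 1)


# Hypothetical payloads: one producer logs `detail` as a string,
# another logs it as an object.
events = [
    {"type": "other_event", "detail": "connection reset"},
    {"type": "fcm_send_failed", "detail": {"code": 500, "reason": "unavailable"}},
]
print(find_type_clashes(events))
```

Running something like this against a sample of log events from different producers could help spot which fields are likely to be rejected before they reach the index.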

This is unfortunately a known issue and one of the current limitations we have with JSON logs from disparate producers when field types clash (an instance of the problem tracked in T180051). A short-term mitigation would be to drop the detail object altogether (i.e. "pull" its members up one level of nesting), rename the field, or push it as a string if that makes sense.
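As a rough sketch of two of these mitigations on the producer side, assuming the event is a plain dictionary before it is serialized: the helper names flatten_detail and stringify_detail are hypothetical, and prefixing the pulled-up members with the field name is an extra assumption made here to avoid colliding with existing top-level fields.

```python
import json


def flatten_detail(event, field="detail"):
    """Pull the members of an object-valued field up one level.

    Members are prefixed with the field name (an assumption, to avoid
    overwriting existing top-level fields).
    """
    value = event.get(field)
    if not isinstance(value, dict):
        return dict(event)
    out = {k: v for k, v in event.items() if k != field}
    for key, member in value.items():
        out[f"{field}_{key}"] = member
    return out


def stringify_detail(event, field="detail"):
    """Send an object-valued field as a JSON string instead of an object."""
    out = dict(event)
    if isinstance(out.get(field), dict):
        out[field] = json.dumps(out[field], sort_keys=True)
    return out


# Hypothetical event shape, for illustration only.
event = {"msg": "fcm_send_failed", "detail": {"code": 500}}
print(flatten_detail(event))
print(stringify_detail(event))
```

Either approach keeps detail from being indexed as an object; which one fits depends on whether the individual members need to be searchable.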

HTH!

Is this resolved? It looks like the last occurrences of this error happened on 9/9, and logs seem to be coming through fine now.

MSantos claimed this task.
MSantos added a subscriber: MSantos.

I'm going to be bold and close this task because it looks resolved and we are successfully monitoring the service metrics at https://grafana.wikimedia.org/d/NQO_pqvMk/push-notifications?orgId=1&refresh=1m. Please reopen if I'm missing something.