Page MenuHomePhabricator

Indexing errors / malformed logs for aqs on cassandra timeout
Open, MediumPublic

Description

Noticed this due to high indexing failures reported by elasticsearch today, looks like the log is malformed and can't be parsed as valid json. Not sure if this is aqs-specific or related to some library.

[2020-09-15T12:52:43,315][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-2020.09.15", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x2e39b975>], :response=>{"index"=>{"_index"=>"logstash-2020.09.15", "_type"=>"_doc", "_id"=>"5SbTkXQBGIXcFGALEnhm", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [res.body] of type [text] in document with id '5SbTkXQBGIXcFGALEnhm'. Preview of field's value: '{stack=Error: Operation timed out - received only 1 responses.
    at ResponseError.DriverError (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/errors.js:14:19)
    at new ResponseError (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/errors.js:51:24)
    at FrameReader.readError (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/readers.js:317:13)
    at Parser.parseBody (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/streams.js:194:66)
    at Parser._transform (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/streams.js:137:10)
    at Parser.Transform._read (_stream_transform.js:190:10)
    at Parser.Transform._write (_stream_transform.js:178:12)
    at doWrite (_stream_writable.js:410:12)
    at writeOrBuffer (_stream_writable.js:394:5)
    at Parser.Writable.write (_stream_writable.js:294:11)
    at Protocol.ondata (_stream_readable.js:667:20)
    at Protocol.emit (events.js:189:13)
    at addChunk (_stream_readable.js:284:12)
    at readableAddChunk (_stream_readable.js:265:11)
    at Protocol.Readable.push (_stream_readable.js:220:10)
    at Protocol.Transform.push (_stream_transform.js:151:32), err={consistencies=6, stack=Error: Operation timed out - received only 1 responses.
    at ResponseError.DriverError (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/errors.js:14:19)
    at new ResponseError (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/errors.js:51:24)
    at FrameReader.readError (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/readers.js:317:13)
    at Parser.parseBody (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/streams.js:194:66)
    at Parser._transform (/srv/deployment/analytics/aqs/deploy-cache/revs/95d6432bab3f85ff30a70cc776ee5dd06f478340/node_modules/cassandra-driver/lib/streams.js:137:10)
    at Parser.Transform._read (_stream_transform.js:190:10)
    at Parser.Transform._write (_stream_transform.js:178:12)
    at doWrite (_stream_writable.js:410:12)
    at writeOrBuffer (_stream_writable.js:394:5)
    at Parser.Writable.write (_stream_writable.js:294:11)
    at Protocol.ondata (_stream_readable.js:667:20)
    at Protocol.emit (events.js:189:13)
    at addChunk (_stream_readable.js:284:12)
    at readableAddChunk (_stream_readable.js:265:11)
    at Protocol.Readable.push (_stream_readable.js:220:10)
    at Protocol.Transform.push (_stream_transform.js:151:32), coordinator=10.64.16.78:9042, code=4608, isDataPresent=1, query=select \"project\",\"article\",\"granularity\",\"timestamp\",\"aa\",\"ab\",\"as\",\"au\",\"da\",\"db\",\"ds\",\"du\",\"maa\",\"mab\",\"mas\",\"mau\",\"mwa\",\"mwb\",\"mws\",\"mwu\",\"_domain\",\"_del\",\"_tid\" from \"local_group_default_T_pageviews_per_article_flat\".\"data\" where \"project\" = ? AND \"article\" = ? AND \"granularity\" = ? AND \"timestamp\" >= ? AND \"timestamp\" <= ? AND \"_domain\" = ?, name=ResponseError, received=1, message=Operation timed out - received only 1 responses., blockFor=2, info=Represents an error message from the server}, title=Error in Cassandra table storage backend, type=query_error, req={headers={}, body={\"table\":\"pageviews.per.article.flat\",\"attributes\":{\"project\":\"es.wikipedia\",\"article\":\"Primera_Guerra_Mundial\",\"granularity\":\"daily\",\"timestamp\":{\"between\":[\"2020090700\",\"2020091100\"]},\"_domain\":\"ana, uri=/analytics.wikimedia.org/sys/table/pageviews.per.article.flat/}}'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:792"}}}}}

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptTue, Sep 15, 12:59 PM
herron added a subscriber: herron.Tue, Sep 15, 3:58 PM
mforns triaged this task as Medium priority.Thu, Sep 17, 4:00 PM
mforns moved this task from Operational Excellence to Ops Week on the Analytics board.