KaiOS app client-side errors dashboard stopped working
Closed, Resolved (Public)

Description

There's a logstash dashboard showing javascript errors from the KaiOS app: https://logstash.wikimedia.org/app/dashboards#/view/AXR-MT9PMQ_08tQaBHcY

It is based on meta.stream: kaios_app.error

It appears to have stopped working recently.

Event Timeline

I had a brief look into this to check the logstash pipeline health. I can't find events in the dashboard for the last 90 days, although from the sent payload I'm guessing the messages should end up in (eqiad|codfw).mediawiki.client.error topic in the Kafka "logging" cluster (?).

There are very few kaios_app.error events:

https://grafana.wikimedia.org/d/ePFPOkqiz/eventgate?orgId=1&refresh=1m&var-service=eventgate-logging-external&var-stream=kaios_app.error&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos&from=1619186133393&to=1619790933393&viewPanel=74

I don't see any validation errors for this stream either:
https://logstash.wikimedia.org/goto/52e9dd3d6f17bf2fcf8495d34e4a1f16

However, the few kaios_app.error events in the Kafka logging cluster look recent enough, so I guess the volume is just pretty low.

@herron is it possible the logstash ingestion from Kafka for this stream got busted since the dedicated Kafka broker migration?

from the sent payload I'm guessing the messages should end up in (eqiad|codfw).mediawiki.client.error topic

FYI, that code sets meta.stream to 'kaios_app.error', from which the Kafka topic names are created, e.g. (eqiad|codfw).kaios_app.error
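
To make the mapping concrete, here is a minimal sketch of how the topic names follow from meta.stream. It assumes the naming rule is simply "datacenter prefix, dot, stream name", as in the example above; the helper name topics_for_stream and the event dict are hypothetical and only for illustration, not the actual EventGate code.

```python
# Illustrative sketch only: derive Kafka topic names from an event's meta.stream.
# Assumption: topic name = "<datacenter prefix>.<stream name>", per the example
# in this thread. topics_for_stream is a hypothetical helper, not a real API.

DATACENTER_PREFIXES = ("eqiad", "codfw")  # the two prefixes mentioned in this task


def topics_for_stream(stream, prefixes=DATACENTER_PREFIXES):
    """Return one topic name per datacenter prefix for the given stream."""
    return [f"{prefix}.{stream}" for prefix in prefixes]


# Hypothetical minimal event carrying only the field that matters here.
event = {"meta": {"stream": "kaios_app.error"}}

topics = topics_for_stream(event["meta"]["stream"])
print(topics)
```

Under that assumption the KaiOS errors land in their own pair of topics rather than in the mediawiki.client.error ones, so a consumer subscribed only to the latter would never see them.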

My bad! Thank you for the explanation, that makes sense. It looks to me like we're not ingesting those topics in ELK7 but should be: we did in ELK5 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/593225) but never did in ELK7.

Change 686803 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] logstash: collect kaios_app.error stream into logstash clienterror input

https://gerrit.wikimedia.org/r/686803

Change 686803 merged by Cwhite:

[operations/puppet@production] logstash: collect kaios_app.error stream into logstash clienterror input

https://gerrit.wikimedia.org/r/686803

Change 689986 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] logstash: collect kaios_app.error stream into logstash clienterror input

https://gerrit.wikimedia.org/r/689986

Change 689986 merged by Cwhite:

[operations/puppet@production] logstash: collect kaios_app.error stream into logstash clienterror input

https://gerrit.wikimedia.org/r/689986

colewhite claimed this task.
colewhite added a subscriber: colewhite.

Logs appear to be flowing again.