
eventgate-analytics has stopped producing events since 2025-06-25
Closed, Resolved | Public | PRODUCTION ERROR

Description

placeholder task

See the service's Grafana dashboard.

The eventgate-analytics service was deployed in eqiad around that time.
The logs currently show that inbound events are being discarded because their payloads are empty:

{"name":"eventgate-analytics","hostname":"eventgate-production-6c77fd6585-2xztd","pid":1,"level":"WARN","levelPath":"warn/events","request_id":"<REDACTED>","request":{"url":"/v1/events?hasty=true","headers":{"user-agent":"<REDACTED>","content-length":"496","content-type":"application/x-www-form-urlencoded","x-request-id":"<REDACTED>"},"method":"POST","params":{"0":"/v1/events"},"query":{"hasty":"true"},"remoteAddress":"127.0.0.1","remotePort":52394},"msg":"Request body was empty. Must provide JSON encoded events.","time":"2025-06-30T10:46:19.543Z","v":0}
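To gauge how widespread these empty-body rejections are, the service logs can be scanned for this warning. Below is a minimal Python sketch. It assumes the logs are available as newline-delimited JSON records shaped like the one above. The function name `count_empty_body_warnings` is hypothetical and not part of eventgate. The sketch groups matches by the request's `content-type` header, which may help distinguish between kinds of clients.

```python
import json
from collections import Counter

# Message text copied from the WARN record above.
EMPTY_BODY_MSG = "Request body was empty. Must provide JSON encoded events."


def count_empty_body_warnings(lines):
    """Count empty-body WARN records, grouped by request content-type.

    `lines` is an iterable of newline-delimited JSON log records (assumed
    format, as in the sample above). Blank and non-JSON lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log record
        if not isinstance(record, dict):
            continue  # valid JSON, but not an object-shaped log record
        if record.get("level") != "WARN" or record.get("msg") != EMPTY_BODY_MSG:
            continue
        headers = (record.get("request") or {}).get("headers") or {}
        counts[headers.get("content-type", "<missing>")] += 1
    return counts
```

For example, feeding it a file of log lines via `count_empty_body_warnings(open(path))` returns a `Counter` keyed by content type.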

Event Timeline

BTullis renamed this task from "eventgage-analytics has stopped producing events scine 2025-06-25" to "eventgage-analytics has stopped producing events since 2025-06-25". (Jun 30 2025, 10:51 AM)

We traced the issue back to a configuration change that inadvertently disabled MediaWiki logging.
The change has been reverted, and the event rate is back to pre-incident levels.

FWIW: eventgate-analytics is registered with Alertmanager, but we lack alerting rules for sudden changes in traffic.
Such an alert would have helped us catch this issue early. We can address that in a dedicated Phabricator task.
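The kind of check such an alert would encode can be sketched in Python. This is a hypothetical illustration of the logic only. It is not an Alertmanager or Prometheus rule. The sketch compares the latest per-interval event rate against the median of a trailing window and flags the sample if it falls outside an assumed band (`min_ratio`, `max_ratio`). The window size and thresholds are placeholders, not tuned values.

```python
from statistics import median


def traffic_anomaly(rates, window=6, min_ratio=0.5, max_ratio=2.0):
    """Flag the latest sample if it deviates sharply from recent history.

    rates: per-interval event rates, oldest first (hypothetical input,
        e.g. events/sec sampled at a fixed interval).
    window: number of samples before the latest one used as the baseline.
    min_ratio / max_ratio: placeholder thresholds, not tuned values.
    """
    if len(rates) < window + 1:
        return False  # not enough history to establish a baseline
    baseline = median(rates[-window - 1:-1])
    if baseline <= 0:
        return False  # an idle stream has no meaningful baseline here
    latest = rates[-1]
    # Both a sudden drop (as in this incident) and a sudden spike count.
    return latest < min_ratio * baseline or latest > max_ratio * baseline
```

Using the median of the trailing window prevents a single noisy sample from skewing the baseline. In practice, the equivalent condition would be expressed in the alerting system's own rule configuration.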

gmodena renamed this task from "eventgage-analytics has stopped producing events since 2025-06-25" to "eventgate-analytics has stopped producing events since 2025-06-25". (Jun 30 2025, 3:07 PM)
gmodena claimed this task.
gmodena triaged this task as Unbreak Now! priority.

Unfortunately, we won't be able to backfill the lost events for the following streams (and their downstream Hive datasets):

  • api-gateway.request
  • mediawiki.api-request
  • mediawiki.cirrussearch-request
  • '/^swift\.(.+\.)?upload-complete$/'
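The last entry in the list above appears to be a regular expression, delimited by slashes, rather than a single stream name. Below is a minimal Python sketch for checking whether a given stream name is covered by the list. It assumes the slashes are delimiters and not part of the pattern. `is_affected` and the example stream names in the test are hypothetical.

```python
import re

# Exact stream names from the list above.
AFFECTED_EXACT = {
    "api-gateway.request",
    "mediawiki.api-request",
    "mediawiki.cirrussearch-request",
}

# Pattern from the list above, with the assumed '/' delimiters removed.
SWIFT_UPLOAD_COMPLETE = re.compile(r"^swift\.(.+\.)?upload-complete$")


def is_affected(stream_name):
    """Return True if stream_name is one of the affected streams."""
    if stream_name in AFFECTED_EXACT:
        return True
    # fullmatch rejects trailing characters, including a stray newline.
    return SWIFT_UPLOAD_COMPLETE.fullmatch(stream_name) is not None
```

With this pattern, both `swift.upload-complete` and names with one or more dotted segments in between (for example `swift.thumbs.upload-complete`) would match.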

The wdqs and wcqs streams were not affected by this incident.