Troubleshoot EL performance problems on 2015-05-06 {oryx}
Closed, Resolved · Public

Description

EventLogging (EL) suffered performance problems from Tuesday 2015-05-05 22:00 UTC to Wednesday 2015-05-06 20:00 UTC (22 hours).

During that period, an exceptionally large number of events were sent to the EL server for the MobileWebSearch schema. The system could not handle them properly, which caused data loss (30%-40% during the period) and some small gaps in the database. All schemas were affected, not only MobileWebSearch.

Event Timeline

mforns created this task. May 8 2015, 9:39 AM
mforns claimed this task.
mforns set the priority of this task to Needs Triage.
mforns updated the task description.
mforns added a subscriber: mforns.
Restricted Application added a subscriber: Aklapper. May 8 2015, 9:39 AM
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board. May 8 2015, 9:44 AM

Change 210017 had a related patch set uploaded (by Mforns):
Further optimize sql insertion

https://gerrit.wikimedia.org/r/210017
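This task does not reproduce the patch, so the following is not the change itself. As a general illustration of one common way to speed up SQL insertion during a burst of events, here is a minimal sketch that groups events into batches and writes each batch in a single transaction. Everything in it is an assumption made for illustration: the `events` table, its columns, the `insert_events_batched` helper, the batch size, and the use of Python's built-in `sqlite3` as a stand-in for the production database.

```python
import sqlite3


def insert_events_batched(conn, events, batch_size=1000):
    """Insert (schema_name, payload) rows in batches.

    Hypothetical helper: each batch is written with one executemany call
    inside one transaction, instead of one statement and commit per event.
    """
    inserted = 0
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany(
                "INSERT INTO events (schema_name, payload) VALUES (?, ?)",
                batch,
            )
        inserted += len(batch)
    return inserted


# In-memory database and table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (schema_name TEXT, payload TEXT)")

# Simulated burst of events for a single schema.
events = [("MobileWebSearch", '{"n": %d}' % i) for i in range(2500)]
total = insert_events_batched(conn, events, batch_size=1000)
```

Batching trades a little latency per event for far fewer round trips and commits, which is usually where insertion time goes under load. A failed batch rolls back as a unit, so a real consumer would also need a retry or fallback path for it.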

mforns renamed this task from "Troubleshoot EL performance problems on 2015-05-06 and backfill missing data" to "Troubleshoot EL performance problems on 2015-05-06". May 11 2015, 10:32 AM
mforns set Security to None.
mforns moved this task from In Progress to In Code Review on the Analytics-Kanban board.

Change 210017 merged by Milimetric:
Further optimize sql insertion

https://gerrit.wikimedia.org/r/210017

mforns closed this task as Resolved. May 19 2015, 3:41 PM
kevinator renamed this task from "Troubleshoot EL performance problems on 2015-05-06" to "Troubleshoot EL performance problems on 2015-05-06 {oryx}". May 19 2015, 7:25 PM
kevinator moved this task from In Code Review to Done on the Analytics-Kanban board.