
Investigate intermittent replica lag alarms
Closed, Declined (Public)

Description

Intermittent replica lag alarms are showing up, this time on db1106:
https://phab.wmfusercontent.org/file/data/blhuxhn5dovevdtctxew/PHID-FILE-35lgnomig7rvzfmdw2qj/Screenshot_from_2021-02-11_12-37-48.png

11:45 AM <jynus> I've seen in the past show slave status returning something like max_int before
11:45 AM <jynus> so we could patch to have a ceiling (e.g. 50 years of lag) and in the future move to pt-heartbeat
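
The ceiling idea from the chat above could look roughly like the following. This is only a sketch under assumptions, not the actual monitoring code: the names `LAG_CEILING_SECONDS` and `sanitize_lag` are hypothetical, and it assumes the lag value arrives as an integer number of seconds (or as nothing at all when replication is stopped). Implausible readings are treated as "unknown" rather than as real lag.

```python
from typing import Optional

# Hypothetical ceiling: roughly 50 years expressed in seconds
# (1,576,800,000), which is below the 32-bit signed max_int.
LAG_CEILING_SECONDS = 50 * 365 * 24 * 60 * 60


def sanitize_lag(raw_lag: Optional[int],
                 ceiling: int = LAG_CEILING_SECONDS) -> Optional[int]:
    """Return the lag in seconds if it is plausible, otherwise None.

    None is used for "unknown" so that a bogus reading (for example a
    max_int-like value) is not reported as genuine replica lag.
    """
    if raw_lag is None:
        return None
    if raw_lag < 0 or raw_lag > ceiling:
        return None
    return raw_lag
```

With this shape, a caller could skip the sample or fall back to another lag source when `sanitize_lag` returns `None`.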

The last two occurrences noticed (these are soft alerts, so many others may have been missed):

  • 2021-02-10 18:40 db2121 MariaDB sustained replica lag CRITICAL 6.266e+05 ge 2
  • 2021-02-11 11:36 db1106 MariaDB sustained replica lag CRITICAL 4.169e+05 ge 2

Related Objects

Status    Subtype  Assigned   Task
Resolved           None
Declined           LSobanski

Event Timeline

LSobanski reopened this task as Open.
LSobanski claimed this task.
LSobanski removed LSobanski as the assignee of this task.
LSobanski triaged this task as Low priority.
LSobanski moved this task from Triage to Refine on the DBA board.

Haven't seen this in a while, and it's not super clear what the original problem was. Resolving.