⚓ T268336 Cleanup heartbeat.heartbeat on all production instances

	Subject	Repo	Branch	Lines +/-
	orchestrator: Use heartbeat table to detect lag.	operations/puppet	production	+1 -0

Status	Assigned	Task
Resolved	Kormat	T268316 Base replication lag detection on heartbeat
Resolved	Marostegui	T268336 Cleanup heartbeat.heartbeat on all production instances
Resolved	Marostegui	T273593 Clean up heartbeat table on clouddb hosts
Resolved	Marostegui	T281826 Cleanup heartbeat.heartbeat on s2
Resolved	Marostegui	T281827 Cleanup heartbeat.heartbeat on s3
Resolved	Marostegui	T281828 Cleanup heartbeat.heartbeat on s5
Resolved	Marostegui	T281829 Cleanup heartbeat.heartbeat on s6
Resolved	Marostegui	T281830 Cleanup heartbeat.heartbeat on s8

Kormat created this task. Nov 20 2020, 1:13 PM

Restricted Application added a subscriber: Aklapper. Nov 20 2020, 1:13 PM

LSobanski subscribed. Nov 20 2020, 1:15 PM

Let's back them up just in case. The reason this was not done before is that the heartbeat table helps track past replication history in case of a replication problem (e.g. an improper master switchover).

I have created a dump at: dbprov1003:/srv/backups/dumps/latest/heartbeat_tables.2020-11-20.tar.gz, so that is no longer a blocker.

Kormat added a parent task: T268316: Base replication lag detection on heartbeat. Nov 20 2020, 1:39 PM

Kormat mentioned this in T268316: Base replication lag detection on heartbeat.

Kormat updated the task description. Nov 20 2020, 1:43 PM

herron triaged this task as Medium priority. Nov 20 2020, 3:07 PM

Kormat updated the task description. Nov 20 2020, 3:12 PM

Mentioned in SAL (#wikimedia-operations) [2020-11-23T14:04:48Z] <kormat> cleaning up heartbeat.heartbeat on pc1 T268336

Mentioned in SAL (#wikimedia-operations) [2020-11-23T14:09:14Z] <kormat> cleaning up heartbeat.heartbeat on pc2 T268336

Mentioned in SAL (#wikimedia-operations) [2020-11-23T14:10:02Z] <kormat> cleaning up heartbeat.heartbeat on pc3 T268336

Change 642379 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] orchestrator: Use heartbeat table to detect lag.

https://gerrit.wikimedia.org/r/642379

gerritbot added a project: Patch-For-Review. Nov 23 2020, 2:18 PM

Cleaning up the heartbeat tables in prod is a bit tricky, as there's a lot of cruft and a mix of STATEMENT and ROW replication. My suggestion is that we update stale rows to set ts to 0 instead of trying to delete them. That makes it easy to filter them out of queries.
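A minimal sketch of the "set ts to 0 instead of deleting" idea, using Python and an in-memory SQLite table. Assumptions: the table is a hypothetical, simplified stand-in for heartbeat.heartbeat with only server_id and ts columns (the real table has more), ts values are ISO-8601 strings, the stale marker is the string '0', and mark_stale and lag_seconds are hypothetical helpers, not existing WMF tooling or the query orchestrator actually runs.

```python
import sqlite3
from datetime import datetime

# Hypothetical, simplified stand-in for heartbeat.heartbeat: the real table has
# more columns (binlog file/position, etc.). Only server_id and ts are modelled.
SCHEMA = "CREATE TABLE heartbeat (server_id INTEGER PRIMARY KEY, ts TEXT NOT NULL)"
STALE_TS = "0"  # marker written to stale rows instead of deleting them


def mark_stale(conn, live_server_ids):
    """Set ts to the stale marker on every row not written by a live primary.

    Rows are updated in place rather than deleted, so they stay in the table.
    """
    ids = list(live_server_ids)
    if not ids:
        raise ValueError("need at least one live server_id")
    placeholders = ",".join("?" for _ in ids)
    conn.execute(
        f"UPDATE heartbeat SET ts = ? WHERE server_id NOT IN ({placeholders})",
        [STALE_TS, *ids],
    )


def lag_seconds(conn, now):
    """Return {server_id: lag in seconds}, skipping rows marked stale."""
    rows = conn.execute(
        "SELECT server_id, ts FROM heartbeat WHERE ts != ? ORDER BY server_id",
        (STALE_TS,),
    )
    return {sid: (now - datetime.fromisoformat(ts)).total_seconds() for sid, ts in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Hypothetical server_ids: 101 is the live primary, 102/103 are leftovers.
    conn.executemany(
        "INSERT INTO heartbeat VALUES (?, ?)",
        [
            (101, "2020-11-23T14:04:47.500000"),
            (102, "2019-03-01T10:00:00.000000"),
            (103, "2018-07-12T08:00:00.000000"),
        ],
    )
    mark_stale(conn, [101])
    print(lag_seconds(conn, datetime(2020, 11, 23, 14, 4, 48)))  # {101: 0.5}
```

Because stale rows keep their server_id and only lose their timestamp, a single WHERE ts != '0' filter is enough to keep them out of lag calculations.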

LSobanski moved this task from Triage to Refine on the DBA board. Nov 24 2020, 12:48 PM

Change 642379 merged by Kormat:
[operations/puppet@production] orchestrator: Use heartbeat table to detect lag.

https://gerrit.wikimedia.org/r/642379

Marostegui edited projects, added Orchestrator; removed SRE. Nov 25 2020, 9:54 AM

Maintenance_bot removed a project: Patch-For-Review. Nov 25 2020, 10:10 AM

m1 table cleaned up

Marostegui mentioned this in T272568: Add m* and es4/es5 sections to Orchestrator. Jan 21 2021, 9:32 AM

x2 cleaned.

m3 cleaned.

m5 cleaned

es5 cleaned

Marostegui updated the task description. Jan 28 2021, 11:18 AM

Marostegui moved this task from Refine to In progress on the DBA board. Feb 1 2021, 9:37 PM

m2 cleaned

Marostegui updated the task description. Feb 1 2021, 9:42 PM

es4 cleaned

After almost 2h, I have entirely cleaned up s4 :)

Marostegui updated the task description. Feb 2 2021, 9:13 AM

Marostegui closed subtask T273593: Clean up heartbeat table on clouddb hosts as Resolved. Feb 3 2021, 11:35 AM

x1 cleaned

Marostegui updated the task description. Feb 5 2021, 12:43 PM

s7 heartbeat cleaned

Marostegui updated the task description. Mar 23 2021, 8:04 AM

I have started to clean up s1, but will most likely finish it once the switchover is done tomorrow. There's a lot to clean up there.

Regarding s1: I have finished cleaning up eqiad. The only thing left there is deleting the current master's ID, which I will do tomorrow.
Going to start the codfw cleanup now.