
[toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-12
Closed, Resolved (Public)

Description

From an alert in Alertmanager (https://alerts.wikimedia.org/?q=team%3Dwmcs):

ToolsToolsDBReplicationLagIsTooHigh
project: tools
summary: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 59756
16 hours ago
instance: tools-db-2
job: toolsdb-mariadb
master_host: tools-db-1.tools.eqiad1.wikimedia.cloud
team: wmcs
@cluster: wmcloud.org
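
For scale, the lag value in the summary can be converted to hours. This is a minimal sketch that assumes the figure is in seconds (as with MariaDB's `Seconds_Behind_Master`); the alert text itself does not state the unit.

```python
# Lag value copied from the alert summary above.
# ASSUMPTION: the unit is seconds; the alert does not say so explicitly.
lag_seconds = 59756

# Split the total into hours, minutes and seconds.
hours, remainder = divmod(lag_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

lag_hms = f"{hours}h {minutes}m {seconds}s"
print(lag_hms)  # prints "16h 35m 56s"
```

Under that assumption the replica is roughly 16.6 hours behind, which is consistent with the alert having fired about 16 hours ago.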

The replica seems to be stuck applying a DELETE:

dcaro@urcuchillay$ wmcs-cookbooks wmcs.toolforge.toolsdb.get_current_replica_transaction --task-id T357264
Got matching cookbooks wmcs.toolforge.toolsdb.get_current_replica_transaction
START - Cookbook wmcs.toolforge.toolsdb.get_current_replica_transaction
Skipping not-active replica node tools-db-3.tools.eqiad1.wikimedia.cloud: NodeStatus(fqdn='tools-db-3.tools.eqiad1.wikimedia.cloud', nodeid='Unknown', replication_state=ReplicationState(status='Unknown'), host_status='Up', mariadb_status='Stopped(inactive-dead)')
###########################################################################
Replica: {replica_name}
Suspicious tables:
    Table_map: `s51698__yetkin`.`visited_pages_agg` mapped
Suspicious queries:
    #Q> DELETE FROM visited_pages_agg WHERE vpa_year = '2024' AND vpa_month = '2' AND vpa_day = '10'
END (PASS) - Cookbook wmcs.toolforge.toolsdb.get_current_replica_transaction (exit_code=0)
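
The relevant parts of the cookbook output above can be pulled out programmatically, for example when pasting a summary into a task. The following is a rough sketch: the `summarize` helper and both regular expressions are hypothetical, and the line formats are inferred only from the single output sample shown here, not from the cookbook's source.

```python
import re

# Excerpt of the cookbook output shown above, copied verbatim.
COOKBOOK_OUTPUT = """\
Replica: {replica_name}
Suspicious tables:
    Table_map: `s51698__yetkin`.`visited_pages_agg` mapped
Suspicious queries:
    #Q> DELETE FROM visited_pages_agg WHERE vpa_year = '2024' AND vpa_month = '2' AND vpa_day = '10'
"""

# ASSUMPTION: these patterns match the format seen in this one sample only.
TABLE_MAP_RE = re.compile(r"Table_map: `([^`]+)`\.`([^`]+)`")
QUERY_RE = re.compile(r"#Q> (.+)")


def summarize(output: str) -> dict:
    """Hypothetical helper: collect the suspicious tables and queries."""
    tables = [f"{db}.{table}" for db, table in TABLE_MAP_RE.findall(output)]
    queries = [query.strip() for query in QUERY_RE.findall(output)]
    return {"tables": tables, "queries": queries}


summary = summarize(COOKBOOK_OUTPUT)
print(summary["tables"])
print(summary["queries"])
```

Here the result names one table, `s51698__yetkin.visited_pages_agg`, and the single DELETE statement that the replica is currently applying.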