db2189 replication broken
Closed, Resolved, Public

Description

Last_SQL_Error: Error 'Index for table 'recentchanges' is corrupt; try to repair it' on query. Default database: 'cswiki'. Query: 'INSERT /* RecentChange::save  */ INTO `recentchanges` (rc_type,rc_minor,rc_bot,rc_patrolled,rc_params,rc_timestamp,rc_logid,rc_log_type,rc_log_action,rc_source,rc_deleted,rc_new,rc_namespace,rc_title,rc_old_len,rc_new_len,rc_this_oldid,rc_last_oldid,rc_cur_id,rc_comment_id,rc_actor) VALUES (5,1,0,2,'a:1:{s:20:\"wikibase-repo-change\";a:14:{s:2:\"id\";i:1789994467;s:4:\"time\";s:14:\"20250120130823\";s:7:\"user_id\";s:7:\"5333695\";s:11:\"revision_id\";s:10:\"2300346769\";s:9:\"object_id\";s:10:\"Q124154946\";s:4:\"type\";s:20:\"wikibase-item~update\";s:11:\"entity_type\";s:4:\"item\";s:7:\"page_id\";i:118328611;s:6:\"rev_id\";i:2300346769;s:9:\"parent_id\";i:2299314452;s:7:\"comment\";s:73:\"/* wbsetsitelink-add:1|viwiki */ Giải quần vợt Úc Mở rộng 2025\";s:9:\"user_text\";s:29:\"Δάφνινο στεφάνι\";s:15:\"central_user_id\";i:68104204;s:3:\"bot\";i:0;}}','20250120130823',0

Details

Related Changes in Gerrit:

Event Timeline

Icinga downtime and Alertmanager silence (ID=6bc869af-7b69-41be-b6a5-8b22a34846d0) set by marostegui@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: rebuilding index

db2189.codfw.wmnet
Marostegui triaged this task as Medium priority. Jan 20 2025, 1:18 PM
Marostegui moved this task from Triage to In progress on the DBA board.

Change #1112753 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2189: Disable notications

https://gerrit.wikimedia.org/r/1112753

Change #1112753 merged by Marostegui:

[operations/puppet@production] db2189: Disable notications

https://gerrit.wikimedia.org/r/1112753

Icinga downtime and Alertmanager silence (ID=1d6a00d0-5764-45e0-a373-38ce9a52fcc3) set by marostegui@cumin2002 for 12:00:00 on 1 host(s) and their services with reason: Rebuild and upgrade db2189

db2189.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=674fdf0e-fa63-44ed-bd04-eb70bf2b6ca2) set by marostegui@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: rebuilding index

db2189.codfw.wmnet

Host updated. Index rebuilt using db-mysql db2189 -e "stop slave;set session sql_log_bin=0; alter table recentchanges engine=innodb,force;start slave;" cswiki

Thank you! Let's also rebuild all the other tables across all the other wikis!
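For reference, a rough sketch of how the same rebuild could be generated for more tables and wikis. It only builds and prints db-mysql command lines in the form used above and does not run anything. The helper names, the extra table `page`, and the idea of batching all tables of one wiki into a single session are assumptions for illustration, not the procedure that was actually followed here.

```python
import shlex


def build_rebuild_sql(tables):
    """Build one SQL batch that rebuilds every table in `tables`.

    Replication is stopped first and binary logging is disabled for the
    session, so the rebuild stays local to this replica.
    """
    for table in tables:
        # Identifiers are backtick-quoted below; refuse names that would break out.
        if "`" in table:
            raise ValueError(f"unsafe table name: {table!r}")
    statements = ["stop slave", "set session sql_log_bin=0"]
    statements += [f"alter table `{t}` engine=innodb, force" for t in tables]
    statements.append("start slave")
    return "; ".join(statements) + ";"


def build_commands(host, wikis, tables):
    """Return one db-mysql argument list per wiki (hypothetical batching)."""
    sql = build_rebuild_sql(tables)
    return [["db-mysql", host, "-e", sql, wiki] for wiki in wikis]


if __name__ == "__main__":
    # The wiki and table lists here are illustrative only.
    for cmd in build_commands("db2189", ["cswiki"], ["recentchanges", "page"]):
        print(shlex.join(cmd))
```

Keeping `set session sql_log_bin=0` in the same `-e` batch as the `alter table` statements matters, because the setting applies only to that session.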

The index rebuild was successful. A rebuild of the tables on all wikis is now running, and it is taking a long time. The host is also catching up on replication; lag is currently around 16 hours.

The rebuilds completed a few minutes ago. Replication lag is down from 23h to approximately 10h.

Start pool of db2189 slowly with 10 steps - Repool host after fixing indexes and performing OS updates - fceratto@cumin1002

Completed pool of db2189 slowly with 10 steps - Repool host after fixing indexes and performing OS updates - fceratto@cumin1002
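As an illustration of what pooling "slowly with 10 steps" might look like, here is a minimal sketch that computes an evenly spaced ramp of pooling percentages. The log above does not show the schedule the tooling actually used, so the linear ramp and the function name `pooling_steps` are assumptions.

```python
from __future__ import annotations


def pooling_steps(steps: int) -> list[int]:
    """Return the pooled percentage after each step of a linear ramp to 100%."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # Evenly spaced percentages, ending at fully pooled.
    return [round(100 * i / steps) for i in range(1, steps + 1)]


if __name__ == "__main__":
    for step, pct in enumerate(pooling_steps(10), start=1):
        print(f"step {step}: pool at {pct}%")
```

A gradual ramp like this gives a host that has just been rebuilt time to warm its caches before it takes full traffic.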

Icinga downtime and Alertmanager silence (ID=f46a9268-2e9b-44bc-bf86-38258ed9cc3e) set by marostegui@cumin2002 for 12:00:00 on 1 host(s) and their services with reason: Rebuild and upgrade db1166

db1154.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=14d674ab-cb5e-467b-91e3-5bd2171ded5e) set by marostegui@cumin2002 for 12:00:00 on 2 host(s) and their services with reason: Rebuild and upgrade db1166

clouddb[1016,1020].eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=5c63763a-519c-4976-a7fc-bf568d2bd6cd) set by marostegui@cumin2002 for 12:00:00 on 1 host(s) and their services with reason: Rebuild and upgrade db1166

an-redacteddb1001.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=bec3f773-4d46-4d80-9cd4-e74fa5bd9093) set by marostegui@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: Rebuild and upgrade dbstore1007:s4

dbstore1007.eqiad.wmnet