Database replication problems - production and labs (tracking)
Closed, Resolved (Public)

Assigned To

Authored By

	Petrb
	May 29 2013, 8:29 AM

Description

This is a tracking task to monitor replication problems in the WMF infrastructure, such as:

Replication broken or stopped to any server
Data or schema differences between a master and some or all of its slaves
Constant or intermittent replication lag degrading the service
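
As a rough illustration of how the first and third categories above might be told apart, here is a minimal sketch that classifies a replica from a few fields of MariaDB's `SHOW SLAVE STATUS` output. The function name `classify_replica` and the 300-second threshold are hypothetical and not taken from any WMF tooling. Data or schema drift (the second category) cannot be detected this way and needs a separate comparison of master and replica contents.

```python
from typing import Mapping

# Hypothetical threshold; pick a value that matches the service's tolerance.
LAG_THRESHOLD_SECONDS = 300


def classify_replica(status: Mapping[str, object],
                     lag_threshold: int = LAG_THRESHOLD_SECONDS) -> str:
    """Classify a replica as 'broken', 'lagging' or 'ok'.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a dict,
    e.g. as returned by a dictionary cursor.
    """
    io_running = status.get("Slave_IO_Running") == "Yes"
    sql_running = status.get("Slave_SQL_Running") == "Yes"
    if not (io_running and sql_running):
        # Either replication thread being stopped means replication is broken.
        return "broken"

    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # The server reports NULL when it cannot compute the lag.
        return "broken"
    if int(lag) > lag_threshold:
        return "lagging"
    return "ok"


if __name__ == "__main__":
    sample = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": 28800,  # 8 hours
    }
    print(classify_replica(sample))
```

A check like this only reports the state at the moment it runs; intermittent lag would need repeated sampling over time.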

These tasks are normally handled by the DBA team (part of SRE) and often require assistance from Performance-Team, Analytics, Cloud-Services, and the various Product teams.

NOTE: If the problem you are experiencing is about Wiki Replica databases in Cloud-Services (*.{analytics,web}.db.svc.eqiad.wmflabs, *.labsdb), use the Data-Services tag instead; Wiki Replica hosts have their own set of issues including sanitization and multiple user account handling, so even if it is a replica service, the issue may not be replication itself.

Details

Reference: bz48930

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.

Status	Assigned	Task
		· · ·
Resolved	jcrespo	T50930 Database replication problems - production and labs (tracking)
Resolved	• Springle	T51872 max_user_connections too low on Wikimedia Labs
Declined	coren	T55668 Some replicated databases are missing tables
Resolved	coren	T50626 Provide wiki metadata in the databases similar to toolserver.wiki
Declined	jcrespo	T50628 Provide replication lag as a database function
Duplicate	None	T50694 Show replication lags in Graphite
Resolved	coren	T50851 Database inserts are slow at the replicated databases
Resolved	RyanLane	T57929 Separate replica datacenter causes high query latency
Resolved	• chasemp	T52422 Replicate the Phabricator database to labsdb
Resolved	coren	T50890 no read access to s3 and s6 replication databases
Resolved	coren	T50899 Table "globalimagelinks" is missing from replicated commons database
Resolved	None	T50897 wikidatawiki_p missing wb_* tables
Resolved	coren	T54370 Several replicated DB are missing tables and content
Resolved	coren	T51046 Table 'wikidatawiki_p.recentchanges' doesn't exist
Invalid	coren	T51069 Grant read privileges to all users on databases that end with "_p"
Resolved	coren	T56164 Expose revision.rev_content_format on replicated wikidatawiki
Resolved	• Springle	T51088 Make archive table partially accessible on Wikimedia Labs
Declined	coren	T51167 Table "namespaces" only exists in enwiki_p
Resolved	coren	T59491 Make betafeatures_user_counts table available
Declined	None	T51366 MariaDB lacks help
Stalled	None	T59617 Make watchlist table available as curated foo_p.watchlist_count on labsdb
Resolved	• chasemp	T138450 maintain-replicas.pl unmaintained, unmaintainable
Resolved	coren	T61683 Add some of the missing tables in commonswiki_f_p
Duplicate	None	T68533 centralauth_p is missing tables
Declined	coren	T68786 Rename revision_userindex to revision
Resolved	None	T70356 Replicate centralauth.renameuser_status table to labs
Resolved	None	T63813 filearchive table not available on labs
Resolved	coren	T70505 view globalimagelinks missing at db commonswiki_p on new mariadb 10 sql server
Resolved	• Springle	T74226 Missing page revisions on enwiki
Resolved	coren	T60802 Document how to use federated commonswiki and wikidatawiki databases
Invalid	• Springle	T72711 missing database entries at categorylinks table on dewiki db
Resolved	• Springle	T70918 deletion queries joined with tokudb replication tables are really slow
Resolved	• Springle	T66154 Replication for enwiki has stopped
Declined	coren	T71077 Database access to spam-blacklist log
Resolved	coren	T69602 Performance problem on database server s5 using commonswiki
Resolved	• Springle	T71144 mariadb10 s2/s4/s5 unreachable
Resolved	• Springle	T73176 Discrepancy between enwiki_p.pagelinks on labs and production
Resolved	coren	T71679 Some users can't connect to replica DB servers
Declined	coren	T71776 dewiki / wikidatawiki replication (s5) has stopped on labsdb1001/labsdb1002
Resolved	coren	T75493 tables views missing for s5 databases (dewiki_p/wikidatawiki_p) on s3.labsdb
Resolved	None	T88183 Replag on labsdb
Resolved	coren	T75975 tables views missing for idwiki_p (source s3) on s3.labsdb
Resolved	jcrespo	T106470 Tool Labs enwiki_p replicated database missing rows
Resolved	jcrespo	T111371 Potential templatelinks data integrity issue on Tool Labs' enwiki_p
Declined	jcrespo	T115207 Lots of rows are missing from enwiki_p.`revision`
Duplicate	None	T118095 Missing rows in revision table of enwiki.labsdb (data integrity issue)
Resolved	jcrespo	T119841 labs db inconsistent data
Declined	None	T119847 Replicate ContentTranslation databases on Labs
Resolved	• Marostegui	T132837 hitcounter and _counter tables are on the cluster but were deleted/unsused?
Declined	None	T132838 Certain wiki databases missing from replicas?
Duplicate	None	T137641 CentralNotice tables should be available on labs
Resolved	jcrespo	T138967 Labs database replica drift
Resolved	jcrespo	T143934 s2 replag currently 8 hours
Declined	None	T143955 Replicate editor_month table from analytics-store to Labs
Resolved	• srodlund	T85868 Document labsdb replication set up
Declined	None	T89548 labswiki isn't replicated on Labs
Resolved	jcrespo	T108032 Replication issue with Fa WP replica
Resolved	• chasemp	T126096 Replicate wikimania2017wiki to labs
Declined	jcrespo	T129432 Lost database changes on s2 for 3 hours on labs replicas
Invalid	None	T133469 Discrepancy between labsdb replicas of arwiki_p.user_groups
Resolved	jcrespo	T115517 Data missing from June 11/12 on s3.labsdb
Resolved	jcrespo	T133715 Missing data on labs replica database
Resolved	jcrespo	T136618 Wrong page title in labs database replica enwiki page table
Resolved	jcrespo	T134203 enwiki_p replica on s1 is corrupted
Resolved	• chasemp	T142223 Enable access to Wikipedia Tulu (tcywiki) on labs replicas
		Restricted Task
		· · ·

Event Timeline

jcrespo subscribed.Jun 24 2015, 1:19 PM

jcrespo closed subtask T72711: missing database entries at categorylinks table on dewiki db as Invalid.Jun 26 2015, 8:38 AM

Restricted Application added a subscriber: Matanya. Jun 26 2015, 8:38 AM

jcrespo closed subtask T73176: Discrepancy between enwiki_p.pagelinks on labs and production as Resolved.Jun 26 2015, 8:45 AM

jcrespo moved this task from Triage to Backlog on the DBA board.Jul 7 2015, 5:27 PM

zhuyifei1999 moved this task from Triage to Tracking on the Cloud-Services board.Jul 16 2015, 6:01 AM

Ricordisamoa subscribed.Jul 16 2015, 9:57 PM

Restricted Application added a subscriber: Aklapper. Jul 16 2015, 9:57 PM

jcrespo added a subtask: T106470: Tool Labs enwiki_p replicated database missing rows.Jul 22 2015, 1:04 PM

jcrespo closed subtask T106470: Tool Labs enwiki_p replicated database missing rows as Resolved.Jul 29 2015, 7:14 PM

jcrespo mentioned this in T106470: Tool Labs enwiki_p replicated database missing rows.

• MZMcBride added a subtask: T111371: Potential templatelinks data integrity issue on Tool Labs' enwiki_p.Sep 3 2015, 9:07 PM

jcrespo closed subtask T111371: Potential templatelinks data integrity issue on Tool Labs' enwiki_p as Resolved.Sep 7 2015, 7:24 PM

Glaisher added a subtask: T113842: `pr_index` to be replicated to Labs public databases.Sep 26 2015, 4:11 PM

Glaisher added a subtask: T115207: Lots of rows are missing from enwiki_p.`revision`.Oct 11 2015, 12:17 PM

• chasemp closed subtask T52422: Replicate the Phabricator database to labsdb as Resolved.Oct 22 2015, 6:07 PM

intracer subscribed.Oct 24 2015, 5:12 PM

• MZMcBride added a subtask: T118095: Missing rows in revision table of enwiki.labsdb (data integrity issue).Nov 7 2015, 10:01 PM

jcrespo added a subtask: T119841: labs db inconsistent data.Nov 29 2015, 10:53 PM

Peachey88 removed a parent task: T87716: Missing rows from categorylinks on production servers (dewiki).Nov 30 2015, 11:16 AM

Peachey88 added a subtask: T87716: Missing rows from categorylinks on production servers (dewiki).

jcrespo reopened subtask T119841: labs db inconsistent data as Open.Nov 30 2015, 11:18 AM

jcrespo removed a subtask: T87716: Missing rows from categorylinks on production servers (dewiki).

Ricordisamoa added a subtask: T119847: Replicate ContentTranslation databases on Labs.Nov 30 2015, 11:27 AM

jcrespo closed subtask T119841: labs db inconsistent data as Resolved.Dec 1 2015, 10:25 AM

Zdzislaw reopened subtask T119841: labs db inconsistent data as Open.Dec 1 2015, 6:04 PM

jcrespo closed subtask T119841: labs db inconsistent data as Resolved.Dec 11 2015, 11:56 AM

Dispenser reopened subtask T70876: Make user_email_authenticated status visible on labs as Open.Apr 4 2016, 3:54 PM

jcrespo added a subtask: T132837: hitcounter and _counter tables are on the cluster but were deleted/unsused?.Apr 16 2016, 11:12 AM

Peachey88 unsubscribed.Apr 16 2016, 11:19 AM

Volans added a subtask: T132838: Certain wiki databases missing from replicas?.Apr 16 2016, 11:20 AM

jcrespo closed subtask T50628: Provide replication lag as a database function as Resolved.Apr 22 2016, 1:17 PM

scfc changed the status of subtask T50628: Provide replication lag as a database function from Resolved to Declined.Apr 24 2016, 3:42 AM

Danny_B renamed this task from (Tracking) Database replication services to Database replication services (tracking).May 27 2016, 6:01 PM

Danny_B removed a subscriber: • wikibugs-l-list.

valhallasw added a subtask: T137641: CentralNotice tables should be available on labs.Jun 11 2016, 5:42 PM

jcrespo added a subtask: T138967: Labs database replica drift.Jun 29 2016, 5:33 PM

• Phabricator_maintenance removed a parent task: T4007: [DO NOT USE] Tracking bug [superseded by #Tracking].Jul 28 2016, 2:33 AM

zhuyifei1999 reopened subtask T50875: Unable to explain queries on replicated databases as Open.Jul 28 2016, 6:27 PM

• AlexMonk-WMF added a subtask: T143934: s2 replag currently 8 hours.Aug 25 2016, 9:16 PM

jcrespo closed subtask T143934: s2 replag currently 8 hours as Resolved.Aug 26 2016, 7:39 AM

• AlexMonk-WMF added subtasks: T143955: Replicate editor_month table from analytics-store to Labs, T85868: Document labsdb replication set up, T138450: maintain-replicas.pl unmaintained, unmaintainable, T56703: Create web version of metainformation table on replica servers, T57455: Provide dynamic report of differences between replica databases and production databases, T89548: labswiki isn't replicated on Labs, T101631: rev_len should be available also for deleted revisions in database replicas, T108032: Replication issue with Fa WP replica, T126096: Replicate wikimania2017wiki to labs, T129432: Lost database changes on s2 for 3 hours on labs replicas, T133469: Discrepancy between labsdb replicas of arwiki_p.user_groups, T115517: Data missing from June 11/12 on s3.labsdb, T133715: Missing data on labs replica database, T136618: Wrong page title in labs database replica enwiki page table, T134203: enwiki_p replica on s1 is corrupted, T135405: Replicate CentralNotice tables to Labs, T139289: Document all tables missing in replicas, T140609: Add page_props.pp_value index to Wiki Replicas, T142223: Enable access to Wikipedia Tulu (tcywiki) on labs replicas.Aug 26 2016, 8:10 AM

jcrespo changed the status of subtask T143955: Replicate editor_month table from analytics-store to Labs from Open to Stalled.Aug 31 2016, 12:43 PM

nshahquinn-wmf closed subtask T143955: Replicate editor_month table from analytics-store to Labs as Declined.Sep 1 2016, 8:34 PM

jcrespo closed subtask T132837: hitcounter and _counter tables are on the cluster but were deleted/unsused? as Resolved.Oct 17 2016, 8:55 AM

• chasemp closed subtask T138450: maintain-replicas.pl unmaintained, unmaintainable as Resolved.Oct 18 2016, 5:40 PM

• chasemp closed subtask T142223: Enable access to Wikipedia Tulu (tcywiki) on labs replicas as Resolved.Nov 2 2016, 8:02 PM

• chasemp closed subtask T126096: Replicate wikimania2017wiki to labs as Resolved.

• AlexMonk-WMF added a subtask: T148561: Replicate ores_classification and ores_model tables in labs.Nov 10 2016, 7:33 PM

jcrespo removed a subtask: T148561: Replicate ores_classification and ores_model tables in labs.Nov 10 2016, 7:35 PM

• AlexMonk-WMF added a subtask: T148561: Replicate ores_classification and ores_model tables in labs.Nov 10 2016, 8:01 PM

mark mentioned this in T148561: Replicate ores_classification and ores_model tables in labs.Nov 15 2016, 4:11 PM

mark removed a subtask: T148561: Replicate ores_classification and ores_model tables in labs.

jcrespo renamed this task from Database replication services (tracking) to Database replication services - production and labs (tracking).Nov 15 2016, 4:38 PM

jcrespo removed a project: Wikimedia-Labs-General.

jcrespo updated the task description.

jcrespo mentioned this in T150767: Wikireplica service for tools and labs - issues and missing available views (tracking).Nov 15 2016, 4:45 PM

jcrespo updated the task description. Nov 15 2016, 4:49 PM

jcrespo renamed this task from Database replication services - production and labs (tracking) to Database replication problems - production and labs (tracking).Nov 15 2016, 4:56 PM

jcrespo removed a subtask: T50875: Unable to explain queries on replicated databases.Nov 15 2016, 5:23 PM

jcrespo removed a subtask: T70876: Make user_email_authenticated status visible on labs.Nov 15 2016, 5:25 PM

jcrespo removed a subtask: T71088: Queries of commonswiki_p.filearchive for fa_sha1 are slow.

jcrespo removed a subtask: T56703: Create web version of metainformation table on replica servers.Nov 15 2016, 5:32 PM

jcrespo removed a subtask: T57455: Provide dynamic report of differences between replica databases and production databases.

jcrespo removed a subtask: T101631: rev_len should be available also for deleted revisions in database replicas.Nov 15 2016, 5:36 PM

jcrespo removed a subtask: T135405: Replicate CentralNotice tables to Labs.Nov 15 2016, 5:39 PM

jcrespo removed a subtask: T139289: Document all tables missing in replicas.

jcrespo removed a subtask: T140609: Add page_props.pp_value index to Wiki Replicas.Nov 15 2016, 5:41 PM

jcrespo added a subtask: Restricted Task.Nov 15 2016, 5:43 PM

jcrespo moved this task from Backlog to Meta/Epic on the DBA board.Nov 15 2016, 7:04 PM

jcrespo mentioned this in T151752: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org.Nov 28 2016, 3:33 PM

jcrespo closed subtask T133715: Missing data on labs replica database as Resolved.Dec 21 2016, 5:17 PM

jcrespo closed subtask T115517: Data missing from June 11/12 on s3.labsdb as Resolved.Dec 21 2016, 5:30 PM

• AlexMonk-WMF added a subtask: T154355: page_lang column of the page table is not replicated to Labs.Dec 31 2016, 3:56 PM

jcrespo closed subtask T134203: enwiki_p replica on s1 is corrupted as Resolved.Jan 24 2017, 2:56 PM

jcrespo closed subtask T129432: Lost database changes on s2 for 3 hours on labs replicas as Declined.Feb 14 2017, 3:23 PM

jcrespo closed subtask T115207: Lots of rows are missing from enwiki_p.`revision` as Declined.Feb 14 2017, 3:49 PM

jcrespo closed subtask T133469: Discrepancy between labsdb replicas of arwiki_p.user_groups as Invalid.Mar 3 2017, 9:45 AM

jcrespo closed subtask T89548: labswiki isn't replicated on Labs as Declined.Mar 3 2017, 9:49 AM

jcrespo changed the status of subtask T59617: Make watchlist table available as curated foo_p.watchlist_count on labsdb from Open to Stalled.Mar 3 2017, 9:51 AM

jcrespo closed subtask T108032: Replication issue with Fa WP replica as Resolved.Mar 3 2017, 9:58 AM

jcrespo mentioned this in T113842: `pr_index` to be replicated to Labs public databases.Mar 3 2017, 10:02 AM

jcrespo changed the status of subtask T119847: Replicate ContentTranslation databases on Labs from Open to Stalled.Mar 3 2017, 10:06 AM

Tpt removed a subtask: T113842: `pr_index` to be replicated to Labs public databases.Mar 3 2017, 10:12 AM

jcrespo closed subtask T136618: Wrong page title in labs database replica enwiki page table as Resolved.Mar 3 2017, 10:16 AM

jcrespo removed a subtask: T154355: page_lang column of the page table is not replicated to Labs.

Beta16 unsubscribed.Mar 9 2017, 8:29 AM

Dispenser reopened subtask T50625: Provide namespace IDs and names in the databases similar to toolserver.namespace as Open.Mar 23 2017, 3:53 PM

jcrespo removed a subtask: T50625: Provide namespace IDs and names in the databases similar to toolserver.namespace.Mar 23 2017, 4:00 PM

bd808 closed subtask T138967: Labs database replica drift as Resolved.Oct 18 2017, 2:09 AM

bd808 updated the task description. Oct 18 2017, 2:22 AM

• Marostegui closed subtask T119847: Replicate ContentTranslation databases on Labs as Declined.Oct 25 2017, 1:28 PM

jcrespo closed subtask T132838: Certain wiki databases missing from replicas? as Declined.Aug 13 2018, 2:27 PM

Resolving this meta-ticket. Since the introduction of ROW-based replication before filtering, no recurring issues have occurred. The few remaining issues are no longer replication problems but pending operational issues. Closing as fixed: with the current architecture, recurring data drift is unlikely, and even if it did happen, a full data reload is now possible, so it would be entirely solvable.

Liuxinyu970226 unsubscribed.Aug 13 2018, 3:22 PM

Liuxinyu970226 moved this task from Tag to Transition completed / Archived on the Tracking-Neverending board.Nov 23 2018, 12:44 PM

• Marostegui closed subtask T85868: Document labsdb replication set up as Resolved.Nov 24 2020, 3:51 PM
