
Design a method for keeping user-created tables in sync across labsDBs
Closed, DeclinedPublic

Description

Goal: Have transparent load balancing between LabsDB nodes.

Problem: User-created tables pose a problem. Replication works for MediaWiki tables on a common channel, but user-created tables on one labs node will not necessarily be copied to another labs node.

Worse problem: User-created tables receive UPDATEs that will also need to be applied across databases.

Proposed solutions


Event Timeline

Hi,

By no means is what I am going to say the way to go; I am just expressing my opinion and describing a method we use in production, which has its drawbacks as we will see later.
The way we import tables from production into the labs servers is by using transportable tablespaces (https://wikitech.wikimedia.org/wiki/MariaDB/ImportTableSpace). I assume this task is only about user-generated tables, which are created directly on the labsdb hosts without going through any replication channel.

While with MyISAM it is relatively easy to move tables across different physical hosts, with InnoDB (our default) it isn't that easy and requires a lot more work and extra steps.
Moving tablespaces is somewhat new in InnoDB and allows you to export tables from one host and import them on another (T146261#2716936 is the comment in a rather long ticket where we first started testing it). It is probably a good method when you only want to import a table once, as there is no way to do it incrementally.
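
For reference, a minimal sketch of what the transportable-tablespace flow looks like at the SQL level, assuming a hypothetical user table s51234__userdb.mytable copied from one labsdb host to another (names and paths are illustrative only, not the documented production procedure on wikitech):

```
-- Minimal sketch, hypothetical table name s51234__userdb.mytable.

-- On the source host: quiesce the table and write the .cfg metadata file.
USE s51234__userdb;
FLUSH TABLES mytable FOR EXPORT;
-- (while the read lock is held, copy mytable.ibd and mytable.cfg from the
--  source datadir to the destination host out of band, e.g. with scp)
UNLOCK TABLES;

-- On the destination host: an identical empty table must already exist;
-- then swap in the copied tablespace files.
USE s51234__userdb;
ALTER TABLE mytable DISCARD TABLESPACE;
-- (place the copied .ibd and .cfg files into the destination datadir with
--  the correct ownership and permissions)
ALTER TABLE mytable IMPORT TABLESPACE;
```

Note that the user's writes are blocked between FLUSH TABLES ... FOR EXPORT and UNLOCK TABLES, which is the LOCK limitation described below.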

This method wouldn't allow users to keep their tables in sync in real time, as it requires a lot of human interaction (we are working towards some automation for this, but it will take months).
This way of importing tables requires a LOCK on the table (meaning the user wouldn't be able to write to the table being exported), but at least it doesn't require stopping the whole server (which is obviously a no-go for this).
Something else to keep in mind: the only moment both tables hold the same data is exactly at export time. As soon as the LOCK is released and the user starts writing to the "original" or "source" table, the copy we just exported becomes out of date.

This is going to happen with any way of exporting/importing the table (mysqldump or similar is another way of dealing with this kind of challenge), as long as the user only writes to one table and we copy it asynchronously.

As I just mentioned, we can also consider mysqldump, which has exactly the same issues (manual intervention, a table LOCK in most cases) plus some worse ones: loading big tables can slow down the server.
With mysqldump, though, it would be possible (though not easy) to copy incremental chunks instead of the whole table, which is what you have to do when importing/exporting tablespaces and which can lead to long transfer times if the tables are big.
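
To make the "incremental chunks" idea concrete, here is a rough sketch expressed directly in SQL rather than through mysqldump itself (mysqldump's --where option could be used in the same spirit). It assumes the hypothetical table has a monotonically increasing id column and that we track the last id already copied:

```
-- Rough sketch only: incremental copy of newly inserted rows, assuming a
-- hypothetical table s51234__userdb.mytable with an AUTO_INCREMENT key `id`.

-- On the source host: export only rows added since the last sync.
-- 123456 stands for the highest id already present on the destination.
SELECT * INTO OUTFILE '/tmp/mytable_delta.tsv'
FROM s51234__userdb.mytable
WHERE id > 123456;

-- (transfer /tmp/mytable_delta.tsv to the destination host out of band)

-- On the destination host: append the new rows.
LOAD DATA INFILE '/tmp/mytable_delta.tsv'
INTO TABLE s51234__userdb.mytable;
```

Even then, this only covers insert-only tables; as the task description notes, user tables also receive UPDATEs, which an append-only delta like this would miss.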

I think we should stop allowing write access entirely on the wiki replica servers by Cloud Services users. I know this will cause some currently unspecified number of tools to need software updates, but it is certainly not a majority use case for Wiki Replica data processing. There is no obvious solution for replicating client generated data among the three backend servers. Providing a loophole for unreplicated data with a "buyer beware" disclaimer from Cloud Services will not stop tools from breaking and people being mad when a failover is done and the data they wanted is not on the new target server. Allowing any writes to these servers outside of replication from authoritative sources also adds complexity or outright blocks dynamic load balancing of client sessions.

I do think there is a valid use case for 'curated' data sets that are hosted alongside the sanitized production data. I think we should investigate reliable methods for loading and replicating such data across the cluster (T173511). This process should allow for curated data 'owned' by both Foundation staff and volunteers with no prejudicial distinction. I do not know what percentage of user created data hosted on labsdb100[13] would be able to comply with whatever restrictions need to be met to ensure replicable data.

As the person who filed this task, I agree 100% with @bd808's assessment. Shall we close this as "declined"?

@Halfak +1. It was great to get this all on-task though as we will no doubt reference it in the future.

I agree with @bd808 too - let's continue the discussion on T173511.

Per the rationale explained in T156869#3559945, this feature request is being declined. Write access will not be granted to end-user accounts on the database cluster behind wikireplica-web.eqiad.wmnet and wikireplica-analytics.eqiad.wmnet.

This means that a deprecation announcement of this feature needs to be added to the process of rolling out this cluster as the only way to access the Wiki Replica data. It would be nice to have at least a rough idea of the process for 'curated' data loading (T173511) to go with that announcement.