
Design a method for keeping user-created tables in sync across labsDBs
Closed, DeclinedPublic

Description

Goal: Have transparent load balancing between LabsDB nodes.

Problem: User-created tables pose a problem. Replication works for MediaWiki tables on a common channel, but user-created tables on one labs node will not necessarily be copied to another labs node.

Worse problem: User-created tables receive UPDATEs that will also need to be applied across databases.

Proposed solutions


Event Timeline

Hi,

By no means is what I am going to say the way to go; I am just expressing my opinion and describing a method we use in production, which has its drawbacks as we will see later.
The way we import tables from production into the labs servers is by using transportable tablespaces (https://wikitech.wikimedia.org/wiki/MariaDB/ImportTableSpace). I assume this task is only about user-generated tables, which are created directly on the labsdb hosts without going through any replication channel.

While with MyISAM it is relatively easy to move tables across different physical hosts, with InnoDB (our default) it isn't that easy and requires a lot more work and extra steps.
Moving tablespaces is somewhat new in InnoDB and allows you to export tables from one host and import them on another (T146261#2716936 is the comment in a rather long ticket where we first started testing it). It is probably a good method when you only want to import a table once, as there is no way to do it incrementally.
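
For reference, a minimal sketch of what the transportable-tablespace flow looks like at the SQL level, assuming a hypothetical user table s51234__userdb.mytable copied from one labsdb host to another (names and paths are illustrative only, not the documented production procedure on wikitech):

```
-- Minimal sketch, hypothetical table name s51234__userdb.mytable.

-- On the source host: quiesce the table and write the .cfg metadata file.
USE s51234__userdb;
FLUSH TABLES mytable FOR EXPORT;
-- (while the read lock is held, copy mytable.ibd and mytable.cfg from the
--  source datadir to the destination host out of band, e.g. with scp)
UNLOCK TABLES;

-- On the destination host: an identical empty table must already exist;
-- then swap in the copied tablespace files.
USE s51234__userdb;
ALTER TABLE mytable DISCARD TABLESPACE;
-- (place the copied .ibd and .cfg files into the destination datadir with
--  the correct ownership and permissions)
ALTER TABLE mytable IMPORT TABLESPACE;
```

Note that the user's writes are blocked between FLUSH TABLES ... FOR EXPORT and UNLOCK TABLES, which is the LOCK limitation described below.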

This method wouldn't allow users to keep their tables in sync in real time, as it requires a lot of human interaction (we are working towards some automation for this, but it will take months).
This way of importing tables requires a LOCK on the table (meaning the user wouldn't be able to write to the table being exported), but at least it doesn't require stopping the whole server (which is obviously a no-go for this).
Something else to keep in mind: the only moment both tables hold the same data is exactly at export time. As soon as the LOCK is released and the user starts writing to the "original" or "source" table, the copy we just exported becomes out of date.

This is going to happen with any way of exporting/importing the table (mysqldump or similar is another way of dealing with this kind of challenge), as long as the user only writes to one table and we copy it asynchronously.

As I just mentioned, we can also consider mysqldump, which has exactly the same issues (manual intervention, a table LOCK in most cases) plus some worse ones: loading big tables can slow down the server.
With mysqldump, though, it would be possible (though not easy) to copy incremental chunks instead of the whole table, which is what you have to do when importing/exporting tablespaces and which can lead to long transfer times if the tables are big.
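
To make the "incremental chunks" idea concrete, here is a rough sketch expressed directly in SQL rather than through mysqldump itself (mysqldump's --where option could be used in the same spirit). It assumes the hypothetical table has a monotonically increasing id column and that we track the last id already copied:

```
-- Rough sketch only: incremental copy of newly inserted rows, assuming a
-- hypothetical table s51234__userdb.mytable with an AUTO_INCREMENT key `id`.

-- On the source host: export only rows added since the last sync.
-- 123456 stands for the highest id already present on the destination.
SELECT * INTO OUTFILE '/tmp/mytable_delta.tsv'
FROM s51234__userdb.mytable
WHERE id > 123456;

-- (transfer /tmp/mytable_delta.tsv to the destination host out of band)

-- On the destination host: append the new rows.
LOAD DATA INFILE '/tmp/mytable_delta.tsv'
INTO TABLE s51234__userdb.mytable;
```

Even then, this only covers insert-only tables; as the task description notes, user tables also receive UPDATEs, which an append-only delta like this would miss.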

I think we should stop allowing write access entirely on the wiki replica servers by Cloud Services users. I know this will cause some currently unspecified number of tools to need software updates, but it is certainly not a majority use case for Wiki Replica data processing. There is no obvious solution for replicating client generated data among the three backend servers. Providing a loophole for unreplicated data with a "buyer beware" disclaimer from Cloud Services will not stop tools from breaking and people being mad when a failover is done and the data they wanted is not on the new target server. Allowing any writes to these servers outside of replication from authoritative sources also adds complexity or outright blocks dynamic load balancing of client sessions.

I do think there is a valid use case for 'curated' data sets that are hosted alongside the sanitized production data. I think we should investigate reliable methods for loading and replicating such data across the cluster (T173511). This process should allow for curated data 'owned' by both Foundation staff and volunteers with no prejudicial distinction. I do not know what percentage of user created data hosted on labsdb100[13] would be able to comply with whatever restrictions need to be met to ensure replicable data.

As the person who filed this task, I agree 100% with @bd808's assessment. Shall we close this as "declined"?

@Halfak +1. It was great to get this all on-task though as we will no doubt reference it in the future.

I agree with @bd808 too - let's continue the discussion on T173511.

Per the rationale explained in T156869#3559945, this feature request is being declined. Write access will not be granted to end-user accounts on the database cluster behind wikireplica-web.eqiad.wmnet and wikireplica-analytics.eqiad.wmnet.

This means that a deprecation announcement of this feature needs to be added to the process of rolling out this cluster as the only way to access the Wiki Replica data. It would be nice to have at least a rough idea of the process for 'curated' data loading (T173511) to go with that announcement.