
Evaluate switch to a distributed SQL database for the Wikidatawiki cluster
Closed, Declined / Public / Feature

Description

Feature summary (what you would like to be able to do and where):
A distributed SQL database offers multiple advantages compared to the current master/replica setup with MariaDB.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
Avoid the current problems with the database outgrowing a single server.
See https://www.wikidata.org/wiki/User:ASarabadani_(WMF)/Growth_of_databases_of_Wikidata for a description of the problems the operations team is facing with master and replicas on the Wikidata cluster.

Benefits (why should this be implemented?):
The whole WMF ecosystem. WMF operations team.
Ensuring scalability is absolutely crucial to the success of Wikidata. Please get to work immediately on this so we don't have to bother Wikidata users about limiting growth because of an outdated backend that doesn't scale horizontally.

Event Timeline

This comment was removed by So9q.
So9q renamed this task from Evaluate switch to a distributed SQL database for Wikidata to Evaluate switch to a distributed SQL database for the Wikidata cluster. Sep 24 2024, 8:37 AM
So9q updated the task description.
So9q added subscribers: Lydia_Pintscher, Ladsgroup.

Turning MW's database to a distributed one is not really technically feasible. It's basically something along the lines of "rewrite mw to golang" in terms of amount of resources needed.

> Turning MW's database to a distributed one is not really technically feasible. It's basically something along the lines of "rewrite mw to golang" in terms of amount of resources needed.

Really?
Do you know about MySQL Cluster?

"MySQL NDB Cluster is the distributed database combining linear scalability and high availability. It provides in-memory real-time access with transactional consistency across partitioned and distributed datasets. It is designed for mission critical applications.

MySQL NDB Cluster has replication between clusters across multiple geographical sites built-in. A shared nothing architecture with data locality awareness make it the perfect choice for running on commodity hardware and in globally distributed cloud infrastructure." source

This seems like a very good fit for Wikidatawiki if you ask me. It's released under GPL2 according to enwiki. You can download it here and give it a try. I'm pretty sure the current tables in wikidatawiki can be easily migrated to it as mariadb and mysql are very similar. If I'm right and it is a drop-in replacement you don't have to rewrite or change a single line of code, but I could be wrong ;)
WDYT?

So9q renamed this task from Evaluate switch to a distributed SQL database for the Wikidata cluster to Evaluate switch to a distributed SQL database for the Wikidatawiki cluster. Sep 26 2024, 12:20 PM

> Turning MW's database to a distributed one is not really technically feasible. It's basically something along the lines of "rewrite mw to golang" in terms of amount of resources needed.
>
> Really?
> Do you know about MySQL Cluster?

Yes, I talked to the maintainers of it during some database related conferences.

> "MySQL NDB Cluster is the distributed database combining linear scalability and high availability. It provides in-memory real-time access with transactional consistency across partitioned and distributed datasets. It is designed for mission critical applications.
>
> MySQL NDB Cluster has replication between clusters across multiple geographical sites built-in. A shared nothing architecture with data locality awareness make it the perfect choice for running on commodity hardware and in globally distributed cloud infrastructure." source
>
> This seems like a very good fit for Wikidatawiki if you ask me. It's released under GPL2 according to enwiki. You can download it here and give it a try. I'm pretty sure the current tables in wikidatawiki can be easily migrated to it as mariadb and mysql are very similar. If I'm right and it is a drop-in replacement you don't have to rewrite or change a single line of code, but I could be wrong ;)
> WDYT?

That wouldn't work. If you don't believe me, please set up a distributed cluster and run a MediaWiki instance backed by it.