This task looks at tackling 2 issues currently found in Wikibase, mainly exposed on Wikidata.
- It is easy to incorrectly wire connections and load balancers when different db clusters are used. See T281457
- We want a way to easily give all db connections from client or repo a "group" indicating where they come from (client / repo). See T262924#6498193
An abstraction should allow us to solve these 2 problems in a single place.
It would also be a first step towards an abstraction between MediaWiki and Wikibase, though the interfaces exposed through such an abstraction would still come from the RDBMS lib package in core.
Wikibase often needs to connect to multiple databases at once, most commonly the repository database and a client database.
These databases have different domains (db names) but can also live on different clusters, resulting in different LoadBalancer objects being needed.
The pattern that results out of this is that LBFactory objects, LoadBalancer objects and string dbnames are passed around between our Wikibase services.
We want to avoid this as this pattern leads to mistakes.
On at least 2 occasions in the past 12 months, changes were merged that did not use the correct LoadBalancer objects.
This most recently happened with T281457: Several Wikibase services try to read from local domain, when they mean to access the repo.
TBA find the other ticket
See https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/602823 for an initial draft in this direction.
The idea is to come up with a single service for db interaction (or two services, one for repo and one for client) that the rest of the Wikibase code uses within its services.
An added benefit is that an abstraction for waiting for replication on an LBFactory can also live here.
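The idea above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the code from the draft patch: `FakeLoadBalancer`, `FakeLBFactory` and `DomainDb` are hypothetical stand-ins, and the domain and cluster names are illustrative values only. The point is that a service receives one object that already knows its domain, so it can no longer pair a db name with the wrong load balancer.

```python
from typing import Dict, List


class FakeLoadBalancer:
    """Hypothetical stand-in for a LoadBalancer serving one cluster."""

    def __init__(self, cluster: str) -> None:
        self.cluster = cluster
        self.requested_domains: List[str] = []

    def get_connection(self, domain: str) -> str:
        # A real load balancer would return a database handle; a string is
        # enough to show which cluster and domain were selected.
        self.requested_domains.append(domain)
        return f"{self.cluster}:{domain}"


class FakeLBFactory:
    """Hypothetical stand-in for LBFactory: maps a domain to its cluster's load balancer."""

    def __init__(self, cluster_by_domain: Dict[str, str]) -> None:
        self._cluster_by_domain = cluster_by_domain
        self._balancers: Dict[str, FakeLoadBalancer] = {}
        self.replication_waits = 0

    def get_load_balancer(self, domain: str) -> FakeLoadBalancer:
        cluster = self._cluster_by_domain[domain]
        if cluster not in self._balancers:
            self._balancers[cluster] = FakeLoadBalancer(cluster)
        return self._balancers[cluster]

    def wait_for_replication(self) -> None:
        self.replication_waits += 1


class DomainDb:
    """Binds a single domain to the factory; callers never see load balancers or db names."""

    def __init__(self, lb_factory: FakeLBFactory, domain: str) -> None:
        self._lb_factory = lb_factory
        self._domain = domain

    def connection(self) -> str:
        # The load balancer is always looked up for this object's own domain.
        load_balancer = self._lb_factory.get_load_balancer(self._domain)
        return load_balancer.get_connection(self._domain)

    def wait_for_replication(self) -> None:
        self._lb_factory.wait_for_replication()


# Illustrative wiring: repo and client databases live on different clusters.
factory = FakeLBFactory({"wikidatawiki": "cluster-a", "enwiki": "cluster-b"})
repo_db = DomainDb(factory, "wikidatawiki")
client_db = DomainDb(factory, "enwiki")
```

A service that only needs the repo would take `repo_db` as its single database dependency instead of an LBFactory, a LoadBalancer and a db name.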
In order for DBAs to be able to segment database traffic more effectively, we want to provide additional information when we request connections about the type of site / code the connection is being requested for.
This would allow DBAs to allocate separate hardware dedicated to keeping client (Wikipedia) functionality online, even if database servers dedicated to repo functionality are overloaded for some other unrelated reason.
Proposed initial database groups:
- When client code gets any Wikibase-related database connection, a from-client group is used.
- When repo code gets any Wikibase-related database connection, a from-repo group is used.
Note: the draft gerrit patch above does not include any of this grouping code.
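As a rough illustration of the proposed grouping (again a Python sketch with hypothetical names, not part of the draft patch), the abstraction could fix the group once at construction time, so every connection request carries it without callers having to remember. `RecordingLoadBalancer` and its `get_connection(domain, groups)` signature are assumptions made for this example only.

```python
from enum import Enum
from typing import List, Tuple


class ConnectionGroup(str, Enum):
    """The two initial groups proposed above."""

    FROM_CLIENT = "from-client"
    FROM_REPO = "from-repo"


class RecordingLoadBalancer:
    """Hypothetical stand-in that records the groups passed with each request."""

    def __init__(self) -> None:
        self.requests: List[Tuple[str, List[str]]] = []

    def get_connection(self, domain: str, groups: List[str]) -> str:
        self.requests.append((domain, list(groups)))
        return f"{domain}[{','.join(groups)}]"


class GroupedDb:
    """Always requests connections for its domain with the group it was built with."""

    def __init__(
        self,
        load_balancer: RecordingLoadBalancer,
        domain: str,
        group: ConnectionGroup,
    ) -> None:
        self._load_balancer = load_balancer
        self._domain = domain
        self._group = group

    def connection(self) -> str:
        return self._load_balancer.get_connection(self._domain, [self._group.value])


# Illustrative wiring: client code reading from the repo database.
load_balancer = RecordingLoadBalancer()
repo_db_for_client = GroupedDb(load_balancer, "wikidatawiki", ConnectionGroup.FROM_CLIENT)
```

Because the group is set where the service is wired up (client or repo), individual call sites cannot forget or mislabel it.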
- (1) A Wikibase abstraction exists for database connections to repo and client databases (hiding the troubling patterns above)
- (1.1) This abstraction is the only way database connections are acquired in Wikibase code
- (2) All Wikibase-related database connections have a "from" group that is passed to MediaWiki
Notes from storytime:
- Jakob: I also still think we should ponder splitting the abstraction into concrete repo db connections and client db connections, so we can type hint for them
- Tom: Perhaps this should live not in lib, but in our packages (ADR 14)
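Jakob's suggestion of separate repo and client types could look something like the sketch below. All class names are hypothetical, including the `TermLookup` service. Python does not enforce type hints at runtime, so the `isinstance` check stands in for the enforcement that a PHP parameter type declaration would provide.

```python
class RepoDb:
    """Hypothetical wrapper for connections to the repository database."""

    def __init__(self, domain: str) -> None:
        self.domain = domain


class ClientDb:
    """Hypothetical wrapper for connections to a client database."""

    def __init__(self, domain: str) -> None:
        self.domain = domain


class TermLookup:
    """Hypothetical service that must only ever read from the repo."""

    def __init__(self, repo_db: RepoDb) -> None:
        # Stand-in for a type declaration that rejects the wrong kind of db.
        if not isinstance(repo_db, RepoDb):
            raise TypeError("TermLookup requires a RepoDb")
        self._repo_db = repo_db

    def source_domain(self) -> str:
        return self._repo_db.domain


lookup = TermLookup(RepoDb("wikidatawiki"))
```

With distinct types, passing a client database where the repo is meant fails at wiring time rather than silently reading from the local domain.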