Page MenuHomePhabricator

RFC: Require use of common storage abstractions (policy)
Open, LowPublic

Description

  • Affected components: MediaWiki core, skins and extensions.

Motivation

In January, TechCom members informally talked about this topic and expressed an interest to explore requiring use of existing storage abstraction layers.

This would mean that for MediaWiki, which has abstraction layers for Mysql (wikimedia/rdbms), Memcached (BagOStuff), and Redis (wikimedia/redis; RedisConnectionPool), we would by default not allow Wikimedia-deployed code in core and extensions to establish connections to this kind of service directly by other means.

This task will determine whether a policy around this is desirable and viable. If so, this RFC can result in an amendment to our development policy to that effect. (Once T190379 is resolved.)

Requirements

(TBD: Specify the features or criteria that need to be met.)


Exploration

(TBD: Use this space for data gathering, status quo, proposals, other considerations etc.)

Proposal 1

Event Timeline

Krinkle triaged this task as Low priority.
Krinkle updated the task description. (Show Details)
Krinkle added a project: Performance-Team.

I guess there is no harm in writing this down but it happens even without a policy. Why would anyone not use service abstraction layers?
RedisConnectionPool and BagOStuff are at very different levels of abstraction, though.

What about higher abstraction layers like the XxxStore classes? Do we generally expect non-legacy code to use that?

I guess there is no harm in writing this down but it happens even without a policy. Why would anyone not use service abstraction layers?
RedisConnectionPool and BagOStuff are at very different levels of abstraction, though.

What about higher abstraction layers like the XxxStore classes? Do we generally expect non-legacy code to use that?

Yes, I would very much encourage all new code to bind to the narrowest possible interface with the most concrete possible contract. Using BagOStuff directly is bad, because the different implementations have very different contracts, and most calling code does not use most of the functionality it exposes. To go even further, directly querying database tables is bad, because it spreads knowledge about the schema through the code base.

But that's beyond the scope of this proposal, and it's hard to clearly state in a way that is not ambiguous.

The point of this proposal, as far as I understand it, is to ensure that application logic does not bind to the specifics of a storage mechanism in terms of connecting and authenticating. The only way to make this simple and reliable is to enumerate the available storage mechanisms that application logic and extensions are allowed to use; if there is need to use another, an appropriate abstraction layer would have to be added first, to the code base and to the policy.

I don't think this kind of recommendation ("require use of storage abstraction layers") fits in a development policy, it seems more like a guideline.

The stuff that Daniel said about:

...encourage all new code to bind to the narrowest possible interface with the most concrete possible contract.

That seems more like something that fits in the development policy.

Though I don't really follow the part about:

Using BagOStuff directly is bad, because the different implementations have very different contracts, and most calling code does not use most of the functionality it exposes

BagOStuff is broken in so many ways (different implementations have different behavior for lookup misses, atomicity, consistency, even the allowed key space used to be different although maybe that's fixed by now) but that needs to be fixed in core, not policy-ed around. People will use what they have.

The point of this proposal, as far as I understand it, is to ensure that application logic does not bind to the specifics of a storage mechanism in terms of connecting and authenticating.

Which never happened so far, as far as I know. I imagine this was inspired by extensions using low-level abstractions (e.g. GettingStarted using RedisConnectionPool instead of BagOStuff or whatever) but that wouldn't be prevented by the amendment as stated.

Krinkle renamed this task from Development policy amendment: Require use of storage abstraction layers to Development policy: Require use of storage abstraction layers.Apr 3 2019, 11:37 PM

Nothing against this, my only suggestion would be to maybe rephrase or stress the idea of "do not duplicate abstraction layers". Fix existing ones, or replace them, but don't maintain 2 in parallel, as otherwise they are not really "abstraction". If you want to replace one, you have to demonstrate that it is absolutely required, and the deprecation and migration work is worth it.

Krinkle renamed this task from Development policy: Require use of storage abstraction layers to Development policy: Require use of common storage abstractions.Aug 5 2019, 4:21 PM
Krinkle removed Krinkle as the assignee of this task.
Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)

Triaging into the backlog as the initial stage is to gain more information from stakeholders on whether this is desirable and what the motivations for doing so would be (Define a problem statement and objective).

Does it cover schema and schema changes? We are currently at middle of migrating schema and schema changes in core (T191231) but I doubt extensions be able to pick that up any time soon, making them use the abstract schema right now doesn't make much sense to me but it can be part of the policy in two full releases (for example)
Also, is schema and schema change needs to follow the same abstractions as others? I'm working on a Doctrine-DBAL-based solution because mediawiki's DBAL doesn't give me "ALTER TABLE" abstractions.

I think what would be desirable, from the point of view of a developer, to know they can use some specific abstractions, and which ones they should use.

So for example:

  • If you need a relational datastore which supports ACID transactions and SQL syntax, the right interface to do so in MediaWiki is Foo
  • If you need to be able to write key-value data and you don't need persistence or strict consistency, you should use interface Bar

These interfaces should:

  • clearly offer specific guarantees in terms of consistency, availability, durability and multi-master write ability (among other things)
  • not be tied to a specific technology (so, as @Tgr correctly noticed, I don't think RedisConnectionPool qualifies)

This would allow the developers to write code being sure of what promises they're expecting, and the actual implementation of the interface can change technology in a deployment without huge issues.

I think the main question we need to answer first is - what classes of storage patterns are currently supported by MediaWiki? How do we map the current abstractions (at any level) to such patterns?

Krinkle renamed this task from Development policy: Require use of common storage abstractions to RFC: Development policy to require use of common storage abstractions.Apr 4 2020, 2:46 AM
Krinkle renamed this task from RFC: Development policy to require use of common storage abstractions to RFC: Require use of common storage abstractions (policy).May 3 2020, 11:19 PM
Krinkle updated the task description. (Show Details)

This task has no active project tags apart from one "radar" tag, so it is lingering. Should this remain open? If yes, please add an active project tag - thanks!