Page MenuHomePhabricator

RFC: Abstract schemas and schema changes
Closed, ResolvedPublic

Description

At some point this should be a TechCom-RFC, but at the moment it's still in the drafting stage.

Problem

MediaWiki claims to support five databases: MySQL/MariaDB, SQLite, PostgreSQL ("PG"), Microsoft SQL Server ("MSSQL"), and Oracle Database. For normal runtime-type queries, we have abstractions that make these all mostly work pretty well.

But at the DDL level it's a completely different story. One major piece of (and source of) technical debt is the fact that MediaWiki does not have a database schema, it has four. And most schema changes have to be written five times, one for each supported database. In practice, this means schema changes for the less-supported databases are often omitted, or when not omitted are often merged without being tested.

We can improve the situation by abstracting the schema and schema change definitions, with code per database to translate that into the actual DDL statements.

Approved solution

Create a PHP interface (or base class) for schemas and schema changes. We implement this in top of Doctrine DBAL. Schemas and schema changes will be defined in JSON files conforming to https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema, which will be read into Schema or SchemaDiff objects. DBAL will then take care of generating SQL for any supported RDBMS.

In addition, database support (in the form of a subclass of the Database base class) should be made pluggable. Extensions that want to provide support for a database backend would provide a Database subclass as well as a suitable implementation of the schema and schema change interface. This would be trivial for any database that is supported by DBAL.

Notes:

  • This means we drop support for MSSQL and Oracle RDBMS from MediaWiki core, since DBAL support for them is insufficient and/or the schema for these databases has diverged from the main line schema. WMF will not continue support for these database backends. Volunteers have shown interest in bringing back support for these backends in form of extensions.
  • For schema definitions, we go with JSON for now. But the we may want to switch to YAML later, for easier editing. JSON and YAML can easily be converted into one another.
  • If someone wants to introduce a schema change, there should be a new deployable file which can contain several schema changes. Existing schema change (json) files should not be changed to perform additional changes.

Old proposals

Proposal #1

We should write a schema and schema change abstraction layer to integrate with MediaWiki's existing runtime database abstraction. Details are on-wiki at https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema and https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements, but in short:

  • We would have one schema, expressed as a structure in a JSON file. We would have one definition of each schema change, expressed as a structure in a JSON file.
  • Database-specific classes would exist to turn the schema or schema-change into SQL statements, much as we have database-specific subclasses of Wikimedia\Rdbms\Database.
  • We'd also tighten up some of the other database-level things: limited identifier lengths, index name uniqueness, data type consistency, charset consistency, etc.

The reason we didn't go with this:

  • It's lots of work to write a schema and schema change abstraction from scratch.

Proposal #2

Try to integrate Doctrine Migrations for schema creation and updates.

Pros (compared to Proposal #1):

  • We wouldn't have to implement all the database-specific logic ourself.
  • Probably a larger community fixing any bugs that exist.
  • Familiar system for (some subset of) PHP developers, simplifying onboarding of devs.

The reasons we didn't go with this:

  • We'd have to have code to translate MediaWiki's DB connection info to Doctrine's format, and otherwise translate between Doctrine conventions and MediaWiki conventions.
  • We may have to custom-implement a "mwtimestamp" type, or else standardize all DBs on using 14-byte strings.
  • We may still have to work around issues like MSSQL's different treatment of NULLs in unique indexes.

Related Objects

StatusSubtypeAssignedTask
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedNone
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedNone
OpenNone
ResolvedReedy
ResolvedLadsgroup
ResolvedLadsgroup
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenDaimona
ResolvedMarostegui
ResolvedBstorm
ResolvedDaimona
ResolvedUrbanecm
DuplicateNone
OpenNone
InvalidNone
InvalidNone
ResolvedReedy
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
ResolvedReedy
ResolvedReedy
InvalidNone
OpenNone
OpenNone
OpenNone
ResolvedReedy
ResolvedReedy
ResolvedReedy
InvalidNone
DeclinedNone
ResolvedReedy
InvalidNone
OpenNone
OpenNone
ResolvedPhysikerwelt
InvalidNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
InvalidNone
ResolvedReedy
OpenNone
OpenSTran
OpenNone
ResolvedTchanders
InvalidNone
InvalidNone
InvalidNone
OpenNone
ResolvedReedy
InvalidNone
ResolvedLadsgroup
InvalidNone
OpenNone
OpenNone
ResolvedNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 595240 merged by jenkins-bot:
[mediawiki/core@master] Wire empty abstract schema into installer

https://gerrit.wikimedia.org/r/595240

@Ladsgroup I'm unsure if this was already discussed or implemented, but perhaps we could add a PHPUnit test to ensure that the contents of the tables-generated.sql files match the output generated by generateSchemaSql.php. Something along the lines of AutoLoaderStructureTest. What do you think?

@Ladsgroup I'm unsure if this was already discussed or implemented, but perhaps we could add a PHPUnit test to ensure that the contents of the tables-generated.sql files match the output generated by generateSchemaSql.php. Something along the lines of AutoLoaderStructureTest. What do you think?

I have been thinking about it but I keep forgetting to start a ticket or possibly implement it (my unconscious is pretty lazy). I do it now.

Change 634679 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Vary timestamp default value in abstract schema

https://gerrit.wikimedia.org/r/634679

Change 634748 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Expand DoctrineSchemaBuilderTest

https://gerrit.wikimedia.org/r/634748

Change 634679 merged by jenkins-bot:
[mediawiki/core@master] Vary timestamp default value per platform in abstract schema

https://gerrit.wikimedia.org/r/634679

Change 634748 merged by jenkins-bot:
[mediawiki/core@master] Expand DoctrineSchemaBuilderTest

https://gerrit.wikimedia.org/r/634748

Change 636235 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Fix ip_changes.ipc_rev_id column default value

https://gerrit.wikimedia.org/r/636235

Change 636235 merged by jenkins-bot:
[mediawiki/core@master] Fix ip_changes.ipc_rev_id column default value

https://gerrit.wikimedia.org/r/636235

Change 673718 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Abstract schema: Handle MySQL Float/Double precision types

https://gerrit.wikimedia.org/r/673718

Change 673718 merged by jenkins-bot:
[mediawiki/core@master] Abstract schema: Handle MySQL Float/Double precision types

https://gerrit.wikimedia.org/r/673718

Change 673785 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] DoctrineSchemaBuilder: Make 'table_options' top-level attribute

https://gerrit.wikimedia.org/r/673785

Change 673785 merged by jenkins-bot:
[mediawiki/core@master] DoctrineSchemaBuilder: Make 'table_options' top-level attribute

https://gerrit.wikimedia.org/r/673785

Change 676856 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/core@master] DoctrineSchemaBuilder: Do not add prefix placeholder for Postgres at all

https://gerrit.wikimedia.org/r/676856

Change 676856 merged by jenkins-bot:

[mediawiki/core@master] DoctrineSchemaBuilder: Do not add prefix placeholder for Postgres at all

https://gerrit.wikimedia.org/r/676856

Suggestion, how about allowing "comment": [ "foo", "bar" ] as in extension.json/config/key/description? If nothing else it makes for potentially more pretty text due to line breaks.

Ladsgroup claimed this task.

The core schema is now fully abstract. We have a system for abstract schema changes that most recent schema changes in core use (among several extensions.). This is done. Of course more work can be done in every aspect (specially migrating extensions to abstract schema) but that should not be counted under implementation of this RFC.