
RFC: Abstract schemas and schema changes
Open, Normal, Public

Description

At some point this should be a TechCom-RFC, but at the moment it's still in the drafting stage.

Problem

MediaWiki claims to support five databases: MySQL/MariaDB, SQLite, PostgreSQL ("PG"), Microsoft SQL Server ("MSSQL"), and Oracle Database. For normal runtime queries, we have abstractions that make all of these work reasonably well.

But at the DDL level it's a completely different story. One major piece of (and source of) technical debt is the fact that MediaWiki does not have a database schema, it has four. And most schema changes have to be written five times, once for each supported database. In practice, this means schema changes for the less-supported databases are often omitted, or when not omitted are often merged without being tested.

We can improve the situation by abstracting the schema and schema change definitions, with code per database to translate that into the actual DDL statements.

Approved solution

Create a PHP interface (or base class) for schemas and schema changes. We implement this on top of Doctrine DBAL. Schemas and schema changes will be defined in JSON files conforming to https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema, which will be read into Schema or SchemaDiff objects. DBAL will then take care of generating SQL for any supported RDBMS.
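For illustration, here is a minimal sketch of that pipeline, assuming a hypothetical JSON table definition and the Doctrine DBAL 2.x API (platform class names differ slightly between DBAL versions):

use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Platforms\MySqlPlatform;
use Doctrine\DBAL\Platforms\SqlitePlatform;

// Hypothetical abstract table definition, as it might look after json_decode().
$def = [
    'name' => 'site_stats',
    'columns' => [
        [ 'name' => 'ss_row_id', 'type' => 'integer', 'options' => [ 'unsigned' => true, 'notnull' => true ] ],
        [ 'name' => 'ss_total_edits', 'type' => 'bigint', 'options' => [ 'notnull' => false ] ],
    ],
    'primary' => [ 'ss_row_id' ],
];

// Build the abstract Schema object; no database connection is involved.
$schema = new Schema();
$table = $schema->createTable( $def['name'] );
foreach ( $def['columns'] as $col ) {
    $table->addColumn( $col['name'], $col['type'], $col['options'] );
}
$table->setPrimaryKey( $def['primary'] );

// DBAL generates the DDL per platform.
$mysqlSql = $schema->toSql( new MySqlPlatform() );   // array of CREATE TABLE statements, MySQL syntax
$sqliteSql = $schema->toSql( new SqlitePlatform() ); // same schema, SQLite syntax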

In addition, database support (in the form of a subclass of the Database base class) should be made pluggable. Extensions that want to provide support for a database backend would provide a Database subclass as well as a suitable implementation of the schema and schema change interface. This would be trivial for any database that is supported by DBAL.

Notes:

  • This means we drop support for MSSQL and Oracle from MediaWiki core, since DBAL support for them is insufficient and/or the schema for these databases has diverged from the main line schema. WMF will not continue support for these database backends. Volunteers have shown interest in bringing back support for these backends in the form of extensions.
  • For schema definitions, we go with JSON for now, but we may want to switch to YAML later for easier editing. JSON and YAML can easily be converted into one another.
  • If someone wants to introduce a schema change, they should add a new deployable file, which can contain several schema changes. Existing schema change (JSON) files should not be changed to perform additional changes.

Old proposals

Proposal #1

We should write a schema and schema change abstraction layer to integrate with MediaWiki's existing runtime database abstraction. Details are on-wiki at https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema and https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements, but in short:

  • We would have one schema, expressed as a structure in a JSON file. We would have one definition of each schema change, expressed as a structure in a JSON file.
  • Database-specific classes would exist to turn the schema or schema-change into SQL statements, much as we have database-specific subclasses of Wikimedia\Rdbms\Database.
  • We'd also tighten up some of the other database-level things: limited identifier lengths, index name uniqueness, data type consistency, charset consistency, etc.

The reason we didn't go with this:

  • It's a lot of work to write a schema and schema change abstraction from scratch.

Proposal #2

Try to integrate Doctrine Migrations for schema creation and updates.

Pros (compared to Proposal #1):

  • We wouldn't have to implement all the database-specific logic ourselves.
  • Probably a larger community fixing any bugs that exist.
  • Familiar system for (some subset of) PHP developers, simplifying onboarding of devs.

The reasons we didn't go with this:

  • We'd have to have code to translate MediaWiki's DB connection info to Doctrine's format, and otherwise translate between Doctrine conventions and MediaWiki conventions.
  • We may have to custom-implement a "mwtimestamp" type, or else standardize all DBs on using 14-byte strings.
  • We may still have to work around issues like MSSQL's different treatment of NULLs in unique indexes.


Event Timeline


I've said it before, but for work I'm really loving Laravel migrations (also DBAL, but with its own layers/subclasses around it). The class design can be found at Schema and Migrations if anyone wants some inspiration for our own solution. See the Blueprint class for where they implement their own mwtimestamp-style handling, for instance. I also like how they defined their Grammars.
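For readers unfamiliar with it, a minimal Laravel migration looks roughly like this (Laravel code for illustration only, not proposed MediaWiki code; the table and column names are made up):

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class CreateExampleLinksTable extends Migration
{
    // The Blueprint closure describes the table abstractly; Laravel's grammar
    // classes emit the DBMS-specific DDL for MySQL, SQLite, Postgres, etc.
    public function up()
    {
        Schema::create( 'example_links', function ( Blueprint $table ) {
            $table->bigIncrements( 'el_id' );
            $table->unsignedBigInteger( 'el_page' )->index();
            $table->string( 'el_target', 255 );
            $table->timestamps();
        } );
    }

    // Each migration also carries its inverse, used by the rollback tooling.
    public function down()
    {
        Schema::dropIfExists( 'example_links' );
    }
}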

However, I will point out that when it comes to migrations, especially with regard to SQLite (I usually run unit tests against SQLite), it's still not fully transparent. Some migrations are simply problematic (column type changes), though it is pretty easy to conditionally modify a migration to do connector-specific things.

Lastly, I'd like to highlight some of the tooling features of Laravel migrations [artisan migrate] that I like (especially for development, test and staging) and that we should maybe consider adding to update.php long term (maybe in a separate ticket?):

  1. Tracking of which migration has run (and when) in a migrations table
  2. There is a --pretend option to dump the SQL queries that will be run (bit like our --schema option)
  3. There is a --step option to allow you to do migrations 1 by 1, instead of bulk
  4. All migrations have a rollback function that you can implement allowing you to undo your change.
  5. Easily bootstrap/make a new migration class
  6. Takes into account maintenance mode

Now I think that running migrations on Wikimedia is a lot harder than what this still relatively simple tooling allows for, but in my opinion the pre-WMF-production workflow improvements would in themselves make these features worth having.

Just my brain dump.

So I talked to @daniel yesterday and we went through the POC together. This is my proposed solution based on Doctrine DBAL:

  • Nothing in MediaWiki changes except that we replace *.sql files with *.json ones that are DBMS-agnostic (so one per schema change, and one central tables.json for fresh installations)
  • We need to write some classes that turn this JSON data into Doctrine DBAL "Schema" (replacing tables.sql) or "SchemaDiff" (replacing the other SQL files) abstract objects.
  • DatabaseUpdater and the installers basically select a Doctrine DBAL platform (for example "SqlitePlatform"). The platform is responsible for turning the abstract objects into SQL commands.

Everything else stays the same, including the way MediaWiki checks the current schema to determine whether to apply a patch or not. This narrows the focus of the RFC a bit.
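For concreteness, the "SchemaDiff" half might look roughly like this with the DBAL 2.x API (a rough sketch; the code that builds the Schema objects from the JSON files is exactly the part we would still have to write):

use Doctrine\DBAL\Schema\Comparator;
use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Platforms\PostgreSqlPlatform;

// The "before" and "after" states, which would normally be built from JSON.
$before = new Schema();
$t = $before->createTable( 'actor' );
$t->addColumn( 'actor_id', 'bigint', [ 'unsigned' => true ] );

$after = new Schema();
$t = $after->createTable( 'actor' );
$t->addColumn( 'actor_id', 'bigint', [ 'unsigned' => true ] );
$t->addColumn( 'actor_name', 'binary', [ 'length' => 255 ] );

// DBAL computes the difference and emits ALTER statements for the chosen platform.
$diff = ( new Comparator() )->compare( $before, $after );
$alterStatements = $diff->toSql( new PostgreSqlPlatform() ); // e.g. [ 'ALTER TABLE actor ADD actor_name ...' ]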

daniel added a comment. Edited Mar 26 2019, 3:27 PM

One thing worth pointing out, which I learned talking to @Ladsgroup yesterday: DBAL turns an abstract schema or schema update (written in JSON) into SQL - nothing more. It doesn't need a database connection, it doesn't want to talk to any database at all, it just generates SQL statements. So it seems a good tool for specifying our schema and schema changes in a database-agnostic way, without having to change a lot of scary and brittle code.

  • Nothing in MediaWiki changes except that we replace *.sql files with *.json ones that are DBMS-agnostic (so one per schema change, and one central tables.json for fresh installations)

Can I re-iterate my previous concern about using JSON as the data format, given that JSON doesn't support comments.

Over 50% of the lines in tables.sql are comments, and these comments are crucial documentation in understanding the schema and the basic functioning of MediaWiki. It is therefore vital that any solution preserves this inline documentation.

So, what is the proposed method of doing that?

Can I re-iterate my previous concern about using JSON as the data format, given that JSON doesn't support comments.
Over 50% of the lines in tables.sql are comments, and these comments are crucial documentation in understanding the schema and the basic functioning of MediaWiki. It is therefore vital that any solution preserves this inline documentation.
So, what is the proposed method of doing that?

That's a very valid concern. The thing is, the part that turns the JSON config into PHP abstract schema objects needs to be written in MediaWiki, so we have the freedom to choose whatever format we want. We could even go with YAML (the reasons I went with JSON were: 1) PHP has straightforward support for JSON, and 2) extension.json and i18n JSON files are pretty common in MediaWiki, while YAML is not used much). I'm open to ideas for this part. The way the extension registry handles comments could also be used here.

Over 50% of the lines in tables.sql are comments, and these comments are crucial documentation in understanding the schema and the basic functioning of MediaWiki. It is therefore vital that any solution preserves this inline documentation.

I'm open to ideas for this part.

The current schema has no table or column comments. And the documentation on mediawiki.org (e.g. http://www.mediawiki.org/wiki/Manual:Logging_table) is frequently out of date (like log_type/log_action). Luckily both the current system's problem and the problem you mention here have the same solution: put comments in the columns and tables themselves. There should be a place in the JSON for these, as they're part of standard CREATE TABLE statements. This should be the canonical documentation of the schema, and it would then be usable from any replicas and documentation generation scripts.
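As a hedged sketch of how that could carry through: Doctrine DBAL column definitions accept a 'comment' option (and tables a 'comment' table option), which platforms that support comments render as COMMENT clauses. The comment text below is purely illustrative.

use Doctrine\DBAL\Schema\Schema;

$schema = new Schema();
$table = $schema->createTable( 'logging' );
$table->addColumn( 'log_type', 'string', [
    'length' => 32,
    'notnull' => true,
    // Rendered as a COMMENT clause on platforms that support column comments (e.g. MySQL).
    'comment' => 'The kind of log entry, e.g. "block" or "delete".',
] );
$table->addOption( 'comment', 'Log of actions performed on the wiki.' );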

@Ladsgroup Did you check how DBAL holds up against the heap of nasty edge cases that @Anomie compiled? https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements

Also, there is the issue of long term ownership of the code. Once it's written, who maintains it? It seems to me like this would fall to the Core Platform Team. @Anomie, if @Ladsgroup implements this as proposed by him in T191231#5058412, would you be willing to do code review on that implementation, and maintain the code in the long term? If not, I may be able to take this on myself, but I have less experience with the schema updater code and all the little things that can go wrong there.

@Ladsgroup Did you check how DBAL holds up against the heap of nasty edge cases that @Anomie compiled? https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements

This is a really long list ;)
As far as I checked (which might be incomplete), DBAL either handles them or errors out. For example, in the case of identifier validity, DBAL has a method for that which each platform (DBMS) implements. For example, this is the identifier validator in Oracle, which strangely doesn't have any size limit. For the rest of the requirements, the checks can be done in the SchemaBuilder object that we are supposed to write; for example, the check that a foreign key actually points to a primary or unique column in another table can be done in SchemaBuilder.

The nice thing we can now have with DBAL is that in our CI we can test our abstract schema against all DBMSes, by trying to turn the abstract PHP objects into SQL commands on all of the supported platforms (and we could even add more DBMSes like SQLAnywhere, SQLAzure, etc. to the list) and checking whether they error out, because DBAL can catch most of the edge cases, as illustrated above.
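A rough sketch of what such a CI check could look like (the trivial example schema here stands in for the real one built from the JSON files):

use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Platforms\MySqlPlatform;
use Doctrine\DBAL\Platforms\SqlitePlatform;
use Doctrine\DBAL\Platforms\PostgreSqlPlatform;

// Stand-in for the Schema object built from the abstract JSON definition.
$schema = new Schema();
$schema->createTable( 'example' )->addColumn( 'ex_id', 'integer', [ 'unsigned' => true ] );

$platforms = [
    'mysql' => new MySqlPlatform(),
    'sqlite' => new SqlitePlatform(),
    'postgres' => new PostgreSqlPlatform(),
];

foreach ( $platforms as $name => $platform ) {
    try {
        // Throws if the definition cannot be expressed on this platform.
        $schema->toSql( $platform );
    } catch ( \Exception $e ) {
        echo "Schema is not valid for $name: " . $e->getMessage() . "\n";
        exit( 1 );
    }
}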

Data types are mapped from Doctrine data types to the DBMS-specific ones. Here is an example. By searching for "length" in the SQLite platform, you can find the rather sophisticated way length is handled.

In the discussion with @daniel and @Anomie it became apparent that it's better to drop support for Oracle and MSSQL before trying to abstract the schema, and to move that support to extensions: if someone wants to maintain the support, they should do it in an extension.

To recap. I suggest these two:

  • Drop official support for Oracle and MSSQL
  • Use Doctrine DBAL to handle SQL generation for the schema and schema changes (practically turning .sql files into .json files)

If there's no objection in the next week, I will request a last call on this.

daniel added a comment. Jun 3 2019, 8:22 AM

To recap. I suggest these two:

  • Drop official support for Oracle and MSSQL
  • Use Doctrine DBAL to handle SQL generation for the schema and schema changes (practically turning .sql files into .json files)

If there's no objection in the next week, I will request a last call on this.

For a meaningful last call, it would have to be made clear what it would take to implement Oracle and MSSQL support in an extension. From what I understand, the problem is that the schemas we currently maintain in core for Oracle and MSSQL have diverged significantly from the MySQL schema in some places. What would an extension that provides support for Oracle (or MSSQL) have to do to handle these differences?

Knowing how hard it will be to move Oracle and MSSQL to extensions will allow us to assess the implications of dropping support from core: does it just mean that someone has to write a few hundred lines of compat code in an extension, or is the task much more complex, meaning that we'd be dropping support for these databases entirely?

If we end up dropping Oracle and MSSQL entirely, we'd be using DBAL mostly to convert from MySQL to Postgres. Is that worth the effort?

These seem to be the trade-offs we should be considering.

Thanks for the comment :)

For a meaningful last call, it would have to be made clear what it would take to implement Oracle and MSSQL support in an extension. From what I understand, the problem is that the schemas we currently maintain in core for Oracle and MSSQL have diverged significantly from the MySQL schema in some places. What would an extension that provides support for Oracle (or MSSQL) have to do to handle these differences?

I would say there is no need, or at least it's not worth the effort. The Oracle and MSSQL schemas were not just divergent but actually broken for a very long time (and no one noticed).

This is my favorite example: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/480554
For more than four full release cycles, in other words two years, our schema for MSSQL added the right column to the wrong table. And no one noticed. That tells me MSSQL is not being used and is not going to be used.

Knowing how hard it will be to move Oracle and MSSQL to extensions will allow us to assess the implications of dropping support from core: does it just mean that someone has to write a few hundred lines of compat code in an extension, or is the task much more complex, meaning that we'd be dropping support for these databases entirely?

It will be a few hundred lines of code, but there is no guarantee it will work; it depends on how well Doctrine DBAL supports those databases. If the support is good enough, it should not be an issue.

If we end up dropping Oracle and MSSQL entirely, we'd be using DBAL mostly to convert from MySQL to PostGres. Is that worth the effort?

I would say it's definitely worth the effort for three reasons:

  • We still need to support SQLite as well, and doing schema changes on SQLite is pretty complex (you need to make a temporary table, put things there, and then rename that table to the main one).
  • We need to fix and remove this: https://github.com/wikimedia/mediawiki/blob/c477bcf2c5c482d3189ec3579c5dee444eb06f7d/includes/libs/rdbms/database/DatabaseSqlite.php#L898 This is technical debt that's going to bite us.
  • Having plain strings as SQL is quite dangerous. It opens the door to several types of errors, like the one I mentioned above, and to many other, more subtle ones (like forgetting /*$wgDBprefix*/, /*_*/ or /*i*/, or mixing them up). All of that can be avoided by abstracting it away into the SchemaBuilder and SchemaDiffBuilder objects. Mistakes in schemas can cause data corruption and should be handled with more care.

To recap. I suggest these two:

  • Drop official support for Oracle and MSSQL

a) It looks like it collides with https://phabricator.wikimedia.org/T113831
b) It doesn't look like it is part of Proposal #1 or #2, which is what is under discussion here.

I'm confused about this. Cheers.

a) It looks like it collides with https://phabricator.wikimedia.org/T113831

Thanks for pointing out this ticket. It gave me lots of information.

b) It doesn't look like it is part of Proposal #1 or #2, which is what is under discussion here.
I'm confused about this. Cheers.

Yes, sorry for not being more verbose.

https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements lists the requirements if we want to abstract with proposal #1, #2, or #3 (proposal three is adding support using Doctrine DBAL, practically a hybrid of proposals #1 and #2). And as you can see, both Oracle and MSSQL have very weird requirements that we would need to do gymnastics to support, and some are not even supported in Doctrine DBAL (most importantly unique indexes, which are not supported for Oracle but are supported for MSSQL). So I asked to drop support for both.

Looking back and reading T113831: Remove MSSQL support from MediaWiki core, and given that Doctrine DBAL supports the unique index requirement on MSSQL, I think it would be okay to just drop official support for Oracle (there seems to be universal support for dropping Oracle) and leave MSSQL for now. I mean Oracle won't be "officially" supported, but the code can easily be moved to an extension and the support kept up by people who want to maintain it. Another note: MSSQL has been broken for a while now (see my comment above).

Keep in mind that abstracting makes it easier to add support, or keep current support, in extensions or in core. Adding support would only be around one hundred lines of code for most DBMSes, but for some (like Oracle) you would need to patch Doctrine DBAL to add support there first before moving forward. (All of this paragraph assumes this RFC gets accepted with proposal #3.)

HTH

daniel added a comment. Jun 3 2019, 3:26 PM

a) It looks like it collides with https://phabricator.wikimedia.org/T113831

IIRC, one of the main reasons that was declined was that we didn't have a way to abstract schema migrations.

Another note: MSSQL has been broken for a while now (Look at my above comment)

As per the linked ticket, the primary maintainer of the MSSQL code appears to only target LTS releases, for a couple of reasons. Therefore this is not necessarily indicative of the maintenance status (though it's clearly a problem).

Use Doctrine DBAL to handle SQL generation for the schema and schema changes (practically turning .sql files into .json files)

I raised some objections to using JSON as the format (see T191231#5063939) which have not yet been adequately addressed.

There was a suggestion of using table/column comments, but I'm not sure this really resolves the issue. I would hope and expect to see an average of about 5 lines worth of documentation comments per field, and probably an average of about 10 lines per table. I'm aware that the documentation is not currently at this level, but I would hope that improving that is something we all aspire to. Limiting it to the amount that can be represented in a simple string variable (which would obviously require proper escaping, too) is likely to make things worse, rather than better.

A data format that allows proper comments, of any length, please! YAML may be a good choice, or a custom JSON + comments format if YAML is too heavy.

Abstract schemas are an argument for keeping MSSQL, Oracle, and other lightly-supported DBMSes in Core, not splitting them out. Once we have an abstract schema, the supporting code to make use of it need only be written once, and that same schema can then apply everywhere.

There will likely be some bugs, but it will greatly reduce the maintenance burden as well as the burden on those submitting changes (they need only provide the one abstract patch instead of 5 different sql files).

In any case, splitting DBMS out of core is an absolute non-starter until the installer can make use of them somehow without LocalSettings.php existing. Once that happens and is well-supported rather than an afterthought, I’d be happy to re-evaluate moving less-supported DBMS into extensions or whatever system is devised for that.

Abstract schemas are an argument for keeping MSSQL, Oracle, and other lightly-supported DBMSes in Core, not splitting them out.

You are completely right, but that's only true once an abstraction is in place. With the current system, and especially in the case of Oracle, it's either not possible to use Doctrine DBAL to build the abstraction, or it requires a great amount of developer work to write a DBAL equivalent from scratch. (See this.)

There was a suggestion of using table/column comments, but I'm not sure this really resolves the issue. I would hope and expect to see an average of about 5 lines worth of documentation comments per field, and probably an average of about 10 lines per table. I'm aware that the documentation is not currently at this level, but I would hope that improving that is something we all aspire to. Limiting it to the amount that can be represented in a simple string variable (which would obviously require proper escaping, too) is likely to make things worse, rather than better.

I would argue that 5 lines worth of documentation comments per field, and probably an average of about 10 lines per table, makes any sort of DDL more of a documentation piece than a DDL. What is needed is up-to-date documentation on mediawiki.org and clear links to it from the DDL.

HappyDog added a comment. Edited Jun 8 2019, 9:31 PM

I would argue that 5 lines worth of documentation comments per field, and probably an average of about 10 lines per table, makes any sort of DDL more of a documentation piece than a DDL. What is needed is up-to-date documentation on mediawiki.org and clear links to it from the DDL.

That's an argument against code commenting in general.

I strongly disagree with the suggestion that we should limit the amount of comments a developer is allowed to provide, either explicitly or implicitly (by choice of format). I can guarantee that any recommendation that developers should add their comments somewhere other than alongside the code they are writing will simply result in developers not writing comments at all.

To give some context to my argument, here is a table definition from one of my own projects. It is written in a custom abstract schema format that works well for us (but is probably not sophisticated enough for MediaWiki's use, which is why I haven't suggested it for this thread). Regardless of the format, this is the level of commenting I would hope to see for any DB schema I'm expected to work with:

-- --------------------------------------------------------
-- Table structure for tblSurveyLinks
-- --------------------------------------------------------
-- Each project can have any number of external surveys attached to it.
-- Draft projects may have zero survey links defined, and there may even be a
-- use-case for a live project with no survey links (though this is not currently
-- possible via the interface).  Most projects will have one or more links.
-- The LinkTypeID column indicates the function of the survey within the
-- context of the project.
-- If the same survey is used in multiple projects, it will appear multiple
-- times in this list.
-- --------------------------------------------------------

TABLE tblSurveyLinks

	FIELD	SurveyLinkID            AUTONUMBER
	FIELD	ProjectID               FOREIGNKEY --> tblProjects

-- This is the Survey ID for the external survey, as provided by their API.
-- The format of the ID should be considered a 'black box' as it is generated by
-- the external system and the format may change at any time.
	FIELD	ExternalSurveyID        SHORTTEXT

-- This will contain one of the SURVEY_TYPE_* constants, which define whether this
-- is a baseline survey, the control arm or one of potentially multiple intervention
-- arms.  See the definitions in Constants.php for more details.
	FIELD	LinkTypeID              INTEGER

-- The SortOrder field contains the order that the surveys are listed within the
-- interface.  It is always populated, and numbering is sequential, starting from 1.
-- This ensures the order doesn't change between page views.
	FIELD	SortOrder               INTEGER

-- The RandomisationOrder is only applicable in situations where randomisation is
-- being used, and only for the surveys that are being randomised.
-- It is populated automatically when the survey is published, to ensure that there
-- is no connection between how these items are shown on the randomisation monitoring
-- page (which just displays "Survey A" and "Survey B") and what the user sees in
-- the editing interface.  This ensures the trial remains 'blind'.
-- The field contains an integer, with numbering starting at 1 and proceeding
-- sequentially, or it will be NULL if not relevant for this link.
	FIELD	RandomisationOrder      INTEGER

	PRIMARYKEY	SurveyLinkID

END TABLE

That's an argument against code commenting in general.
I strongly disagree with the suggestion that we should limit the amount of comments a developer is allowed to provide, either explicitly or implicitly (by choice of format). I can guarantee that any recommendation that developers should add their comments somewhere other than alongside the code they are writing will simply result in developers not writing comments at all.

Well, developers should avoid writing comments as much as possible too. I know it seems counter-intuitive, but there's a whole chapter of "Clean Code" by Uncle Bob dedicated to this. There are lots of reasons for that:

  • The code should be self-explanatory through variable/class/method names and structure (readability of the code is an important factor in its maintainability); comments then become redundant and a cognitive load without adding anything new.
  • When you have code and comments, they might get out of sync; they become misleading, they decay, and now you need to maintain two things instead of one. (If you also maintain documentation, like what we do for the schema structure on mediawiki.org, you have three things to maintain: the DDL, the comments, and the mediawiki.org documentation.)

As I said, for more information, please read the book.

This is getting somewhat off-topic, but:

The code should be self-explanatory through variable/class/method names and structure (readability of the code is an important factor in its maintainability); comments then become redundant and a cognitive load without adding anything new.

This is the common argument I hear against commenting code, but it is absolutely incorrect. The code may tell you what it does, but it doesn't tell you why it does it, or what assumptions were made when writing it. If this isn't captured in comments, then it only exists in the developer's brain... and normally only for a limited period of time.

If you rephrase it as "Developers should avoid writing bad comments for the sake of it", then I'm 100% in agreement, but that's not an argument against commenting in general.

When you have code and comments, they might get out of sync, they become misleading,

When you rename variables, code might get out of sync too, but you would never use that as an argument against refactoring. I read this reason as "if developers don't do their job properly, we end up with bad code". That's correct, but not a helpful observation.

I will read the book, but from your examples I think I'm going to find more that I disagree with than I agree with.

HappyDog added a comment. Edited Jun 9 2019, 11:25 AM

The code may tell you what it does, ...

Actually, responding to my own point, even that is not an argument against commenting the 'what'. The code will tell you what it does, but it won't tell you what it is supposed to do, and therefore there is nothing to cross-check the code against. Is it working as intended, or not? Good unit-test coverage can mitigate this, provided the tests are being run and that they were written by someone other than the original developer (who can very easily write tests that encapsulate the same logic/assumption errors that are in the code). However, there is still some benefit to having this cross-check in the codebase, where developers can see it.

For example:

function getDaysSinceCreation() {
    $age = $this->getSourceObject()->getAge();
    $age = $age * 2;
    return $age;
}

Contrast this with the following, which not only clearly shows where the error is, but also how to fix it:

function getDaysSinceCreation() {
    $age = $this->getSourceObject()->getAge();
    // Convert age in hours to age in days.
    $age = $age * 2;
    return $age;
}

That example is basic and a bit contrived, but it demonstrates the point. You can't rely on the code being its own documentation.

kchapman added a comment. Edited Jun 28 2019, 4:51 AM

TechCom is hosting a meeting on this next Wednesday, July 3 2019, at 14:00 UTC/15:00 CEST/06:00 PDT (note new time)

Meeting will be rescheduled

This is one of the things this RFC can look into: moving MSSQL and other proprietary databases out to extensions and leaving them to the people who want to maintain them.

I've been working on making Percona available as a DB driver and doing it as an extension.

While this is a MySQL-based DB, there are some important considerations that mean just using the MySQL driver isn't possible. Some of the choices I've made are a bit hackish, but it shows that this can currently be done.

TechCom is hosting a meeting on this next Wednesday, July 3 2019, at 14:00 UTC/15:00 CEST/06:00 PDT (note new time)

14:00 UTC should translate to 16:00 CEST AFAIK.

The TechCom IRC meeting this week has been cancelled.

Thanks @Ladsgroup we are rescheduling this for another week so that @Anomie is available.

TechCom has rescheduled the IRC meeting in #wikimedia-office to July 17 2019 14:00 UTC/16:00 CEST/06:00 PDT

Lahwaacz removed a subscriber: Lahwaacz. Jul 6 2019, 8:51 PM
saper added a comment. Edited Jul 7 2019, 9:14 PM

Excerpt from an announcement from June 2019 about Gitlab dropping MySQL support:

By providing support for both database backends (PostgreSQL and MySQL) we were unable to truly take advantage of either. Where we wanted to utilize specific performance and reliability capabilities unique to a backend, we had to instead choose the lowest common denominator. As an example (there are more), we wanted to use PostgreSQL's LATERAL JOIN for optimizing dashboards events, but couldn't because it was not available in MySQL.

I actually wish we had more possibilities to diverge, for example to use native Internet Protocol address data types for ipb_address etc...
So abstractions would have been done up in the application layer, not just on some single storage layer.

saper added a comment. Edited Jul 8 2019, 8:45 AM

A tiny example of what I mean is https://phabricator.wikimedia.org/T203850 - had we had an abstraction to handle JSON values in IDatabase, we wouldn't need fixes like https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TemplateData/+/521193 - JSON could end up as JSONB in Postgres and a BLOB in MySQL. Sometimes the abstractions would need to go much higher than this, providing a DB-specific implementation of it.

Otherwise we will end up one day with everything being varbinary or blob.

In any case, splitting DBMS out of core is an absolute non-starter until the installer can make use of them somehow without LocalSettings.php existing. Once that happens and is well-supported rather than an afterthought, I’d be happy to re-evaluate moving less-supported DBMS into extensions or whatever system is devised for that.

I was able to get support for Percona as a selection in the installer by adding one function: Installer::addDBType().

Comments and +2 on above welcome! :)

A tiny example of what I mean is https://phabricator.wikimedia.org/T203850 - had we had an abstraction to handle JSON values in IDatabase, we wouldn't need fixes like https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TemplateData/+/521193 - JSON could end up as JSONB in Postgres and a BLOB in MySQL. Sometimes the abstractions would need to go much higher than this, providing a DB-specific implementation of it.
Otherwise we will end up one day with everything being varbinary or blob.

This. A proper DB abstraction should not be about trying to define the lowest common denominator of, for example, an INT field (can't be signed, limited to 16 bits because we need to support FoobarDB, etc.). It should be about defining high-level data types (e.g. EMAIL, ARTICLE_TEXT, etc.) and allowing the individual abstraction layers to map them to the most appropriate type in the back-end system.

This. A proper DB abstraction should not be about trying to define the lowest common denominator of, for example, an INT field (can't be signed, limited to 16 bits because we need to support FoobarDB, etc.). It should be about defining high-level data types (e.g. EMAIL, ARTICLE_TEXT, etc.) and allowing the individual abstraction layers to map them to the most appropriate type in the back-end system.

If the application level has a concrete need for a data type that cannot be met directly by one of the available types, then we may want to look into defining such higher-level types. But as soon as the representation of such types changes the schema structurally, e.g. by representing geo-coordinates in one field on some DBs, and in two or more fields on other DBs, then we get dangerously close to writing an ORM system, with all the complexity and performance issues that this implies. I'd be very, very careful about going in that direction.

Also, mixing application-level concepts into the DB abstraction layer (e.g. ARTICLE_TEXT) sounds like a bad idea. In this example in particular, article content may not be text - it may in fact be JSON, and should be using a completely different type internally. And the knowledge of what type to use, and how, needs to reside in the code that defines the content model.

So, in summary - it would probably be nice to support a JSON type, but not an ARTICLE_TEXT type. Things like EMAIL or GEO are tempting to split into multiple fields or even tables - but structural choices about the schema should be left to the application logic and should not be implicit in the DB abstraction. This is tempting because it looks nice and clean (see ORM), but tends to lead to scalability problems. The opportunity for optimization based on knowledge of the data type and the capabilities of the DB tends to be smaller than the opportunity to optimize on application-specific use cases and access patterns. E.g. EMAIL could be split into domain and user, with the domains normalized, and a way to query all email addresses per domain. Whether this is an advantage or just gets in the way and causes problems depends on the application - so it should be a choice of the application.

That may be true of a general abstraction layer, but we're talking about a specific abstraction layer for MediaWiki. Therefore, by definition, it is the choice of the application.

In that context, ARTICLE_TEXT is a sensible type, as it forces you to define what that looks like and how you need to interact with it in a more targeted manner. A general JSON type wouldn't make a distinction between the quite different usage profiles you might need for ARTICLE_TEXT, CACHED_OBJECT, USER_SETTINGS, etc. even though they could all be represented as JSON (which, itself, could be represented as LONGTEXT or equivalent).

If this were for a general abstraction layer, I would build it so that the mappings are defined (or overridable at least) at the application layer, so whilst there might be a default representation of an EMAIL field as the most appropriate string type, the application could choose to use an alternative representation. Therefore I'm not even sure that this is a problem in the general case.

That may be true of a general abstraction layer, but we're talking about a specific abstraction layer for MediaWiki. Therefore, by definition, it is the choice of the application.

I believe that stratification is a good thing. A lot of problems in the MediaWiki code base arise from the fact that we don't have any distinction between storage layer, application logic, and presentation/interaction. This makes the software extremely hard to maintain.

I'd rather manage the representation of domain entities by defining a schema on the level of a storage service than as data types in the database layer.

If this were for a general abstraction layer, I would build it so that the mappings are defined (or overridable at least) at the application layer

That's exactly what a storage service does. The database abstraction layer shouldn't know about it. Keeping the DB abstraction oblivious of the domain model makes it much easier to implement and maintain. When introducing new concepts into core, having to maintain compatibility with DB abstractions that would now all need to support the new concept would be problematic.

Ottomata added a subscriber: Ottomata. Edited Jul 15 2019, 1:45 PM

Just came across this ticket after reading the TechCom radar email.

This sounds pretty awesome. I like the idea of JSON to specify the schemas. I wonder if this could be done using JSONSchema. If so, Analytics (and others) would be able to more easily integrate mediawiki schemas and event schemas. (We chose JSONSchema as part of an RFC last year.)

Can I re-iterate my previous concern about using JSON as the data format, given that JSON doesn't support comments.

JSON is just a subset of YAML. We use YAML for event schemas, and put what would be table comments in the description field of the JSONSchema element, but still use code comments for developer readability.

In case useful: we are building tooling around auto versioning and dereferencing JSONSchemas. This would allow you to do things like declare a common field or subschema in a separate definitions file, and reuse it in the real schema documents. We're also planning to build CI for ensuring that the schemas conform to any conventions we choose.

  1. Tracking of which migration has run (and when) in a migrations table

In general, the model MediaWiki has followed in the past is to detect whether a particular change is needed (e.g. column is missing => add it) rather than trying to track whether things have been run. But we do use the "has been run" model for many maintenance scripts. We've run into issues both ways.

  1. There is a --step option to allow you to do migrations 1 by 1, instead of bulk

I'm not sure just how much benefit there would be in this one, particularly since the changes aren't likely to be strictly ordered when it comes to extensions.

  1. All migrations have a rollback function that you can implement allowing you to undo your change.

This likely means writing each change twice (forward and backward),[1] and generally only using one of them. Personally I'd not bother.

[1]: There may be some really simple cases where that could be done automatically, but usually you'd want to repopulate data in one of the two directions.

  • Nothing in MediaWiki changes except that we replace *.sql files with *.json ones that are DBMS-agnostic (so one per schema change, and one central tables.json for fresh installations)

I don't think this will be sufficient. This sounds like only the "updates" key from what I listed at https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema#Schema_change_format when drafting this RFC.

Here's a use case to consider: Because making schema changes in the past has historically been such a pain, gerrit:357892 added just one patch file for MySQL, which has since caused problems like T227662. But not for PostgreSQL, since the syntax of that file makes per-field updates much easier to write. Ideally we'll have that property for all changes in the future, and one of the ways I proposed doing that was to define the format such that we can easily bundle multiple check+update sets into one "patch" file.

Everything else stays the same, including the way MediaWiki checks the current schema to determine whether to apply a patch or not. This narrows the focus of the RFC a bit.

It would narrow the scope to the point where the RFC doesn't actually accomplish the things that it is intended to accomplish, unfortunately. We don't want to have to write the logic to check whether an update is needed multiple times, just as we don't want to write each update's SQL multiple times.

Can I re-iterate my previous concern about using JSON as the data format, given that JSON doesn't support comments.

We could easily enough use MediaWiki's FormatJson::stripComments() or something very like it to work around that problem.
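A minimal sketch of that workaround, assuming the schema lives in a file named tables.json (name hypothetical); FormatJson is MediaWiki's existing JSON helper class:

// Strip /* ... */ and // comments, then decode as an associative array.
$raw = file_get_contents( __DIR__ . '/tables.json' );
$schemaData = FormatJson::decode( FormatJson::stripComments( $raw ), true );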

The current schema has no table or column comments. [...] Luckily both the current system's problem and the problem you mention here have the same solution: put comments in the columns and tables themselves. There should be a place in the JSON for these, as they're part of standard create table statements.

I note that https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema#Schema_format does include that. I'm not entirely sure it's a good idea, though, since it would mean that updates to the documentation would also require schema changes.

The nice thing we can now have with DBAL is that in our CI we can test our abstract schema against all DBMSes, by trying to turn the abstract PHP objects into SQL commands on all of the supported platforms (and we could even add more DBMSes like SQLAnywhere, SQLAzure, etc. to the list) and checking whether they error out, because DBAL can catch most of the edge cases, as illustrated above.

Note that we want to enforce limits in the abstract, not just for the specific database. Personally I'd prefer that limits are defined and tested for explicitly, rather than being a union of requirements applied by whichever set of DBAL backends happens to be in use.

Speaking of CI, I'd also like to have a way to load a schema file from an old version of MediaWiki, apply the updates to it, and then extract the schema and test whether it's equivalent to the data in the current schema file. And ideally do that entirely on the abstract level, rather than translating it into some DB's SQL and back.

I would hope and expect to see an average of about 5 lines worth of documentation comments per field, and probably an average of about 10 lines per table. I'm aware that the documentation is not currently at this level, but I would hope that improving that is something we all aspire to.

That seems excessive to me. While in some cases a field may need 5 lines of documentation, most likely need much less. Same for tables.

A data format that allows proper comments, of any length, please! YAML may be a good choice, or a custom JSON + comments format if YAML is too heavy.

Parsing JSON in PHP is a fairly standard thing to do, while PHP's YAML extension is not bundled with PHP by default.

I actually wish we had more possibilities to diverge, for example to use native Internet Protocol address data types for ipb_address etc...
So abstractions would have been done up in the application layer, not just on some single storage layer.

In general, you can either support multiple databases or you can require just one. MediaWiki has decided to support multiple, and changing that to support only MySQL is outside the scope of this RFC.

A tiny example of what I mean is https://phabricator.wikimedia.org/T203850 - had we had an abstraction to handle JSON values in IDatabase, we wouldn't need fixes like https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TemplateData/+/521193 - JSON could end up as JSONB in Postgres and a BLOB in MySQL. Sometimes the abstractions would need to go much higher than this, providing a DB-specific implementation of it.

The problem in T203850 is that the current MySQL schema doesn't differentiate between storage of UTF-8 text and storage of binary data, while others do and therefore cause problems when you try to shove binary data into a Unicode text field.

While some JSON abstraction at the IDatabase level probably would have avoided that specific issue, it doesn't touch the underlying problem and provides no other real benefit.

This. A proper DB abstraction should not be about trying to define the lowest common denominator of, for example, an INT field (can't be signed, limited to 16 bits because we need to support FoobarDB, etc.). It should be about defining high-level data types (e.g. EMAIL, ARTICLE_TEXT, etc.) and allowing the individual abstraction layers to map them to the most appropriate type in the back-end system.

I disagree with both of your alternatives. We should define types with useful properties while leaving it up to the database abstraction layer to choose a specific type that satisfies our requirements. The FoobarDB implementation might have to use something other than "INT" to represent our integer type, and that's ok. But on the other hand, there's likely no benefit at the database abstraction level to trying to separate "email" and other short strings, or "article text" and other long strings. Even if some database has an "email" type (or an "ip" type), it seems unlikely that we'd be able to really make use of any added features as long as we do want to support more than just one database.

That may be true of a general abstraction layer, but we're talking about a specific abstraction layer for MediaWiki. Therefore, by definition, it is the choice of the application.

Note that MediaWiki's database abstraction code is in includes/libs/, which implies that we hope to someday publish it as a separate library that other projects can use as well.

I wonder if this could be done using JSONSchema.

Not very likely. JSONSchema is about defining the structure of a JSON-based data structure. Here we're talking about an SQL DDL, and only storing the data structure as JSON.

We might create a JSONSchema to define the structure of our JSON files that in turn define the SQL DDL, but that transitive relationship seems unlikely to be useful for the use cases you're thinking of.

In a private email, @Ladsgroup wrote:

Also it's possible to make the schema change abstraction pluggable, so in core we use Doctrine DBAL and in an Oracle extension they would need to write something like that from scratch.

I like that idea. If we wrap Doctrine DBAL in our own interface so the implementation could be easily changed if we find shortcomings at some point, I'd have much less concern about it. That makes the question of whether we use DBAL or not somewhat irrelevant since MediaWiki won't directly depend on it. We can use it for the initial implementation, and if it turns out to be too much trouble to work around we can change the implementation without having to redo a bunch of the rest of the work.
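A possible shape for that MediaWiki-owned wrapper, purely as a sketch (the interface name and methods are hypothetical, not an agreed API):

/**
 * Hypothetical interface that core would code against. The default
 * implementation would delegate to Doctrine DBAL; an extension (e.g. for
 * Oracle) could ship its own implementation instead.
 */
interface SchemaBuilder {
    /**
     * @param array $abstractSchema Decoded contents of a schema JSON file.
     * @return string[] CREATE TABLE statements for the active database type.
     */
    public function getCreateTableSql( array $abstractSchema );

    /**
     * @param array $abstractChange Decoded contents of a schema change JSON file.
     * @return string[] ALTER/CREATE/DROP statements for the active database type.
     */
    public function getSchemaChangeSql( array $abstractChange );
}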

I wonder if this could be done using JSONSchema.

Not very likely. JSONSchema is about defining the structure of a JSON-based data structure. Here we're talking about an SQL DDL, and only storing the data structure as JSON.

Yeah, I can't quite imagine how an ALTER part of a migration would look in JSONSchema. Maybe what I'm envisioning isn't so much about migrations, but about abstract table schemas. We use JSONSchema to create (and auto migrate) Hive tables. Right now we do this with some custom Spark schema glue, but might one day do a similar thing with Kafka Connect, which itself integrates with many source and sink databases.

Having MediaWiki tables represented as JSONSchemas could potentially ease integration of MediaWiki data into OLAP (and other) systems. Everyone would love it if we could convert the regular XML dumps into JSON. If there were JSONSchemas describing MW tables, the records in a JSON MW dump would conform to them.

On the other hand, supporting abstract migrations between RDBMSes is hard enough; adding OLAP constraints and whatnot in there is crazy :p Just some food for thought I guess, carry on! :)

A tiny example of what I mean is https://phabricator.wikimedia.org/T203850 - […]
Otherwise we will end up one day with everything being varbinary or blob.

For cases where the application has no shared use case at all other than to store and retrieve binary data, that seems appropriate. Specifying it any more narrowly only costs maintenance time to specify, maintain and migrate as its incidental format changes over time, plus the run-time overhead of constraining it - at no benefit.

No rule is perfect for everything of course, and that includes this one.

Having Mediawiki tables represented as JSONSchemas could potentially ease integration of Mediawiki data into OLAP (and other) systems.

Defining fields and their types is only one aspect of defining the database schema. Another critical aspect is defining indexes, collations, etc. As far as I understand, such things have no place in JSONSchemas.

Perhaps the inverse could be possible - generate JSONSchema from the "DB schema JSON".

Everyone would love if we could convert the regular XML dumps into JSON. If there were JSONSchemas describing MW tables, the records in a JSON MW dump would conform to them.

I doubt this would be possible, and if it was, I would strongly recommend against it. We would be exposing a lot of internal detail, making the dumps hard to use and unstable. XML and JSON are document-style formats that make use of nesting for context. Relational databases do not support that; they use relation tables instead. The two approaches are different enough to make it extremely annoying to represent one using the other (it's technically possible, but really annoying). The idea of the dump format is that it follows our conceptual data model and is independent of the database schema.

Dumps that basically represent the table structure are only nice when importing into the *same* table structure. When trying to import into a different structure (say, a different version of MW) or when doing stream processing, it's a real pain.

With the current format, you get a page as one "thing", with all revisions nested into that "thing", with the meta-data and content nested into that. That makes it really easy to process the dump as a stream of pages or of revisions, with all the meta-data and content immediately available.

With a table-oriented dump structure, you'd get a stream of pages, a stream of revision metadata, then a stream of slot associations, a stream of content metadata, and then a stream of blobs, a stream of content model names, etc. You'd have to write each stream into a database so you can then associate the different entities with each other. That would be impractical for most use cases.

<snip>

With a table-oriented dump structure, you'd get a stream of pages, a stream of revision metadata, then a stream of slot associations, a stream of content metadata, and then a stream of blobs, a stream of content model names, etc. You'd have to write each stream into a database so you can then associate the different entities with each other. That would be impractical for most use cases.

I have to agree with Daniel here. If you want something easy to import into a database, you use a tool to generate SQL (or whatever you need) from the XML. Similarly, if you want JSON-formatted dumps with the same sort of structure as the existing ones, then a conversion tool ought to be developed for that. That structure is very useful, so I do want to keep it; per-page processing of the dumps is quite common in bot land. But maybe we could talk about JSON format needs and use cases on another ticket.

Another critical aspect is defining indexes, collations, etc. As far as I understand, such things have no place in JSONSchemas.

They could! JSONSchema doesn't just define structure, it can also define other metadata, like allowed formats, etc. We plan to implement a custom JSONSchema extension to allow for annotating fields as either dimension (low cardinality, like Prometheus labels) or measure (values, e.g. page load timings) for ingestion into Druid. The same could be done for indexes, etc.

XML and JSON are document style formats that make use of nesting for context.

XML does this, but JSON doesn't necessarily. This is actually a big difficulty in parsing the XML dump; it is impossible to know the full context of a revision without parsing an entire XML page document. The data is large enough now that parsing the full dump can be pretty slow or difficult in a non-distributed context, and XML doesn't distribute well. We wouldn't want a JSON dump as an array of revision objects for every page; each DB record would be its own JSON object, i.e. JSON Lines.

Anyway, yeah, I get your point, you are right. A JSON version of a MySQL dump isn't more useful than a plain ol' SQL dump. Analytics plans to release (possibly) JSON dumps of the MediaWiki History data set, which will be much more useful.

I mostly started commenting about JSONSchema at all because we have tools that can use JSONSchemas to load from and into various data stores. Having a JSONSchema for Mediawiki databases might make some future integration tasks simpler.

Ladsgroup updated the task description. Jul 17 2019, 2:27 PM
Ladsgroup moved this task from Request IRC meeting to Inbox on the TechCom-RFC board.

I updated the description per the discussion in the TechCom IRC meeting and we agreed to move this to last call.

Having a JSONSchema for Mediawiki databases might make some future integration tasks simpler.

I can well imagine that being useful, and it would probably be easy enough to generate JSONSchema from the DB specs, just like we generate SQL. Maybe there could even be a JSONschema backend for DBAL. Not sure that's the best way to do this, though.

Ladsgroup updated the task description. Jul 18 2019, 12:57 PM
daniel updated the task description. Jul 18 2019, 1:53 PM

I went over the proposal to clarify it a bit more.

This RFC was discussed on IRC on July 18. Several points have been discussed and clarified, which are now reflected by the task description. There was consensus that this RFC was ready to go on last call with the updated description. It will be reviewed at the next TechCom meeting.

IRC meeting minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-07-17-13.05.html
Full Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-07-17-13.05.log.html

I'm a bit confused here, because the updated task description seems to collide (again) with https://phabricator.wikimedia.org/T113831

Regarding what Skizzerz said: "In any case, splitting DBMS out of core is an absolute non-starter until the installer can make use of them somehow without LocalSettings.php existing." - it still looks like a good point to consider.

I can't access IRC logs behind my company's firewall, so please ignore this comment if I'm missing something.

daniel added a subscriber: Hexmode. Jul 18 2019, 4:30 PM

I'm a bit confused here, because the updated task description seems to collide (again) with https://phabricator.wikimedia.org/T113831

That is correct. The main reason that was declined was the fact that it would require an abstraction layer for schema changes, which would be provided per this RFC.

Regarding what Skizzerz said: "In any case, splitting DBMS out of core is an absolute non-starter until the installer can make use of them somehow without LocalSettings.php existing." it still looks like a good point to consider.
I can't access IRC logs behind my company's firewall, so please ignore this comment if I'm missing something.

This was briefly discussed during the meeting. I haven't looked at the code, but @Hexmode apparently made a proof of concept for that, and was able to make it work without too much trouble.

You are right that there should be a ticket about making DB support pluggable. The creation of the abstraction layer isn't blocked on that, but the ability to provide DB support in extensions is, so it should at least be mentioned here.

I'm a bit confused here, because the updated task description seems to collide (again) with https://phabricator.wikimedia.org/T113831

Effectively we revisited T113831 in light of the abstract schema changes proposal here and decided that it's now feasible and desirable to do so.

Regarding what Skizzerz said: "In any case, splitting DBMS out of core is an absolute non-starter until the installer can make use of them somehow without LocalSettings.php existing." it still looks like a good point to consider.

We're planning on doing the installer bit, and from private emails it sounds like Skizzerz is on board with that plan. As a rough straw proposal, we could have the installer detect database-providing extensions (via a flag in extension.json) and load them early for selection at the database setup stage of the installer. The one selected would continue to be loaded from then on, even before LocalSettings.php is generated.

This was briefly discussed during the meeting. I haven't looked at the code, but @Hexmode apparently made a proof of concept for that, and was able to make it work without too much trouble.

Note I gave that patch a -1 because it seems to depend on what we decided against in T467: RfC: Extension management with Composer.

Regarding what Skizzerz said: "In any case, splitting DBMS out of core is an absolute non-starter until the installer can make use of them somehow without LocalSettings.php existing." it still looks like a good point to consider.

We're planning on doing the installer bit, and from private emails it sounds like Skizzerz is on board with that plan. As a rough straw proposal, we could have the installer detect database-providing extensions (via a flag in extension.json) and load them early for selection at the database setup stage of the installer. The one selected would continue to be loaded from then on, even before LocalSettings.php is generated.

Yep. Cindy and I were discussing this via email, and installer support is planned as part of these changes. As such, I see no downsides to pulling support out of core and into an extension. In case it's relevant, Cindy has my full permission to quote any part of my private email reply in here or other public spaces.

Looking into extension.json and exposing DB-related extensions in the installer seems workable to me. I presume the interface would be some sort of abstract class that the extension extends (and provides a reference to via AutoloadClasses), filling in protected methods for the functionality it needs, like setting up or upgrading tables and exposing UI options in the installer. It seems like such an interface would be more foolproof than having hooks for all of those components.
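Purely as a hypothetical sketch of what such a pluggable database-type base class might look like (names and method list are illustrative, not an agreed API):

/**
 * Hypothetical base class a database-providing extension would subclass and
 * reference from its extension.json.
 */
abstract class DatabaseTypeProvider {

    /** Database type key as used by $wgDBtype, e.g. 'percona'. */
    abstract public function getName();

    /** Name of the Wikimedia\Rdbms\Database subclass implementing the runtime connection. */
    abstract public function getDatabaseClass();

    /** Generate DDL for an abstract schema or schema change, e.g. via Doctrine DBAL. */
    abstract protected function getSql( array $abstractDefinition );

    /** Extra installer form fields (host, port, credentials, etc.), if any. */
    protected function getInstallerFields() {
        return [];
    }
}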

daniel moved this task from Inbox to Last Call on the TechCom-RFC board. Jul 26 2019, 11:06 AM

Per the TechCom meeting on July 24, this RFC goes on Last Call for being approved. If no objections remain unaddressed by August 7, the RFC will be approved as proposed and amended.

If you care about this RFC, please comment - in support, or raising concerns. The Last Call period is not just for raising objections, but also for confirming consensus.

Tgr awarded a token. Aug 1 2019, 11:39 AM
Tgr added a subscriber: Tgr. Aug 1 2019, 11:44 AM
  • Existing schema change (json) files should not be changed to perform additional changes. - there will still be a main file representing the current schema (like tables.sql in core and extension_name.sql in most extensions) which is updated continuously, and used in the installer, right?
  • Btw. tables.sql and co. are valuable as easy-to-understand documentation of the current schema; will that be preserved in some way?
  • Not all schema changes can be represented in an abstract language (e.g. because you need to do two ALTERs and some DML to move the data in between); does the new system allow using manual SQL? (I guess the answer is yes, because worst case, you could write a custom update script?)
  • Existing schema change (json) files should not be changed to perform additional changes. - there will still be a main file representing the current schema (like tables.sql in core and extension_name.sql in most extensions) which is updated continuously, and used in the installer, right?

Right - JSON files that represent a schema change should not be grown to represent more changes. JSON files that represent a schema should be updated to reflect the current desired state.

  • Btw. tables.sql and co. are valuable as easy-to-understand documentation of the current schema; will that be preserved in some way?

Good point - I'd say we could generate them as documentation. Not sure if they should be checked in, though. If yes, they should go into docs/

  • Not all schema changes can be represented in an abstract language (e.g. because you need to do two ALTERs and some DML to move the data in between); does the new system allow using manual SQL? (I guess the answer is yes, because worst case, you could write a custom update script?)

It seems to me like allowing SQL in the JSON files would defeat the purpose. The solution for complex migrations would indeed be to write a maintenance script that does the data shoveling. This is probably better anyway, since it allows the shoveling to be done in batches; a rough sketch of that pattern is below.
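Roughly what such a batched data-shoveling maintenance script looks like in MediaWiki (a sketch in the style of LoggedUpdateMaintenance; the table and column names are made up):

<?php
require_once __DIR__ . '/Maintenance.php';

class MigrateFooData extends LoggedUpdateMaintenance {
    public function __construct() {
        parent::__construct();
        $this->addDescription( 'Copy data from foo_old into foo_new in batches' );
        $this->setBatchSize( 500 );
    }

    protected function getUpdateKey() {
        // Recorded in the updatelog table so the script only runs once.
        return __CLASS__;
    }

    protected function doDBUpdates() {
        $dbw = $this->getDB( DB_MASTER );
        $lastId = 0;
        do {
            // Fetch the next batch of rows, ordered by primary key.
            $res = $dbw->select(
                'foo',
                [ 'foo_id', 'foo_old' ],
                [ 'foo_id > ' . $dbw->addQuotes( $lastId ) ],
                __METHOD__,
                [ 'ORDER BY' => 'foo_id', 'LIMIT' => $this->getBatchSize() ]
            );
            foreach ( $res as $row ) {
                $dbw->update(
                    'foo',
                    [ 'foo_new' => $row->foo_old ],
                    [ 'foo_id' => $row->foo_id ],
                    __METHOD__
                );
                $lastId = (int)$row->foo_id;
            }
            // Let replication catch up between batches.
            wfWaitForSlaves();
        } while ( $res->numRows() === $this->getBatchSize() );

        return true;
    }
}

$maintClass = MigrateFooData::class;
require_once RUN_MAINTENANCE_IF_MAIN;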

Tgr added a comment. Aug 1 2019, 12:18 PM

It seems to me like allowing SQL in the JSON files would defeat the purpose.

It can't be done in the JSON files because the SQL would have to be different per engine. There could be some hook system, maybe. I agree that using maintenance scripts is probably better, though.

It seems to me like allowing SQL in the JSON files would defeat the purpose.

It can't be done in the JSON files because the SQL would have to be different per engine. There could be some hook system, maybe. I agree that using maintenance scripts is probably better, though.

I would be very much against using hooks or anything except maintenance scripts in this case, because, for example, there is a plan to move support for Oracle and MSSQL to extensions and let volunteers maintain them, and those maintainers should not be responsible for adding support for every individual schema change or any data migration it would need.

This RFC has been approved as proposed per the TechCom meeting on 2019-08-07. @Tgr's comment appears to have been addressed.