Page MenuHomePhabricator

Consistently generate schema (change) files in Wikibase
Open, Needs TriagePublic

Description

As a Wikibase developer, I want the schema and schema change files in Wikibase to be in a consistent and easy-to-maintain format, so that I can trust the schema changes are consistent across different database backends, and I can easily add additional schema changes.

Problem:
While abstract schemas have made it much easier to generate the SQL table and patch files, the process of actually running the scripts to generate those files is, as far as I’m aware, manual. This can introduce variations in the schema files, or files can even be missed entirely.

Example:

  • variations in schema files: a few patches start with Source: repo/sql/… instead of Source: extensions/Wikibase/repo/sql/ because the script was run in a different directory; fixed in I81dbb5fd4c
  • missing files: I6dcdee20f5 did not add an sqlite patch

Acceptance criteria:

  • There is a single command that developers can run to regenerate all SQL files based on the JSON files.
  • We regularly re-run this command to check if there are any changes from the committed SQL files (e.g. due to updates to the MediaWiki core code or underlying library generating the SQL)

Open questions:

Event Timeline

Change 830576 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Add Makefile generating SQL files from JSON files

https://gerrit.wikimedia.org/r/830576

I like this and has been in back of my mind on how to improve this but I hope we can find a solution that covers more than just Wikibase. Yesterday I had similar fun when doing schema change on core. One thing to keep in mind is that now we have --type=all that automatically discovers and puts the files in all three supported RDBMSes (it's not perfect, specially for core where archives are all over the place). Using type=all should simplify your makefile a lot at least.

One thing to keep in mind is that now we have --type=all that automatically discovers and puts the files in all three supported RDBMSes (it's not perfect, specially for core where archives are all over the place). Using type=all should simplify your makefile a lot at least.

I don’t think --type=all works for us, at least not while it puts files into $dir/$type/$name.sql whereas we have schema changes under $dir/$type/archives/$name.sql :/

(Edit: Hm, but maybe we can still use it for the tables at least. I’ll check that.)

(Edit: Hm, but maybe we can still use it for the tables at least. I’ll check that.)

Yeah that works, updated the change: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/830576/3..4/Makefile

I wonder if there could be extension.json sections that specify how to map JSON paths to SQL paths, one for generateSchemaSql.php and one for generateSchemaChangeSql.php? (And the MediaWiki core mappings would be hard-coded in the maintenance scripts.)

(I’d also say that figuring this out nicely for all extensions and core doesn’t need to block a solution for Wikibase. We can merge the Makefile and later replace it with the better solution.)

One thing to keep in mind is that now we have --type=all that automatically discovers and puts the files in all three supported RDBMSes (it's not perfect, specially for core where archives are all over the place). Using type=all should simplify your makefile a lot at least.

I don’t think --type=all works for us, at least not while it puts files into $dir/$type/$name.sql whereas we have schema changes under $dir/$type/archives/$name.sql :/

We definitely should strive to standardize the paths. The script can also simply check for existence of archives/ directory, it would also help with core's case.

Makefiles not helpful on windows, as far as I know.

Math has written its own maintenance script to generate all types - https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Math/+/refs/heads/master/maintenance/MathGenerateSql.php
DiscussionTools is storing the commands in composer.json - https://gerrit.wikimedia.org/g/mediawiki/extensions/DiscussionTools/+/88f8e534eb4aa07931ca3b9246003d6caae8f9fb/composer.json#19

I wonder if there could be extension.json sections that specify how to map JSON paths to SQL paths, one for generateSchemaSql.php and one for generateSchemaChangeSql.php? (And the MediaWiki core mappings would be hard-coded in the maintenance scripts.)

That could be T237839

Three different ways to the do same thing is exactly what I want to avoid 😅

The script can also simply check for existence of archives/ directory, it would also help with core's case.

Could generateSchemaChangeSQL.php just always add archives/ to the path? Are there any extensions that keep schema changes directly in mysql/postgres/sqlite directories, right next to table definitions?

Probably for small extensions but at the end, it's better to make it consistent then "next to each other". Let's make it go to archives/ and change extensions that doesn't follow this. I wish we could make core also go with mysql/ directory but that would mean a lot of changes in core.

The initiative is over and these are improvements that are welcome but not really any blocker to call the project done.