Page MenuHomePhabricator

Investigate and restructure SQL directory, and use new MediaWiki abstract schema change features
Closed, ResolvedPublic

Description

In the context of T202265: Add 'tmp1' index that is in WMF production to the Wikibase install SQL files the need for a better structure came up.
Decide upon a better structure for *.sql files, and or use new MediaWiki abstract schema change features where appropriate

Goal should be:

  • Better oversight
  • Bundling of relevant files
  • Error avoidance
  • Remove the old schema changes that we no longer need (things from before the last supported version of MW)(currently 1.31)

Links:

Acceptance criteria:

  • all SQL files are in a sensible location according to their database type
  • all SQL files are generated by [maintenance/generateSchemaSql.php] from JSON sources

Details

Due Date
Sep 29 2020, 10:00 PM
SubjectRepoBranchLines +/-
mediawiki/extensions/Wikibasemaster+4 -4
mediawiki/extensions/Wikibasemaster+95 -29
mediawiki/coremaster+13 -3
mediawiki/extensions/Wikibasemaster+8 -8
mediawiki/extensions/Wikibasemaster+120 -43
mediawiki/extensions/Wikibasemaster+64 -10
mediawiki/extensions/Wikibasemaster+85 -16
mediawiki/extensions/Wikibasemaster+58 -10
mediawiki/extensions/Wikibasemaster+3 -3
mediawiki/extensions/WikimediaMaintenancemaster+1 -1
mediawiki/extensions/WikimediaMaintenancemaster+1 -1
mediawiki/extensions/Wikibasemaster+9 -9
mediawiki/extensions/Wikibasemaster+378 -97
mediawiki/extensions/Wikibasemaster+59 -16
mediawiki/extensions/Wikibasemaster+89 -17
mediawiki/extensions/Wikibasemaster+20 -65
mediawiki/extensions/Wikibasemaster+25 -13
mediawiki/extensions/Wikibasemaster+7 -16
mediawiki/extensions/Wikibasemaster+6 -6
mediawiki/extensions/Wikibasemaster+11 -2
mediawiki/coremaster+2 -2
Show related patches Customize query in gerrit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Mediawiki core actually supports upgrading from basically any version. A couple of months ago I got a ticket that I broke upgrading from 1.10 (and people are really upgrading from 1.10 to 1.33), I honestly think this is stupid and people should upgrade in batches (1..10 -> 1.20 -> 1.25 -> 1.33 for example) but for now, that's how it works (maybe we should change or don't comply to it)

Slightly off-topic... (The handful of upgraders who have stopped in the MediaWiki Discord usually make smaller, jumps indeed, such as LTS to LTS.)

Mediawiki core actually supports upgrading from basically any version. A couple of months ago I got a ticket that I broke upgrading from 1.10 (and people are really upgrading from 1.10 to 1.33), I honestly think this is stupid and people should upgrade in batches (1..10 -> 1.20 -> 1.25 -> 1.33 for example) but for now, that's how it works (maybe we should change or don't comply to it)

Slightly off-topic... (The handful of upgraders who have stopped in the MediaWiki Discord usually make smaller, jumps indeed, such as LTS to LTS.)

Started a RFC to drop such support anyway T259771: RFC: Drop support for older database upgrades

darthmon_wmde raised the priority of this task from Medium to Needs Triage.Aug 11 2020, 12:22 PM

Change 619555 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Reorganize client schema scripts

https://gerrit.wikimedia.org/r/619555

Change 619720 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Remove duplicate code in SqlUsageTrackerSchemaUpdater

https://gerrit.wikimedia.org/r/619720

Change 619850 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Reorganize repo schema scripts

https://gerrit.wikimedia.org/r/619850

Change 619555 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Reorganize client schema scripts

https://gerrit.wikimedia.org/r/619555

Change 619720 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Remove duplicate code in SqlUsageTrackerSchemaUpdater

https://gerrit.wikimedia.org/r/619720

Change 619850 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Reorganize repo schema scripts

https://gerrit.wikimedia.org/r/619850

@Ladsgroup As far as I understand, the “use new MediaWiki abstract schema change features” part of this is not done yet. Should we still do that, or postpone it for now?

Yeah, it's not done now, I'm fine either way, I can start with client which is simpler

Change 621037 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate entity_usage table in client to abstract schema

https://gerrit.wikimedia.org/r/621037

Change 621037 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate entity_usage table in client to abstract schema

https://gerrit.wikimedia.org/r/621037

Change 621554 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate wb_property_info to abstract schema

https://gerrit.wikimedia.org/r/621554

Change 621559 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate the new term store to abstract schema

https://gerrit.wikimedia.org/r/621559

Change 621554 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate wb_property_info to abstract schema

https://gerrit.wikimedia.org/r/621554

Change 621559 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate the new term store to abstract schema

https://gerrit.wikimedia.org/r/621559

Change 621717 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Rename AddNormalizedTermsTablesDDL to term_store

https://gerrit.wikimedia.org/r/621717

Change 621717 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Rename AddNormalizedTermsTablesDDL to term_store

https://gerrit.wikimedia.org/r/621717

Change 621911 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/WikimediaMaintenance@master] Fix addWiki.php for moved sql files in Wikibase

https://gerrit.wikimedia.org/r/621911

Change 621911 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Fix addWiki.php for moved sql files in Wikibase

https://gerrit.wikimedia.org/r/621911

Change 621921 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/WikimediaMaintenance@master] Fix createExtensionTables.php for moved sql files in Wikibase

https://gerrit.wikimedia.org/r/621921

Change 621921 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Fix createExtensionTables.php for moved sql files in Wikibase

https://gerrit.wikimedia.org/r/621921

Addshore updated the task description. (Show Details)
Addshore updated the task description. (Show Details)

Change 622659 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Move changes_* files out of sql archives

https://gerrit.wikimedia.org/r/622659

Change 622659 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Move changes_* files out of sql archives

https://gerrit.wikimedia.org/r/622659

Change 627277 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate wb_id_counters to abstract schema

https://gerrit.wikimedia.org/r/627277

Change 627281 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate wb_items_per_site to abstract schema

https://gerrit.wikimedia.org/r/627281

Change 627285 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate wb_changes_subscription to abstract schema

https://gerrit.wikimedia.org/r/627285

Change 627347 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate wb_changes table to abstract schema

https://gerrit.wikimedia.org/r/627347

This is mostly done, All tables are migrated except two:

  • wb_terms, it has some complexities (float data type) that is not yet doable in abstract schema. We can wait for core to implement it (which hopefully happen in a month or two) or just wait until we remove wb_terms altogether.
  • wb_changes_dispatch: It's waiting for timestamp data type getting handled properly in core (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/626854)

Once these are done, we should clean up the file names but that's easy.

Change 627277 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate wb_id_counters to abstract schema

https://gerrit.wikimedia.org/r/627277

Change 627281 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate wb_items_per_site to abstract schema

https://gerrit.wikimedia.org/r/627281

Change 627285 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate wb_changes_subscription to abstract schema

https://gerrit.wikimedia.org/r/627285

Change 627347 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate wb_changes table to abstract schema

https://gerrit.wikimedia.org/r/627347

Change 627836 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Rename changes_* schema files

https://gerrit.wikimedia.org/r/627836

Change 627836 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Rename changes_* schema files

https://gerrit.wikimedia.org/r/627836

Now waiting on the core patch getting merged so we can pick up wb_changes_dispatch and dropping wb_terms.

Michael set Due Date to Sep 29 2020, 10:00 PM.Sep 16 2020, 3:33 PM

Added (default?) 2 week due date to see if https://gerrit.wikimedia.org/r/c/mediawiki/core/+/626854 is merged, then we can proceed as described above (T205094#6460185).

Added (default?) 2 week due date to see if https://gerrit.wikimedia.org/r/c/mediawiki/core/+/626854 is merged, then we can proceed as described above (T205094#6460185).

It's merged now, picking it up again.

Change 628469 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Improve TinyIntType

https://gerrit.wikimedia.org/r/628469

Change 628470 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Migrate wb_changes_dispatch to abstract schema

https://gerrit.wikimedia.org/r/628470

Change 628471 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Rename changes_dispatch to wb_changes_dispatch

https://gerrit.wikimedia.org/r/628471

Change 628469 merged by jenkins-bot:
[mediawiki/core@master] Improve TinyIntType

https://gerrit.wikimedia.org/r/628469

Change 628470 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate wb_changes_dispatch to abstract schema

https://gerrit.wikimedia.org/r/628470

Change 628471 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Rename changes_dispatch to wb_changes_dispatch

https://gerrit.wikimedia.org/r/628471

The only thing left is wb_terms and I assume this will be handled on the wb_terms clean up hike.

This seems to be finished. There is still a bunch questions I would have that this task does not seem to be explaining, e.g. how this "abstract schema" approach is guarding against production databases drifting from what the schema delivered with code prescribes - which seems to be called to as a reason for doing those changes; or what was the reasoning for introducing repo/sql/postgres files.

But as this didn't seem to be an issue when the task was defined, I am closing this.

We use abstract schemas for the initial table definitions, but not (as far as I’m aware) for generating schema changes… Amir, did you have any plans for that?

This seems to be finished. There is still a bunch questions I would have that this task does not seem to be explaining, e.g. how this "abstract schema" approach is guarding against production databases drifting from what the schema delivered with code prescribes - which seems to be called to as a reason for doing those changes; or what was the reasoning for introducing repo/sql/postgres files.

But as this didn't seem to be an issue when the task was defined, I am closing this.

I don't know if I understood you correctly but the abstract schema is going to help with finding drifts between production and dev code (T104459: Detect object, schema and data drifts between mediawiki HEAD, production masters and replicas) since parsing json for automated checks across the lots of databases is much easier and more accurate than parsing sql. We are already working on it and potentially in the future, easy drift fixes for small wiki will be done automatically but for now, the abstract schema is extensively being used to report drifts (you can see for example our most recent adventure T260111: All sorts of random drifts in wikis in s3, for more, look at subtickets of the above ticket).

The other point of having abstract schema is to have one source of truth, for example mediawiki core has different sql files for Postgres and MySQL and they look similar but they actually have drifted drastically (missing indexes, missing tables, etc., so far around 80 drifts have been fixed). The wikibase sql files are now generated automatically from one source of truth meaning they won't diverge (core even has tests for it now, we can write it for wikibase too)

On a rather different topic: We will have abstract schema changes support in a couple of months (currently we have abstract schema but not abstract schema change), similar to what we have but you provide a json file (with snapshot of before and after of the abstract schema of the table) and it would produce the ALTER TABLES needed for each DBMS for you through a maintenance script.

HTH

I don't know if I understood you correctly but the abstract schema is going to help with finding drifts between production and dev code (T104459: Detect object, schema and data drifts between mediawiki HEAD, production masters and replicas) since parsing json for automated checks across the lots of databases is much easier and more accurate than parsing sql. We are already working on it and potentially in the future, easy drift fixes for small wiki will be done automatically but for now, the abstract schema is extensively being used to report drifts (you can see for example our most recent adventure T260111: All sorts of random drifts in wikis in s3, for more, look at subtickets of the above ticket).

Thanks, I was not aware that work on using this JSON schema definition to automate checks like this is happening. T104459 is not very clear about this. But good to know, this seems like a sensible direction.