
Create MW Schema Diff maintenance script
Open, Needs Triage, Public

Description

@Nikerabbit noticed while updating translatewiki that there is definitely some skew between the wiki's active schema and tables.sql. For example, it seems the wiki already had a PK on log_search, which was added and removed during the 1.16 cycle and then re-added in 1.30... but the unique index was never removed, because the patch does both changes in one step.

I know the DBAs have some ways of finding discrepancies in WMF production, but we should have something more "general purpose" in MediaWiki that can be used to find these differences... And if people report them back to us, we can look at implementing patches (for changes that have often been missed for one reason or another) to bring their wikis back into sync.

Usually it's the less obvious changes (index names, index existence, etc.) rather than completely missing columns... but sometimes it can be differing data types.

A quick Google search finds things like https://github.com/DBDiff/DBDiff and https://github.com/camcima/php-mysql-diff, both of which can be installed via Composer, so they would fit happily in our require-dev section. I'm not sure how well they fit our purpose, but not reinventing the wheel should be a priority in this task. Both seem reasonably well maintained (active, at least), so getting upstream bugs fixed shouldn't be an issue if we run into problems.

I've no idea how exactly this will look... But just dumping thoughts here in the meantime

Event Timeline

For me this is a duplicate of T104459, or at most, a subtask.

Short term, what I would do is maintain a tables.sql on the "operations side", meaning a version-controlled fork of tables.sql reflecting the current progress, so it is easier to track than looking at all the patches tagged Schema-change-in-production.

Most of the tools you mention are unsuitable for WMF hosts because they cannot handle the hundreds of thousands of objects on s3, or the large, unindexed (no primary key) tables of enwiki, commons, etc. Solving T17441 would in most cases be a blocker.

Subtask maybe, yeah. I wanted to do something for non-WMF wikis, as we know how much skew Wikimedia wikis get due to missing patches.

So people can look, find any missing patches/changes, and hopefully report them back to us to remediate...

Sorry, I misunderstood you. Because you added us WMF DBAs and used WMF as the context, I thought you wanted a WMF-only solution, not a MediaWiki one. Now I understand what you mean.

jcrespo moved this task from Done to Blocked external/Not db team on the DBA board.
LSobanski subscribed.

Removing the DBA tag and subscribing myself instead. Once there are specific actions for DBA, please re-add us and/or @mention me.

@Ladsgroup does your recent work perhaps address this?

Yes. Since this was created, we have found and fixed lots of schema drifts, even on s3 (T260111: All sorts of random drifts in wikis in s3). We also ran it on localhost after running update.php, to make sure upgrading from an old version works fine, and found an issue that got fixed. But it's a Python script, not a MediaWiki maintenance script. I have mixed feelings about whether we should port it to MediaWiki core or not. Porting actually makes more sense these days with the abstract schema, but back then it made more sense to keep it in Python (because you basically needed to write an SQL parser for it).
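To make that idea concrete, here is a minimal, hypothetical sketch in PHP of the abstract-schema approach: read tables.json and compare it against what the live database reports through information_schema. This is not the actual Python tool; the file path, connection details, and the assumed shape of tables.json are illustrative only.

```php
<?php
// Hypothetical sketch only, not the real drift checker.
// Assumes MediaWiki's abstract schema file (tables.json) is a JSON array of
// table objects, each with a "name" and a "columns" list of { "name": ... }.

$schemaFile = '/path/to/mediawiki/sql/tables.json'; // assumed location
$expected = json_decode( file_get_contents( $schemaFile ), true );

$pdo = new PDO( 'mysql:host=localhost;dbname=wikidb', 'wikiuser', 'secret' );

// Ask information_schema which columns actually exist for each table.
$stmt = $pdo->prepare(
	'SELECT COLUMN_NAME FROM information_schema.COLUMNS
	 WHERE TABLE_SCHEMA = :db AND TABLE_NAME = :tbl'
);

foreach ( $expected as $table ) {
	$stmt->execute( [ ':db' => 'wikidb', ':tbl' => $table['name'] ] );
	$live = $stmt->fetchAll( PDO::FETCH_COLUMN );

	if ( !$live ) {
		echo "Missing table: {$table['name']}\n";
		continue;
	}
	foreach ( $table['columns'] as $column ) {
		if ( !in_array( $column['name'], $live, true ) ) {
			echo "Missing column: {$table['name']}.{$column['name']}\n";
		}
	}
}
```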

Pros of a MediaWiki maintenance script:

  • It's available everywhere out of the box.
  • It can take advantage of the MediaWiki DBAL to check whether a table has a given index, column, data type, etc. (see the sketch below).
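As a rough illustration of that second point, a core maintenance script could lean on the existence checks the DBAL already provides (tableExists(), fieldExists(), indexExists()). The script name and the hard-coded expected-schema array below are assumptions for the sketch; a real implementation would derive its expectations from the abstract schema instead.

```php
<?php
// Hypothetical sketch of what such a script could look like; checkSchemaDrift.php
// does not exist, and the expected-schema entries are illustrative only.

require_once __DIR__ . '/Maintenance.php';

class CheckSchemaDrift extends Maintenance {
	public function __construct() {
		parent::__construct();
		$this->addDescription( 'Report differences between the expected schema and the live database' );
	}

	public function execute() {
		$dbr = $this->getDB( DB_REPLICA );

		// A real implementation would build this from the abstract schema
		// (tables.json) rather than hard-coding it.
		$expected = [
			'log_search' => [
				'columns' => [ 'ls_field', 'ls_value', 'ls_log_id' ],
				'indexes' => [ 'ls_log_id' ],
			],
		];

		foreach ( $expected as $table => $def ) {
			if ( !$dbr->tableExists( $table, __METHOD__ ) ) {
				$this->output( "Missing table: $table\n" );
				continue;
			}
			foreach ( $def['columns'] as $column ) {
				if ( !$dbr->fieldExists( $table, $column, __METHOD__ ) ) {
					$this->output( "Missing column: $table.$column\n" );
				}
			}
			foreach ( $def['indexes'] as $index ) {
				if ( !$dbr->indexExists( $table, $index, __METHOD__ ) ) {
					$this->output( "Missing index: $table.$index\n" );
				}
			}
		}
	}
}

$maintClass = CheckSchemaDrift::class;
require_once RUN_MAINTENANCE_IF_MAIN;
```

It would then run like any other maintenance script on a third-party wiki, with the output giving people something concrete to report back.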

Pros of the Python version:

  • It's already implemented; we don't need to do much.
  • It has been working and has been tested several times.
  • You can easily adapt it to check extensions, etc.
  • It doesn't need to satisfy the high standards of core (that's technically not a pro, to be honest, but I'm lazy).
  • I have been cleaning up all sorts of unused and broken MediaWiki maintenance scripts that have been sitting around for decades; I think there should be a strong reason to add more, as anything added will likely stay there forever.

@Reedy Does the above satisfy your original requirement (and is the original requirement from 2017 still valid)?