Page MenuHomePhabricator

Integrate Facebooks "Online Schema Change for MySQL" into MediaWiki
Closed, ResolvedPublic



Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 11:51 PM
bzimport added a project: Wikimedia-Rdbms.
bzimport set Reference to bz30824.
bzimport added a subscriber: Unknown Object (MLST).

tl;dr summary:

This Online Schema Change tool is a PHP script which manages cloning a table, applying an ALTER TABLE, and then swapping it back in place.

This runs online -- not interfering with live operations on the main tables -- by using triggers to capture changes made to the original table while the new table is being changed.

Note that in a replicated environment, you need to run this on all slaves before running on the master -- but unlike our prior schema update methods, it doesn't require halting replication while running, or switching masters.

Sounds pretty spiffy!

I'm not exactly sure about the licensing, but it seems, as long as we keep the file somewhat as is, it should be fine.

For MW usage, it seem probably all that we need is a fairly simple wrapper/maintenance script of some kind, to iterate over the list of slaves, then the master, applying any and all patches. Certainly shouldn't be too much work to get done

Certainly seems a worthwhile task for Ops or even Gen Eng to get done, especially if we're ideally heading for a more frequent release (to WMF at least), model

jcrespo added a subscriber: jcrespo.

Note that we already use online schema change directly supported on Mariadb 10 / MySQL 5.6 or pt-table-checksum (update.php would do that also automatically if available). If I were you, I would work towards releasing some software to control and deploy changes easily rather than integration itself (which will be highly dependent on the underlying infrastructure).

The WMF plan I started considering (still in draft) is at T104459, which would match reedy's model- I would disagree on it being simple, though :-). But the idea is that in the future, schema changes should be mostly self-serving. If I were to suggest how mediawiki would facilitate that, is to improve support for multiple masters (for example through better support of galera master-master): T119626; support a better pooling mechanism (which I think is the goal of the ongoing T156924); and eliminate monster tables (which has been discussed on T161671 and others).

BTW, facebook's took got rewritten in python: however, most people are abandoning trigger-based tables in favour of the native solution or triggerless ones like:

Marostegui added a subscriber: Marostegui.

If no one disagrees I am going to close this as we are already using pt-online-schema-change and when available we use INPLACE operations supported on MariaDB.
As Jaime pointed out, the most popular tool out there now is gh-ost (along with pt-online-schema-change).
We could give gh-ost a go, but so far we don't have much bandwidth for it, and we can track it in an specific task for all the related tests.

If anyone believes this should remain open feel free to reopen.