Finalize database schema for MCR content meta-data
Closed, Resolved · Public

Description

This task covers the specification of the DB schema for content meta-data.

Design draft: https://www.mediawiki.org/wiki/Multi-Content_Revisions/Database_Schema

Event Timeline

Change 378724 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] [WIP] first shot at MCR database schema

https://gerrit.wikimedia.org/r/378724

Change 378729 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] [DNM] Remove redundant fields from old DB tables.

https://gerrit.wikimedia.org/r/378729

@jcrespo After some discussion on https://www.mediawiki.org/wiki/Talk:Multi-Content_Revisions/Database_Schema, on https://gerrit.wikimedia.org/r/#/c/378724/4, and on the mailing list, there are a few questions that require a judgement call from a DBA. Ideally of course we would test and measure. But I'm afraid at least some of these decisions need to be made beforehand, so an assessment from experience would be very helpful.

So here are the most pressing questions:

  1. Should we introduce content_sha1?
    • The content table is going to be tall (initially, as tall as revision, but two or three times taller later), so I want to keep it narrow.
    • If we have it, we probably want to have an index on it (this would clash with sharding, though, see below)
    • If we don't have it, we need to load the content of all slots to calculate rev_sha1 for each new revision.
    • ...or we drop rev_sha1, and calculate the hash for use in API results and XML dumps on the fly.
    • If we introduce content_sha1 now, we can still write empty strings to it later to reduce size.
    • We could choose a more compact representation than base36 used for rev_sha1
  2. Should we build in support for sharding (partitioning) on the page ID?
    • we would need slot_page and content_page
    • all (unique/primary) indexes would have to start with the page id
    • all selects and joins will have to use the page ID (this needs some changes in the code, but not much)
    • it's two integer (or bigint) columns in two very tall tables
    • queries across pages will be inefficient, or need a separate table (note that we don't need a contributions table unless we also partition the revision table)
    • in some (rare) cases (e.g. undelete of a very old revision, or a history split), page ids would have to be re-written or content rows would have to be duplicated with different page ids.
  3. Should slot_revision be declared BIGINT right away?
    • we will need to convert rev_id to BIGINT soonish
    • until then, is it bad to join an int to a bigint in queries?
    • if we declare it INT for now, we'll have to convert it when we convert rev_id.

What is your opinion on these trade-offs? What additional information do you need to decide these? What should the next steps be towards a decision?
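For concreteness, here is a rough sketch of the two new tables these questions refer to. Column names follow this discussion and the design draft; the types, sizes, indexes, and the content_address column are illustrative placeholders, not a final definition:

  -- Illustrative only; the design draft linked above is authoritative.
  CREATE TABLE slots (
    slot_revision BIGINT UNSIGNED NOT NULL,  -- question 3: BIGINT now, or INT until rev_id is widened?
    slot_role     SMALLINT UNSIGNED NOT NULL,
    slot_content  BIGINT UNSIGNED NOT NULL,  -- points at content.content_id
    PRIMARY KEY (slot_revision, slot_role)
    -- question 2: sharding by page would add a slot_page column and prefix the keys with it
  );

  CREATE TABLE content (
    content_id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    content_size    INT UNSIGNED NOT NULL,
    content_sha1    VARBINARY(32) DEFAULT NULL,  -- question 1: include it at all, and in what encoding?
    content_model   SMALLINT UNSIGNED NOT NULL,
    content_address VARBINARY(255) NOT NULL,     -- where the actual blob is stored
    PRIMARY KEY (content_id)
    -- question 2: sharding by page would likewise add a content_page column
  );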

daniel renamed this task from Finalize DB schema for content meta-data to Finalize database schema for MCR content meta-data. Sep 28 2017, 3:22 PM

  1. Should we introduce content_sha1?
    • The content table is going to be tall (initially, as tall as revision, but two or three times taller later), so I want to keep it narrow.
    • If we have it, we probably want to have an index on it (this would clash with sharding, though, see below)
    • If we don't have it, we need to load the content of all slots to calculate rev_sha1 for each new revision.
    • ...or we drop rev_sha1, and calculate the hash for use in API results and XML dumps on the fly.
    • If we introduce content_sha1 now, we can still write empty strings to it later to reduce size.
    • We could choose a more compact representation than base36 used for rev_sha1

I don't really have an opinion on that, to be honest. What would you prefer?
I do agree that the narrower the table is, the better.

  2. Should we build in support for sharding (partitioning) on the page ID?
    • we would need slot_page and content_page
    • all (unique/primary) indexes would have to start with the page id
    • all selects and joins will have to use the page ID (this needs some changes in the code, but not much)
    • it's two integer (or bigint) columns in two very tall tables
    • queries across pages will be inefficient, or need a separate table (note that we don't need a contributions table unless we also partition the revision table)
    • in some (rare) cases (e.g. undelete of a very old revision, or a history split), page ids would have to be re-written or content rows would have to be duplicated with different page ids.

Regarding whether we should use partitioning or not: the only concern I'd have (looking at the numbers listed at https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Database_Schema) is the disk size. As for the access pattern and performance, I am not at all sure it would get an instant benefit from sharding (look at the current revision or page table numbers).
It is hard to tell without some scenarios we can test against.

Looking at the current size of the revision and page tables (for enwiki), and at the draft numbers posted at the above link, we should probably be fine for the first few years.
I am not sure we can come up with a design now that is optimal enough to drop into production and then avoid touching it for 5 years :-)
I would prefer no partitioning for now.

  3. Should slot_revision be declared BIGINT right away?
    • we will need to convert rev_id to BIGINT soonish
    • until then, is it bad to join an int to a bigint in queries?
    • if we declare it INT for now, we'll have to convert it when we convert rev_id.

It is true that joining columns of different types can sometimes have performance drawbacks; we should do some testing to see how much it would degrade the queries.
If at some point we have to convert rev_id to BIGINT, that will be a big maintenance operation that can potentially take weeks or months, so I guess we can include the conversion of slot_revision to BIGINT along with it.
If that maintenance is going to happen anyway, we could probably go for INT for slot_revision and schedule both maintenances at the same time?

This is just my opinion, not a statement on what we should do.
It is hard to give more ideas without knowing MediaWiki in depth. You are probably in a better position than I am to make the real call here! :-)

  1. Should we introduce content_sha1?

I don't really have an opinion on that, to be honest. What would you prefer?
I do agree that the narrower the table is, the better.

Application developers prefer to have the SHA1. I'm trying to weigh cost against benefit. I can assess benefit from the conversation on wikitech-l (basically: it would be very annoying not to have this, but not catastrophic, and maybe it's enough to have the hash in the dumps, not in the database).

Can you help me assess cost?

Also, would it help much if we used a binary representation of the hash, instead of base36?
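For rough scale, assuming the hash stays SHA-1 (160 bits; rev_sha1 today stores it base-36 encoded in a varbinary(32) column):

  -- Approximate per-row storage of one SHA-1 value, before indexes:
  --   raw binary: 160 / 8              = 20 bytes  (e.g. BINARY(20))
  --   base 36:    ceil(160 / log2(36)) = 31 bytes  (current rev_sha1 encoding)
  --   hex:        160 / 4              = 40 bytes
  -- So a binary column would save roughly a third of the hash storage per row, and the same again per index entry.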

  2. Should we build in support for sharding (partitioning) on the page ID?

I would prefer no partitioning for now.

I did not expect us to introduce partitioning right away. This is just about including the information needed for partitioning.

Do you also think it is not necessary/useful to provision for partitioning now? If we don't include the page ID fields now, we would have to add them later if we want to introduce partitioning. Would that be a problem? Or would it not significantly add to the cost and complexity of introducing partitioning later?

  3. Should slot_revision be declared BIGINT right away?

If that maintenance is going to happen anyway, we could probably go for INT for slot_revision and schedule both maintenances at the same time?

I don't know, you tell me :)

This is just my opinion, not a statement on what we should do.
It is hard to give more ideas without knowing MediaWiki in depth. You are probably in a better position than I am to make the real call here! :-)

That's the problem: nobody seems to feel confident making any final statement on questions like this. MW experts don't know the database details well enough; DBAs don't know the application layer well enough. Nobody is willing to do a final sign-off.

The best we can do is talk to each other and make sure neither group sees a major problem. If we wait until everyone is confident that the proposal is optimal, we'll probably never move. Apparently there just is nobody with the expertise to make that assessment.

What I got out of the meeting we just had:

Should we introduce content_sha1?

DBAs don't really care. Narrower tables are better, but the cost here probably isn't very significant if it's needed.

Making the field nullable would allow for reclaiming the majority of the space without an alter if we decide we don't need it after all (although I think the difference is only 7 or 8 bits per row versus setting it empty?).

Should we build in support for sharding (partitioning) on the page ID?

DBAs say no to MySQL's built-in partitioning. If we're going to do partitioning, then we would need to do it at the application level rather than the DB level.

I note this gives us more flexibility. For example, we don't actually need the page_id in the tables in order to shard by page_id as long as all the code that is doing the DB access knows the page_id to be able to use the right shard.
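A hypothetical sketch of that point; the shard function, table name, and column names are invented for illustration and are not part of any proposal:

  -- With application-level sharding, the caller already knows the page ID and uses it only to
  -- pick the physical table/server, e.g. shard = page_id mod 4; the rows need no page column.
  -- For page_id = 4910, shard = 4910 mod 4 = 2:
  SELECT slot_content FROM slots_shard2 WHERE slot_revision = 123456;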

Should slot_revision be declared BIGINT right away?

This wasn't discussed that I recall. See the existing answer in T174028#3664170. However,

If that maintenance is going to happen anyway, we could probably go for INT for slot_revision and schedule both maintenances at the same time?

Of course we can schedule the change of slot_revision at the same time we schedule the change of rev_id and every other field that refers to rev_id; it's just one more ALTER. I suppose what Daniel is really looking for is someone to say which is worse: the performance impact of joining a BIGINT with an INT until we do the alter, or the time it'll take to alter slot_revision later. It sounds like the answer at the moment is "we don't know what the performance impact is, so we can't say".

Ok, here is what I plan to do:

  1. Make the sha1 field nullable. That way, we can easily stop writing it to preserve space, without the need for a schema change.
  2. No page IDs in the content or slots tables. Since we won't use native MySQL partitioning, we can introduce the fields if and when we create the shards, or leave them out if we don't need them then.
  3. Make slot_revision a BIGINT to avoid the need to alter the table when rev_id becomes a BIGINT.

But we'll do some benchmarking to see how that impacts join performance. If performance degrades because of the difference in the field width, we'll have to think about making slot_revision 32 bit for now.
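A minimal sketch of such a benchmark, with made-up table names and no real data; a meaningful test would use production-sized copies and representative queries:

  -- Compare an INT-to-INT join against an INT-to-BIGINT join (hypothetical test tables):
  CREATE TABLE rev_test  (rev_id INT UNSIGNED NOT NULL PRIMARY KEY);
  CREATE TABLE slots_int (slot_revision INT UNSIGNED NOT NULL PRIMARY KEY);
  CREATE TABLE slots_big (slot_revision BIGINT UNSIGNED NOT NULL PRIMARY KEY);

  -- After loading all three tables with the same generated IDs, compare plans and timings:
  EXPLAIN SELECT COUNT(*) FROM rev_test r JOIN slots_int s ON s.slot_revision = r.rev_id;
  EXPLAIN SELECT COUNT(*) FROM rev_test r JOIN slots_big s ON s.slot_revision = r.rev_id;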

One additional comment: it would help if you agree, in some cases, to do the "new table and copy" pattern rather than an alter. It is much easier to add or transform columns later, so there is no need to optimize in advance. NULLs in the COMPACT (or newer) row formats take a minimum of 1 byte per row, but only 1 bit is used per nulled column (e.g. a row with 7 nulled columns takes only 1 byte in total for them); indexes are not accounted for here.
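For illustration, a minimal sketch of that "new table and copy" pattern, assuming we wanted to add a hypothetical column to the content table; a real migration would copy in batches and coordinate with replication:

  -- "New table and copy" instead of an in-place ALTER (sketch only, not a migration script):
  CREATE TABLE content_new LIKE content;
  ALTER TABLE content_new ADD COLUMN content_extra VARBINARY(255) DEFAULT NULL;  -- altering the empty copy is cheap
  INSERT INTO content_new SELECT content.*, NULL FROM content;                   -- in practice, copied in batches
  RENAME TABLE content TO content_old, content_new TO content;                   -- atomic swap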

If the tables are small and do not have a lot of activity (e.g. <20GB), alters are trivial. They become a huge burden if they are large and busy, because we have to do the alters at the same time that queries are reading and writing from them, and in most cases we have to pool and depool every server. This is mostly true right now for revision (maybe also page and recentchanges), and image on commons.

Another thing that would cut a lot of time off alters would be to integrate etcd into MediaWiki. Right now, for a single alter, we have to commit and revert the pooling and depooling of each server about 200 times; a fully automated system would speed up any deployment *a lot*.

See T156924: Allow integration of data from etcd into the MediaWiki configuration.

@jcrespo I fully agree to use create&copy instead of altering tables. This has been part of the design all along, altering existing tables when introducing MCR was never the plan. We'll create new tables, copy information, and stop using some fields in the revision table. If we want, we can at some point drop these fields from the revision table. But it's no problem to just have them sitting around.

@Anomie I know it is on your mind; it was a hint that, for any estimate we give you now on schema changes, I firmly believe we could cut the deployment time by a factor of 10 if that were already in place. It is also an explanation of why they take so much time right now.

daniel claimed this task.

Final schema version can be found at https://www.mediawiki.org/wiki/Multi-Content_Revisions/Database_Schema and https://gerrit.wikimedia.org/r/#/c/378724.

Further tweaks may be needed once we can test the new schema on a full scale database and simulated query load.

Change 378724 had a related patch set uploaded (by Aude; owner: Daniel Kinzler):
[mediawiki/core@master] MCR database schema

https://gerrit.wikimedia.org/r/378724

Change 378724 merged by jenkins-bot:
[mediawiki/core@master] MCR database schema

https://gerrit.wikimedia.org/r/378724

[mediawiki/core@master] MCR database schema
https://gerrit.wikimedia.org/r/378724

This PR causes:

...column 'logging.log_comment' is already set as NOT NULL
...column 'logging.log_comment_id' already exists
Changing 'protected_titles.pt_reason' default value
...column 'protected_titles.pt_reason' is already set as NOT NULL
...column 'protected_titles.pt_reason_id' already exists
...comment table already exists.
...index site_stats_pkey already set on site_stats table.
...ip_changes table already exists.
...slots table already exists.
...content table already exists.
Creating content_moddels table ...[d15b7b76a1d10fa2abe8440a] [no req]   Wikimedia\Rdbms\DBQueryError from line 1185 of ...\includes\libs\rdbms\database\Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
Query: CREATE SEQUENCE content_models_model_id_seq
Function: Wikimedia\Rdbms\Database::sourceFile( .../maintenance/postgres/archives/patch-content_models-table.sql )
Error: 42P07 ERROR:  relation "content_models_model_id_seq" already exists


Backtrace:
#0 ...\includes\libs\rdbms\database\DatabasePostgres.php(262): Wikimedia\Rdbms\Database->reportQueryError(string, string, string, string, boolean)
#1 ...\includes\libs\rdbms\database\Database.php(997): Wikimedia\Rdbms\DatabasePostgres->reportQueryError(string, string, string, string, boolean)
#2 ...\includes\libs\rdbms\database\Database.php(3422): Wikimedia\Rdbms\Database->query(string, string)
#3 ...\includes\libs\rdbms\database\Database.php(3370): Wikimedia\Rdbms\Database->sourceStream(unknown type, NULL, NULL, string, NULL)
#4 ...\includes\installer\DatabaseUpdater.php(683): Wikimedia\Rdbms\Database->sourceFile(string)
#5 ...\includes\installer\DatabaseUpdater.php(726): DatabaseUpdater->applyPatch(string, boolean, string)
#6 ...\includes\installer\DatabaseUpdater.php(482): DatabaseUpdater->addTable(string, string)
#7 ...\includes\installer\DatabaseUpdater.php(446): DatabaseUpdater->runUpdates(array, boolean)
#8 ...\maintenance\update.php(200): DatabaseUpdater->doUpdates(array)
#9 ...\maintenance\doMaintenance.php(94): UpdateMediaWiki->execute()
#10 ...\maintenance\update.php(245): require_once(string)

on

MediaWiki	1.31.0-alpha (80f7ac1)
PHP	7.1.1 (apache2handler)
PostgreSQL	9.3.12
ICU	57.1

@aude @daniel Please make sure you follow up on this.

Change 401774 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Fix typo in PostgresUpdater in I30a3a983

https://gerrit.wikimedia.org/r/401774

Change 401774 merged by jenkins-bot:
[mediawiki/core@master] Fix typo in PostgresUpdater in I30a3a983

https://gerrit.wikimedia.org/r/401774

Change 378729 abandoned by Daniel Kinzler:
[DNM] Remove redundant fields from old DB tables.

https://gerrit.wikimedia.org/r/378729