Migration Plan 1
Closed, Declined (Public)

Description

I) Preparation

  • update the write logic so that, when configured, it writes to both schemas (wb_terms and the normalized one) (Tasks T219297 and T219295)
  • write a maintenance script for populating the normalized tables with property terms (Task T219894)
  • write a maintenance script for populating the normalized tables with item terms (part of Checkpoint T219122)
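As a rough illustration of the first preparation step, the sketch below shows configuration-driven dual writing. It is a toy model only: `TermStore`, `DualWriter` and the `dual_write_types` setting are hypothetical names, and plain dicts stand in for wb_terms and the normalized tables.

```python
from dataclasses import dataclass, field


@dataclass
class TermStore:
    """Toy stand-in for one schema: maps entity id -> list of (lang, type, text)."""
    rows: dict = field(default_factory=dict)

    def save(self, entity_id, terms):
        self.rows[entity_id] = list(terms)


@dataclass
class DualWriter:
    old_store: TermStore   # stands in for wb_terms
    new_store: TermStore   # stands in for the normalized tables
    # Hypothetical config: entity types for which dual writing is switched on.
    dual_write_types: set = field(default_factory=set)

    def save_terms(self, entity_type, entity_id, terms):
        # The old schema is always written, so reads can keep using it.
        self.old_store.save(entity_id, terms)
        # The new schema is written only for entity types enabled in config.
        if entity_type in self.dual_write_types:
            self.new_store.save(entity_id, terms)


writer = DualWriter(TermStore(), TermStore(), dual_write_types={"property"})
writer.save_terms("property", "P31", [("en", "label", "instance of")])
writer.save_terms("item", "Q42", [("en", "label", "Douglas Adams")])
```

With only "property" enabled, P31 lands in both stores while Q42 is written to the old store only, which mirrors turning the configuration on for properties first (part II) and for items later (part III).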

II) Property Terms migration (small footprint on time and size)
turn on configuration for writing to both schemas for properties & run the maintenance script for properties (part of Checkpoint T219301)

estimated increase in disk space usage =
    ratio of properties/entities * current disk usage of wb_terms (incl. indexes) as of March 2019 * ratio of data size after/before normalization (according to the test run and incl. indexes)

estimated increase in disk space usage =
    0.0001 * 846GB * 0.2 = ~17.5MB

So we need roughly 18MB extra for the property terms migration. This extra size is not expected to grow significantly before we reach the point where we can drop wb_terms entirely.

migration time is hard to estimate, but is not expected to be long for properties. We will measure and use that time to later estimate item migration time.

III) Item Terms migration (big footprint on time and size)
turn on configuration for writing to both schemas for items & run the maintenance script for items (part of Checkpoint T219123)

estimated increase in disk space usage =
    ratio of items/entities * current disk usage of wb_terms (incl. indexes) as of March 2019 * ratio of data size after/before normalization (according to the test run and incl. indexes)

estimated increase in disk space usage =
    0.9999 * 846GB * 0.2 = ~170GB

So we need roughly 170GB extra for the item terms migration. The expected growth of this number over time may be significant.
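The two estimates can be reproduced with a few lines of arithmetic. This is only a sketch of the formula stated in the description; the constant and function names are made up for illustration, and the inputs are the figures quoted above.

```python
WB_TERMS_SIZE_GB = 846       # wb_terms incl. indexes, March 2019 (from the description)
NORMALIZATION_RATIO = 0.2    # normalized size relative to current size (from the test run)
PROPERTY_SHARE = 0.0001      # properties / entities
ITEM_SHARE = 0.9999          # items / entities


def extra_space_gb(entity_share, table_size_gb=WB_TERMS_SIZE_GB, ratio=NORMALIZATION_RATIO):
    """Extra disk space in GB needed for the normalized copy of one entity type."""
    return entity_share * table_size_gb * ratio


property_extra_mb = extra_space_gb(PROPERTY_SHARE) * 1000   # decimal MB
item_extra_gb = extra_space_gb(ITEM_SHARE)

print(f"properties: ~{property_extra_mb:.1f} MB")
print(f"items:      ~{item_extra_gb:.1f} GB")
```

The raw product for properties comes out at roughly 17 MB, a little under the ~17.5MB quoted, so the rounded-up 18MB budget still holds; the item figure comes out just under 170 GB.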

Migration time will be estimated once the property terms migration is over and has been measured.

rollback plan
Stopping the migration and rolling back is straightforward in this plan. We only need to stop the migration script and stop writing to the new schema, and we can drop the new schema tables too if necessary.

Event Timeline

Addshore moved this task from incoming to in progress on the Wikidata board. Mar 25 2019, 3:59 PM
alaa_wmde added subscribers: Ladsgroup, Addshore.

@Ladsgroup @Addshore I just updated the description here to reflect our sequential plan for migration that we talked about.. can you pls have a look, fill in further necessary info for DBAs review and pass it to DBAs already?

alaa_wmde updated the task description. Apr 1 2019, 2:56 PM
alaa_wmde added a subscriber: JeroenDeDauw. Apr 1 2019, 3:02 PM

I updated the description with rough estimation of extra size we will need for property terms migration on db nodes .. @Ladsgroup @JeroenDeDauw pls have a look as I might have missed/miscalculated smth.

The numbers are taken from previous wb_terms bonfire slides and investigation documents.

@jcrespo @Marostegui

I updated the task description with some estimations. In summary, we plan to migrate in two parts: the first will need ~18MB of extra disk space and the second will require an extra ~170GB.

Our original plan/expectations is to run the first part (II in task description) sometime very soon, like sometime next week. The second part (III) is to begin more towards end of the month.

Did you get any idea on other teams' plans/needs already? Do you already have a recommendation on when we can do our two parts?

@alaa_wmde quick question before I read everything, those extra requirements will only affect s8 (wikidatawiki), right?

alaa_wmde added a comment. Apr 2 2019, 4:01 PM

@Marostegui I think so as it is only needed where we have wb_terms table for now. @Ladsgroup am I correct ?

> @Marostegui I think so as it is only needed where we have wb_terms table for now. @Ladsgroup am I correct ?

We have testwikidatawiki in s3, and wb_terms also exists in commonswiki (s5), but it's empty

alaa_wmde added a comment. Apr 2 2019, 4:14 PM

@Marostegui we can ignore s5 (will not run migration there) .. and probably s3 is many orders of magnitude smaller than the estimations above (we might run migration plans on it for testing) .. not sure if that tells you something

wb_terms is in commonswiki (s4)? What? When did that happen?

> wb_terms is in commonswiki (s4)? What? When did that happen?

Commons was turned into a Wikibase repo as part of SDoC and started to hold some data, but that was disabled later. I'm not sure dropping the table would be a good idea, but it's empty, so not much harm I guess.

Yeah, I just read a notification from Jaime on a different ticket about it. Looks like he wasn't aware either of that actually being created, that is a bit odd but should be discussed on that other ticket not here I guess :-/
If it is disabled and with minimal data (-rw-rw---- 1 mysql mysql 192K Jan 9 23:47 wb_terms.ibd), it shouldn't be a blocker for this.
I will do some math on s8 with those estimations and get back to you

So in terms of s8:

I think step 1 (only 18MB) should be fine to be done anytime (still give us a heads up before starting for it).

However, the second part, that requires 170GB can be a bit more dangerous.
Right now the master only has 800GB free and I don't feel comfortable leaving it with just 600GB available, especially because it might be tight if we have unexpected growth or emergency/unexpected ALTER TABLEs to do, which has happened in the past.

One of our Q4 goals is to start replacing eqiad masters at some point, as they are very old (T217396: Decommission db1061-db1073). s8 (wikidatawiki) has, like pretty much all the other sections, an old master that will be replaced.
The new hardware is already at the datacenter; it is just pending racking, installation and setup (T211613: rack/setup/install db11[26-38].eqiad.wmnet). Once replaced, the new master will have 1.2TB free, which is a lot healthier and less tight.
s4 and s8 are the masters I want to prioritize for replacement, as both are not in great shape.

To sum up, I would like the second step to wait for the new master to be up and running

alaa_wmde added a comment. Apr 4 2019, 8:47 AM

@Marostegui thanks for your update!

Postponing until the new master is up and running is very unlikely to be the adjustment we would go for.

The more likely one would be to change to migration plan 2 (taken from the bonfire slides):
on write: write to the new schema and delete from the old one
on read: read from both schemas
when the old wb_terms is empty, read only from the new schema and eventually drop the wb_terms table
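A minimal sketch of how plan 2 behaves, under loud assumptions: two plain dicts stand in for wb_terms and the normalized schema, and `MovingTermStore` is a hypothetical name, not anything from Wikibase. It also assumes that entities which are never edited would have to be moved by a separate script calling the same write path.

```python
class MovingTermStore:
    """Toy model of plan 2: dicts stand in for wb_terms and the normalized tables."""

    def __init__(self, old_rows):
        self.old = dict(old_rows)   # wb_terms
        self.new = {}               # normalized schema

    def write(self, entity_id, terms):
        # On write: store in the new schema and delete the entity's old rows.
        self.new[entity_id] = terms
        self.old.pop(entity_id, None)

    def read(self, entity_id):
        # On read: consult both schemas; the new one wins if it has the entity.
        if entity_id in self.new:
            return self.new[entity_id]
        return self.old.get(entity_id)

    def old_schema_is_empty(self):
        # Once true, reads can use the new schema only and wb_terms can be dropped.
        return not self.old


store = MovingTermStore({"Q1": ["universe"], "Q2": ["Earth"]})
store.write("Q1", ["Universe"])
```

After the write, Q1 lives only in the new store while Q2 is still served from the old one, which is why reads have to consult both until the old store has been drained.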

This plan will actually not require extra space (no significant increase at least) and will reduce the total space usage over time.
It is hard to estimate the footprint of this plan on performance, which is likely to be a little higher than that of the original one. If I can think of a way to estimate it, I will share some numbers soon.

We have yet to discuss it a bit amongst us .. but thought of sharing it with you already in case you can already spot any concerns with it.

> @Marostegui thanks for your update!
>
> Postponing until the new master is up and running is very unlikely to be the adjustment we would go for.
>
> The more likely one would be to change to migration plan 2 (taken from the bonfire slides):

I don't have access to that. Just requested it.

> on write: write to the new schema and delete from the old one
> on read: read from both schemas
> when the old wb_terms is empty, read only from the new schema and eventually drop the wb_terms table
>
> This plan will actually not require extra space (no significant increase at least) and will reduce the total space usage over time.

Do you have any estimates of how much it would require?

> It is hard to estimate the footprint of this plan on performance, which is likely to be a little higher than that of the original one. If I can think of a way to estimate it, I will share some numbers soon.

This would be useful to know if you can come up with some figures.
I assume it is possible to stop the migration right away if we see performance issues and then resume once we are in better shape (i.e. we discover the performance degradation is an issue and we decide to fully wait for the new master), right?

>> @Marostegui thanks for your update!
>>
>> Postponing until the new master is up and running is very unlikely to be the adjustment we would go for.
>>
>> The more likely one would be to change to migration plan 2 (taken from the bonfire slides):
>
> I don't have access to that. Just requested it.

For some reason I can't see a request from you in my mailbox .. but here's the slide screenshot that describes the plan

>> on write: write to the new schema and delete from the old one
>> on read: read from both schemas
>> when the old wb_terms is empty, read only from the new schema and eventually drop the wb_terms table
>>
>> This plan will actually not require extra space (no significant increase at least) and will reduce the total space usage over time.
>
> Do you have any estimates of how much it would require?

I think it won't require anything significant, I mean nothing other than the usual DBMS needs for schema/meta info on tables and indexes.

>> It is hard to estimate the footprint of this plan on performance, which is likely to be a little higher than that of the original one. If I can think of a way to estimate it, I will share some numbers soon.
>
> This would be useful to know if you can come up with some figures.
> I assume it is possible to stop the migration right away if we see performance issues and then resume once we are in better shape (i.e. we discover the performance degradation is an issue and we decide to fully wait for the new master), right?

Good question.. I don't think stopping here would be possible in the sense of "we stop dealing with the new schema immediately and just switch back to writing to and reading from the old wb_terms", because we would have already deleted some data from wb_terms and moved it over to the new schema. So stopping here would mean "migrating back": on write we write to the old schema and delete from the new schema, and on read we read from both until the new schema is empty. That would not release us from the performance degradation as immediately as we would wish.


I think we'll need to wait until we have written some of the necessary code for migration and run some tests to get some numbers.
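To make the "migrate back" idea above concrete, here is a small sketch, again with dicts standing in for the two schemas and with hypothetical helper names. The point it illustrates is that rolling back plan 2 is the same move-on-write procedure with the roles of the two stores swapped, so reads keep touching both stores until the new schema has been drained.

```python
def write(entity_id, terms, source, target):
    """Store terms in `target` and remove the entity from `source`."""
    target[entity_id] = terms
    source.pop(entity_id, None)


def read(entity_id, source, target):
    """Read from both stores; needed until `source` has been drained."""
    return target.get(entity_id, source.get(entity_id))


wb_terms = {"Q2": ["Earth"]}        # not yet migrated
normalized = {"Q1": ["Universe"]}   # already moved by the forward migration

# Forward migration uses source=wb_terms, target=normalized.
# Migrating back swaps the roles of the two stores.
write("Q1", ["Universe"], source=normalized, target=wb_terms)
```

Only once `normalized` is empty again could reads go back to wb_terms alone, which is why the performance penalty would not disappear the moment a rollback is decided.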

Depending on T219121

Stalled until we have necessary code to run some migration test runs locally to get a feeling on added performance penalty of migration.

@Ladsgroup do you know of a way to create a similar buffer pool configuration locally and simulate some "heavy" traffic during a migration test run? It doesn't need to be exactly as in production, just enough to get our estimate a little closer to the right degree of accuracy.

>>> @Marostegui thanks for your update!
>>>
>>> Postponing until the new master is up and running is very unlikely to be the adjustment we would go for.
>>>
>>> The more likely one would be to change to migration plan 2 (taken from the bonfire slides):
>>
>> I don't have access to that. Just requested it.
>
> For some reason I can't see a request from you in my mailbox .. but here's the slide screenshot that describes the plan

@Addshore gave me access a few hours ago :)

>>> on write: write to the new schema and delete from the old one
>>> on read: read from both schemas
>>> when the old wb_terms is empty, read only from the new schema and eventually drop the wb_terms table
>>>
>>> This plan will actually not require extra space (no significant increase at least) and will reduce the total space usage over time.
>>
>> Do you have any estimates of how much it would require?
>
> I think it won't require anything significant, I mean nothing other than the usual DBMS needs for schema/meta info on tables and indexes.
>
>>> It is hard to estimate the footprint of this plan on performance, which is likely to be a little higher than that of the original one. If I can think of a way to estimate it, I will share some numbers soon.
>>
>> This would be useful to know if you can come up with some figures.
>> I assume it is possible to stop the migration right away if we see performance issues and then resume once we are in better shape (i.e. we discover the performance degradation is an issue and we decide to fully wait for the new master), right?
>
> Good question.. I don't think stopping here would be possible in the sense of "we stop dealing with the new schema immediately and just switch back to writing to and reading from the old wb_terms", because we would have already deleted some data from wb_terms and moved it over to the new schema. So stopping here would mean "migrating back": on write we write to the old schema and delete from the new schema, and on read we read from both until the new schema is empty. That would not release us from the performance degradation as immediately as we would wish.
>
> I think we'll need to wait until we have written some of the necessary code for migration and run some tests to get some numbers.

Yes, we definitely need to rethink how to stop the migration or roll back if something arises, which is a possibility in such a complex process. Especially performance-wise, things can go differently than originally planned.
Also, we might have the new master in place if this takes some more time, which is also a good thing in general: more disk space, faster disks, more memory... :-)

alaa_wmde renamed this task from "Pass migration plan to DBAs for review" to "Migration Plan 1". Apr 5 2019, 7:23 PM
alaa_wmde moved this task from In Progress to Done on the Wikidata wb_terms Trailblazing board.
alaa_wmde closed this task as Declined. Apr 9 2019, 9:32 AM