
Migration Plan 3
Closed, Resolved (Public)


Property terms migration


We will migrate property terms from the wb_terms table (old) into the new schema tables (new),
following the usual migration plan:

  1. read old, write old
  2. read old, write both & run maintenance on all Properties
  3. read new, write both
  4. read new, write new
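
The four stages above can be sketched as a small lookup. This is only an illustration: the names `MigrationStage`, `read_schema` and `write_schemas` are hypothetical and are not taken from the actual Wikibase code.

```python
from enum import Enum
from typing import FrozenSet


class MigrationStage(Enum):
    """Hypothetical names for the four stages of the usual migration plan."""
    READ_OLD_WRITE_OLD = 1
    READ_OLD_WRITE_BOTH = 2
    READ_NEW_WRITE_BOTH = 3
    READ_NEW_WRITE_NEW = 4


def read_schema(stage: MigrationStage) -> str:
    """Stages 1 and 2 read from old, stages 3 and 4 read from new."""
    return "old" if stage.value <= 2 else "new"


def write_schemas(stage: MigrationStage) -> FrozenSet[str]:
    """Only the first and last stage write to a single schema."""
    if stage is MigrationStage.READ_OLD_WRITE_OLD:
        return frozenset({"old"})
    if stage is MigrationStage.READ_NEW_WRITE_NEW:
        return frozenset({"new"})
    return frozenset({"old", "new"})
```

In this sketch, the maintenance script would run while the stage is `READ_OLD_WRITE_BOTH`, so that new writes and backfilled rows both land in the new schema before reads are switched over.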


  • The extra disk space needed will be below 20MB for the whole migration.
  • The maintenance script is estimated to run in a matter of a few minutes.
  • Property terms are small in size but read-heavy. With caches in place, there is no risk of read performance degradation, as we will switch to reading from new right after the maintenance script finishes, and will probably also stop writing to old the next day.

Item terms migration


To avoid needing a very large amount of extra disk space when we start migrating item terms,
as well as a risky overhead on read performance, we will follow a slightly modified
version of the usual process (the one used above).

  1. read old, write both & run maintenance on the terms of items Q1 to Q2 million (the most accessed ones)
  2. read one, write both for item IDs <= 2 million or write old only otherwise (timeboxed to 2 weeks to monitor performance)
  3. read one, write one (until we have a new master with higher capacity) & run maintenance on all items
  4. read new, write new


  • "write one" means programmatically deciding which schema to write to based on the item ID (> 2 million => old, <= 2 million => new)
  • "read one" means programmatically deciding which schema to read from based on the item ID (> 2 million => old, <= 2 million => new)
  • "run maintenance" here will always be done in batches/iterations.
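
A minimal sketch of the "read one" / "write one" decision and of batched maintenance, assuming item IDs have the form `Q<number>`. The 2 million threshold comes from the plan; the function names, the constant name and the batch size are hypothetical and do not reflect the real implementation.

```python
from typing import Iterator, Tuple

# Threshold from the plan: items up to Q2000000 use the new schema.
MIGRATED_ITEM_ID_THRESHOLD = 2_000_000


def numeric_item_id(item_id: str) -> int:
    """Turn an item ID such as 'Q42' into its numeric part (42)."""
    if not item_id.startswith("Q") or not item_id[1:].isdigit():
        raise ValueError(f"not an item ID: {item_id!r}")
    return int(item_id[1:])


def schema_for_item(item_id: str) -> str:
    """Pick the single schema to read from or write to for this item."""
    if numeric_item_id(item_id) <= MIGRATED_ITEM_ID_THRESHOLD:
        return "new"
    return "old"


def id_batches(first_id: int, last_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) numeric ID ranges for batched maintenance."""
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield (start, end)
        start = end + 1
```

Because the decision depends only on the item ID, every read or write touches exactly one schema, which is what lets the application avoid querying both.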


  • In step 1, we migrate 10% of items to the new schema in order to reduce the extra disk space needed after step 1, which should be around 17GB.
  • We also greatly reduce the risk of read performance degradation, as we will determine which schema to read from at the application level and avoid always reading from both.
  • If we need to revert or roll back, that will be quick and easy during step 2 (which is delayed until we have the new, more capable master in place, so we shouldn't expect many problems by then anyway).

Event Timeline

Keep in mind that we might not see an on-disk reduction after step #1 is done, as we'd probably have to optimize wb_terms to reclaim the physical disk space. That cannot be done, as there is not enough available space for that operation.
This is not a big deal, as 17GB isn't life-saving; I'm just mentioning it here for the record.

@Marostegui thanks for sharing. Yes, that's why we also declined Migration Plan 2, which assumed that we would be able to optimize over time to reclaim the space of deleted records in old.

I just updated the two statements in the description that did not feel accurate (item terms migration step 2, and the statement about until when we can roll back).

Yes, the plan is to do items (and start the migration for items) after the new master is in place. Does that sound feasible to you, @Marostegui?

alaa_wmde reopened this task as Open.

My only thought when reading this is that when we get to step 3 for items, data in wb_terms will remain and slowly become out of date (unless there are some deletes happening during these writes)?

This is something we should document very clearly and also communicate to users of the wb_terms table. I'm mainly thinking about tools / labs and also WMDE analytics (Goran)?