
Upcoming phabricator upgrade requires unusually long database migrations
Closed, ResolvedPublic

Description

This is mostly a heads-up to DBAs. I will need to run a rather large database migration during the Phabricator upgrade, which is scheduled for this week (Wednesday night / Thursday morning, specifically, at midnight UTC).

Having a DBA on hand shouldn't be necessary but it certainly wouldn't hurt. If the timing is inconvenient then we can move the deployment window one way or the other: earlier on Wednesday evening or later on Thursday morning.

@jcrespo: What do you think would be best?

Event Timeline

mmodell moved this task from To Triage to Misc on the Phabricator board.

Midnight UTC would be impossible for me.
Thursday morning UTC I could do it (I could be around 6 AM UTC or so). Would that work?

About the migration, I assume you'd take a backup before the migration? Also, have you got a rollback procedure in case this migration fails?

Can you share the migration script in advance?

> Midnight UTC would be impossible for me.
> Thursday morning UTC I could do it (I could be around 6 AM UTC or so). Would that work?

Understood. The deployment window is deliberately scheduled at just about the most inconvenient hour in order to minimize disruption; in this instance, though, I think we should reschedule it to a time that's good for you. Thursday morning at 06:00 UTC would work for me. That's midnight in my timezone, which is actually a good time for me, believe it or not.

> About the migration, I assume you'd take a backup before the migration? Also, have you got a rollback procedure in case this migration fails?

There is no Phabricator rollback procedure other than deploying the older revision and restoring a database snapshot; without a snapshot it's impossible to roll back. I am not so much worried about the migration failing as I am concerned about downtime due to a very long-running migration. We have an extremely large database by Phabricator standards, so when upstream decides to refactor a large table, the migration can take one to two orders of magnitude longer to run against our database than against most other installs (even Phacility's own Phabricator instance).
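
For reference, a pre-upgrade snapshot along those lines could be as simple as the following. This is a minimal sketch assuming mysqldump and our phabricator_* database naming; the actual host, credentials, and backup tooling are for the DBAs to decide:

# Dump all phabricator_* databases in one consistent snapshot.
# --single-transaction keeps the InnoDB read consistent without table locks.
mysqldump --single-transaction --databases \
  $(mysql -BNe "SHOW DATABASES LIKE 'phabricator\_%'") \
  | gzip > phabricator-pre-upgrade-$(date +%Y%m%d).sql.gz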

> Can you share the migration script in advance?

Yes, I can share the migration scripts. There will be several, which I believe are at least partially implemented in PHP rather than pure SQL. I will try to deduce which scripts will need to run and then link them here.

Here is the one that I expect to be slow: https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/20180208.maniphest.02.populate.php;215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b

It took 116,080 ms (just under two minutes) to run on upstream's database. It could be much slower for us.

The full list of migrations is https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/%3B215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b?offset=800 - we will be running all of the 2018 migrations, since our last deployment was last year.
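
(For what it's worth, Phabricator's storage CLI can list which schema patches are still pending. A quick sketch, run from the Phabricator checkout, assuming the bin/storage status workflow behaves on our version as documented upstream:)

# List every schema patch and whether it has been applied;
# anything not marked "Applied" will run during the upgrade.
./bin/storage status | grep -v Applied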

> Here is the one that I expect to be slow: https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/20180208.maniphest.02.populate.php;215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b
>
> It took 116,080 ms (just under two minutes) to run on upstream's database. It could be much slower for us.
>
> The full list of migrations is https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/%3B215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b?offset=800 - we will be running all of the 2018 migrations, since our last deployment was last year.

I might be looking at the wrong thing; I can only see an UPDATE. Is there any ALTER TABLE involved here?

The slow one is backfilling the data; the ALTER TABLE is in a separate migration, which should be much quicker than the population script. There is a similar pair of migrations in there for some Differential tables; however, we don't have as much data in Differential, so I don't expect that one to be slow.
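
To illustrate the pattern, here is a hypothetical sketch of such a pair (not the actual upstream migration; the column name is made up for the example):

# Fast: schema change only, no row data rewritten.
mysql phabricator_maniphest -e \
  "ALTER TABLE maniphest_task ADD COLUMN exampleEpoch INT UNSIGNED"

# Slow: backfill the new column, rewriting every matching row in the table.
mysql phabricator_maniphest -e \
  "UPDATE maniphest_task SET exampleEpoch = dateModified WHERE exampleEpoch IS NULL"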

Is it easy to trick Phabricator into skipping migrations? We could do that safely and online, and it seems backwards compatible.

It is a pretty small table, so I don't think this will be too slow:

root@db1043:/srv/sqldata# find . | grep maniphest_task.ibd | xargs ls -lh
-rw-rw---- 1 mysql mysql 348M Feb 14 12:48 ./phabricator_maniphest/maniphest_task.ibd
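
(An alternative way to get the same numbers without poking around the datadir, using information_schema:)

# Report on-disk data + index size per table in the maniphest database.
mysql -e "SELECT table_name,
       ROUND((data_length + index_length)/1024/1024) AS size_mb
  FROM information_schema.tables
 WHERE table_schema = 'phabricator_maniphest'
 ORDER BY size_mb DESC"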

@jcrespo: it's possible, but I'd have to hack around some detection scheme; Phabricator normally refuses to even serve pages when the migrations are out of date. I could probably just modify the migration script to make it a no-op, so that it gets recorded as if it was applied. Then it's simple enough to run the exact same code manually after the upgrade.
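
A sketch of that approach, assuming Phabricator keeps its patch bookkeeping in the phabricator_meta_data.patch_status table and that bin/storage upgrade supports an --apply flag on our version (both would need to be verified before relying on this):

# 1. Before the upgrade: edit the populate script to return immediately,
#    so it completes instantly but is still recorded as applied.

# 2. After the upgrade: confirm the patch was recorded.
mysql phabricator_meta_data -e \
  "SELECT patch, applied FROM patch_status WHERE patch LIKE '%20180208.maniphest%'"

# 3. Later, in a quiet window: restore the original script and re-run it by
#    hand (upstream's CLI is supposed to support applying a single patch).
./bin/storage upgrade --apply phabricator:20180208.maniphest.02.populate.php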

@Marostegui: that's one of the largest maniphest_task tables in the world, despite its size in raw bytes ;)

I'm surprised it's only 348 MB, though; that's smaller than I would have guessed.

The ALTER is done on the slave already; try to see if you can cleanly skip that migration. If there are other backwards-compatible ALTERs, we could do those too. During the maintenance we will switch over to the new host, you finish the migrations, and you end up with a newer version and a new, faster host with the latest MariaDB package.

The switchover will probably require your help restarting apache/phab at the time.
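
(For reference, the usual sanity check before such a switchover, assuming db1059 is the replica being promoted:)

# Verify the candidate master has caught up before repointing the proxy.
mysql -h db1059.eqiad.wmnet -e "SHOW SLAVE STATUS\G" \
  | grep -E 'Seconds_Behind_Master|Slave_(IO|SQL)_Running'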

That is not what I asked; I asked you to disable the first one (the ALTER), not the second one.

Change 410640 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover dbproxy1008 from db1043 to db1059

https://gerrit.wikimedia.org/r/410640

Change 410640 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover dbproxy1008 from db1043 to db1059

https://gerrit.wikimedia.org/r/410640

Change 410641 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] Point m3-master to dbproxy1008

https://gerrit.wikimedia.org/r/410641

Change 410643 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Repoint dbproxy1003 after reimage

https://gerrit.wikimedia.org/r/410643

mmodell assigned this task to jcrespo.

Everything looks good, thanks @jcrespo and @Marostegui

Change 410651 merged by Jcrespo:
[operations/puppet@production] mariadb: promote db1059 to be the new m3 master

https://gerrit.wikimedia.org/r/410651

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['dbproxy1003.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201802150710_jynus_19516.log.

Change 410643 merged by Jcrespo:
[operations/puppet@production] mariadb: Repoint dbproxy1003 after reimage

https://gerrit.wikimedia.org/r/410643

Completed auto-reimage of hosts:

['dbproxy1003.eqiad.wmnet']

and were ALL successful.