
Upcoming phabricator upgrade requires unusually long database migrations
Closed, ResolvedPublic

Description

This is mostly a heads-up to DBAs. I will need to run a rather large database migration during the Phabricator upgrade, which is scheduled for this week (Wednesday night / Thursday morning, specifically, at midnight UTC).

Having a DBA on hand shouldn't be necessary but it certainly wouldn't hurt. If the timing is inconvenient then we can move the deployment window one way or the other: earlier on Wednesday evening or later on Thursday morning.

@jcrespo: What do you think would be best?

Event Timeline

mmodell moved this task from To Triage to Misc on the Phabricator board.

Midnight UTC would be impossible for me.
Thursday morning UTC I could do it (I could be around 6 AM UTC or so). Would that work?

About the migration, I assume you'd take a backup before the migration? Also, have you got a rollback procedure in case this migration fails?

Can you share the migration script in advance?

> Midnight UTC would be impossible for me.
> Thursday morning UTC I could do it (I could be around 6 AM UTC or so). Would that work?

Understood. The deployment window is deliberately scheduled at just about the most inconvenient hour in order to minimize disruption; in this instance, though, I think we should reschedule it to a time that's good for you. Thursday morning at 06:00 UTC would work for me. That's midnight in my timezone, which is actually a good time for me, believe it or not.

> About the migration, I assume you'd take a backup before the migration? Also, have you got a rollback procedure in case this migration fails?

There is no Phabricator rollback procedure other than deploying the older revision and restoring a database snapshot; without a snapshot it's impossible to roll back. I am not so much worried about the migration failing as I am concerned about downtime due to a very long-running migration. We have an extremely large database by Phabricator standards, so when upstream decides to refactor a large table, the migration can take one to two orders of magnitude longer to run against our database than against most other installs (even Phacility's own Phabricator instance).
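
For reference, a pre-upgrade snapshot along those lines could be as simple as the following. This is a minimal sketch assuming mysqldump and our phabricator_* database naming; the actual host, credentials, and backup tooling are for the DBAs to decide:

# Dump all phabricator_* databases in one consistent snapshot.
# --single-transaction keeps the InnoDB read consistent without table locks.
mysqldump --single-transaction --databases \
  $(mysql -BNe "SHOW DATABASES LIKE 'phabricator\_%'") \
  | gzip > phabricator-pre-upgrade-$(date +%Y%m%d).sql.gz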

> Can you share the migration script in advance?

Yes, I can share the migration scripts. There will be several, which I believe are at least partially implemented in PHP rather than pure SQL. I will try to deduce which scripts will need to run and then link them here.

Here is the one that I expect to be slow: https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/20180208.maniphest.02.populate.php;215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b

It took 116,080 ms (just under two minutes) to run on upstream's database. It could be much slower for us.

The full list of migrations is https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/%3B215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b?offset=800 - we will be running all of the 2018 migrations, since our last deployment was last year.
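
(For what it's worth, Phabricator's storage CLI can list which schema patches are still pending. A quick sketch, run from the Phabricator checkout, assuming the bin/storage status workflow behaves on our version as documented upstream:)

# List every schema patch and whether it has been applied;
# anything not marked "Applied" will run during the upgrade.
./bin/storage status | grep -v Applied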

> Here is the one that I expect to be slow: https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/20180208.maniphest.02.populate.php;215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b
>
> It took 116,080 ms (just under two minutes) to run on upstream's database. It could be much slower for us.
>
> The full list of migrations is https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/%3B215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b?offset=800 - we will be running all of the 2018 migrations, since our last deployment was last year.

I might be looking at the wrong thing; I can only see an UPDATE. Is there any ALTER TABLE involved here?

The slow one is backfilling the data; the ALTER TABLE is in a separate migration, which should be much quicker than the population script. There is a similar pair of migrations in there for some Differential tables; however, we don't have as much data in Differential, so I don't expect that one to be slow.
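
To illustrate the pattern, here is a hypothetical sketch of such a pair (not the actual upstream migration; the column name is made up for the example):

# Fast: schema change only, no row data rewritten.
mysql phabricator_maniphest -e \
  "ALTER TABLE maniphest_task ADD COLUMN exampleEpoch INT UNSIGNED"

# Slow: backfill the new column, rewriting every matching row in the table.
mysql phabricator_maniphest -e \
  "UPDATE maniphest_task SET exampleEpoch = dateModified WHERE exampleEpoch IS NULL"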

Is it easy to trick Phabricator into skipping migrations? We could do that safely and online, and it seems backwards compatible.

It is a pretty small table, so I don't think this will be too slow:

root@db1043:/srv/sqldata# find . | grep maniphest_task.ibd | xargs ls -lh
-rw-rw---- 1 mysql mysql 348M Feb 14 12:48 ./phabricator_maniphest/maniphest_task.ibd
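
(An alternative way to get the same numbers without poking around the datadir, using information_schema:)

# Report on-disk data + index size per table in the maniphest database.
mysql -e "SELECT table_name,
       ROUND((data_length + index_length)/1024/1024) AS size_mb
  FROM information_schema.tables
 WHERE table_schema = 'phabricator_maniphest'
 ORDER BY size_mb DESC"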

@jcrespo: it's possible, but I'd have to hack around some detection scheme; Phabricator normally refuses to even serve pages when the migrations are out of date. I could probably just modify the migration script to make it a no-op, so that it gets recorded as if it was applied. Then it's simple enough to run the exact same code manually after the upgrade.
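
A sketch of that approach, assuming Phabricator keeps its patch bookkeeping in the phabricator_meta_data.patch_status table and that bin/storage upgrade supports an --apply flag on our version (both would need to be verified before relying on this):

# 1. Before the upgrade: edit the populate script to return immediately,
#    so it completes instantly but is still recorded as applied.

# 2. After the upgrade: confirm the patch was recorded.
mysql phabricator_meta_data -e \
  "SELECT patch, applied FROM patch_status WHERE patch LIKE '%20180208.maniphest%'"

# 3. Later, in a quiet window: restore the original script and re-run it by
#    hand (upstream's CLI is supposed to support applying a single patch).
./bin/storage upgrade --apply phabricator:20180208.maniphest.02.populate.php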

@Marostegui: that's one of the largest maniphest_task tables in the world, despite its size in raw bytes ;)

I'm surprised it's only 348 MB, though; that's smaller than I would have guessed.

The ALTER is done on the slave already; try to see if you can cleanly skip that migration. If there are other backwards-compatible ALTERs, we could do those too. During the maintenance we will switch over to the new host, you finish the migrations, and you end up with a newer version and a new, faster host with the latest MariaDB package.

The switchover will probably require your help restarting apache/phab at the time.
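
(For reference, the usual sanity check before such a switchover, assuming db1059 is the replica being promoted:)

# Verify the candidate master has caught up before repointing the proxy.
mysql -h db1059.eqiad.wmnet -e "SHOW SLAVE STATUS\G" \
  | grep -E 'Seconds_Behind_Master|Slave_(IO|SQL)_Running'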

That is not what I asked; I asked you to disable the first one (the ALTER), not the second one.

Change 410640 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover dbproxy1008 from db1043 to db1059

https://gerrit.wikimedia.org/r/410640

Change 410640 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover dbproxy1008 from db1043 to db1059

https://gerrit.wikimedia.org/r/410640

Change 410641 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] Point m3-master to dbproxy1008

https://gerrit.wikimedia.org/r/410641

Change 410643 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Repoint dbproxy1003 after reimage

https://gerrit.wikimedia.org/r/410643

mmodell assigned this task to jcrespo.

Everything looks good, thanks @jcrespo and @Marostegui

Change 410651 merged by Jcrespo:
[operations/puppet@production] mariadb: promote db1059 to be the new m3 master

https://gerrit.wikimedia.org/r/410651

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['dbproxy1003.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201802150710_jynus_19516.log.

Change 410643 merged by Jcrespo:
[operations/puppet@production] mariadb: Repoint dbproxy1003 after reimage

https://gerrit.wikimedia.org/r/410643

Completed auto-reimage of hosts:

['dbproxy1003.eqiad.wmnet']

and were ALL successful.