
convert dbstore1001 to multi-instance + InnoDB compressed by importing db shards to it
Closed, Declined (Public)

Description

dbstore1001 should be converted to multi-instance + InnoDB compressed.

This ticket should track its progress (see the sketch after the shard list below). The shards to import should be, at least:

  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7
  • s8
  • x1

Event Timeline

Looks like we are hitting https://jira.mariadb.org/browse/MDEV-9027 on dbstore1002, so I would like to convert a couple of tables there from TokuDB to InnoDB and see if the issue goes away, as the bug report suggests.

Mentioned in SAL (#wikimedia-operations) [2017-04-05T07:44:42Z] <marostegui> Migrate dbstore1002 enwiki.page and enwiki.categorylinks from TokuDB to InnoDB+compression - T159430

page, categorylinks and templatelinks on dbstore1002 have been converted to InnoDB and compressed. I will talk to the analysts again and check whether we got rid of the locks reported in the bug.
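
For the record, a minimal sketch of the kind of per-table conversion described above; the KEY_BLOCK_SIZE and session setting are assumptions, not values taken from this task:

```sql
-- Assumed example: convert a TokuDB table to compressed InnoDB.
SET SESSION innodb_strict_mode = ON;   -- fail early if the compression options cannot be applied

ALTER TABLE enwiki.page
    ENGINE = InnoDB,
    ROW_FORMAT = COMPRESSED,
    KEY_BLOCK_SIZE = 8;                -- 8 KB compressed page size, chosen here for illustration

ALTER TABLE enwiki.categorylinks
    ENGINE = InnoDB,
    ROW_FORMAT = COMPRESSED,
    KEY_BLOCK_SIZE = 8;
```

ROW_FORMAT=COMPRESSED requires file-per-table tablespaces (and, on older MariaDB versions, the Barracuda file format), so those settings would need to be checked before running the ALTERs.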

So we were indeed hitting https://jira.mariadb.org/browse/MDEV-9027 with the table locks. After switching the tables to InnoDB and running the same queries, the server doesn't lag at all.
This is on dbstore1002, but I am writing it here for the record.

I agree that this is not a bug, but it is one more reason to rely on InnoDB's gap-based locking in REPEATABLE READ mode, at least until we have ROW-based replication.
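
Illustrative only, not something done as part of this task: the locking behaviour in question depends on the isolation level and the binlog format, which can be checked like this.

```sql
-- InnoDB takes gap locks under REPEATABLE READ; READ COMMITTED avoids most of
-- them but generally needs ROW-based binary logging to be safe for replication.
SELECT @@GLOBAL.tx_isolation  AS isolation_level,
       @@GLOBAL.binlog_format AS binlog_format;
```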

Marostegui renamed this task from "convert dbstore1001 to InnoDB compressed by importing db shards to it" to "convert dbstore1001 to multi-instance + InnoDB compressed by importing db shards to it". Dec 7 2017, 3:29 PM
Marostegui updated the task description.

Well, as we saw with dbstore2001, all shards fit into it, but they cannot replicate. Should we go for larger servers and leave dbstore1001 as temporary dump space?

We'd need at least two similar servers (as we do with dbstore2001/2). If not, the only workaround I can think of right now is to rebuild dbstore1001 as multi-instance and leave it with delayed replication, which I think is the only way it would be able to replicate.
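
A rough sketch of the delayed-replication workaround mentioned above, assuming a MariaDB version with native delayed replication (10.2.3+); the one-day delay is an example, not a value agreed on in this task:

```sql
-- Run per replication connection/instance on the rebuilt multi-instance host.
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 86400;  -- intentionally replicate 24 h behind the master
START SLAVE;
```

On older versions an external tool such as pt-slave-delay would be needed to get the same effect.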

T186596 happened, so we should decline this task and create a new one for setting up the two new provisioning hosts.