Change new storage strategy defaults for Cassandra compression
Closed, Resolved, Public

Description

Traditionally, k-r-v-based tables in RESTBase used o.a.cassandra.io.compress.DeflateCompressor with a chunk length of 256 KB. This was done to optimize compression for document history.

For storage of current revisions, we should revert to settings that are closer to Cassandra best practice and that should yield better performance, namely: {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}.
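
As a rough illustration of what applying these settings to one table involves, the sketch below renders the option map as a CQL `ALTER TABLE` statement. The helper name `compression_alter_statement` is hypothetical and is not part of RESTBase; the keyspace and table names are taken from the SAL entry in the timeline.

```python
from typing import Mapping

# Proposed defaults from the task description.
LZ4_OPTIONS = {
    "chunk_length_in_kb": "64",
    "class": "org.apache.cassandra.io.compress.LZ4Compressor",
}


def compression_alter_statement(keyspace: str, table: str,
                                options: Mapping[str, str]) -> str:
    """Build a CQL ALTER TABLE statement that sets a table's compression options.

    Hypothetical helper for illustration only; it just formats a string.
    """
    # Render the options as a CQL map literal, with keys sorted for stable output.
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in sorted(options.items()))
    # Double-quote the identifiers so mixed-case keyspace names keep their case.
    return f'ALTER TABLE "{keyspace}"."{table}" WITH compression = {{{rendered}}};'


statement = compression_alter_statement("enwiki_T_page__summary", "data", LZ4_OPTIONS)
print(statement)
```

Existing SSTables keep their old compression until they are rewritten, for example by compaction, so a statement like this only changes how newly written SSTables are compressed.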

Event Timeline

Eevans triaged this task as Medium priority. Oct 26 2017, 6:25 PM

Mentioned in SAL (#wikimedia-operations) [2017-10-26T18:26:45Z] <urandom> T179105: Altering "enwiki_T_page__summary".data to use LZ4Compressor

It occurs to me that this and T178846 will both require an update to the schema version, which will result in a number of (no-op) schema migrations when deployed. This is probably not what we want (at least in the near term).

See also: T179083: Cassandra schema creation seems unreliable

mobrovac subscribed.

Uh, indeed. We will need to change the default compaction settings as well and increment the base schema versions, since otherwise RB won't be able to start at all. Furthermore, to prevent ALTER requests from being issued in our production environment, we will need to alter the meta records manually.

I added this to the agenda of our team meeting, but I think the TL;DR is that we can choose from one of (in no particular order):

  • Make RESTBase-induced schema modifications more robust (and then allow it to migrate them)
  • Make RESTBase-induced schema modifications die in a fire (come up with an alternative to auto-migration, and implement it now)
  • Hack our way around it (somehow) this time (à la meta alters, or similar)
  • Punt on the new compression and compaction defaults for the time being

Basically, either "bite the bullet and fix the underlying problem now" (whatever that means), hack our way around this migration (in the same way we did when we manually added recent schema), or pretend these tickets don't exist for the time being. :)

NOTE: Any follow-up about the future of schema migrations in RESTBase specifically should probably happen in T179083 (or perhaps even in a new, separate ticket).

It seems like compaction/compression settings are database tuning parameters and not something an application should care about. In this vein, I've created https://github.com/wikimedia/restbase-mod-table-cassandra/pull/216. It completely removes any references to compaction/compression from the code. After the first deploy, it will also clear any code-induced compaction/compression properties from the meta schemas without executing any ALTER TABLE statements. It is also fully compatible with the current client code: it doesn't require version increments in clients.
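
To make the idea concrete, here is a minimal sketch of clearing tuning properties from a stored schema record without touching the table itself. It is not the code from the pull request (which targets the Node.js module). The record layout, the `options` key, and the names `TUNING_KEYS` and `strip_tuning_options` are all assumptions made for illustration.

```python
import copy

# Assumed names of the tuning-related keys; the real meta records may differ.
TUNING_KEYS = ("compression", "compaction")


def strip_tuning_options(schema: dict) -> dict:
    """Return a copy of a schema record without compaction/compression settings.

    Only the stored record is rewritten; no ALTER TABLE is involved.
    """
    cleaned = copy.deepcopy(schema)  # leave the caller's record untouched
    options = cleaned.get("options")
    if isinstance(options, dict):
        for key in TUNING_KEYS:
            options.pop(key, None)
        # Drop the container entirely once nothing else is left in it.
        if not options:
            del cleaned["options"]
    return cleaned


# Hypothetical record shape, for demonstration only.
stored = {
    "table": "data",
    "version": 3,
    "options": {
        "compression": [{"algorithm": "deflate", "block_size": 256}],
        "compaction": {"class": "LeveledCompactionStrategy"},
    },
}
print(strip_tuning_options(stored))
```

Because the cleaned record carries no tuning properties, comparing it against a schema definition that also omits them finds nothing to migrate, which is the effect described above.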

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Eevans claimed this task.

Thanks (3 years late 😬)!