Cassandra schema migrations to add page_language
Closed, Resolved (Public)

Description

To properly support language variants on multi-language wikis, we need to record a page_language field in the title_revision table in Cassandra (see https://github.com/wikimedia/restbase/pull/1004).

For this to work, the following alterations should be made:

ALTER TABLE "others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text;
ALTER TABLE "commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH"."data" ADD "page_language" text;
ALTER TABLE "wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY"."data" ADD "page_language" text;
ALTER TABLE "enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text;

After that, RESTBase should be deployed with skip_schema_update: true so that the meta table matches the real Cassandra schema.

NOTE: Although I've tested the ALTER statements and the proposed update procedure locally, please give the statements one more good look.
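
The four ALTER statements above differ only in the keyspace name, so one way to avoid copy-paste mistakes when reviewing them is to generate them from a list. The following is a minimal sketch, not part of the RESTBase tooling; the helper names `add_column_statement` and `build_statements` are hypothetical, and the keyspace names are copied from the statements above.

```python
# Keyspace names copied verbatim from the ALTER statements in this task.
KEYSPACES = [
    "others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe",
    "commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH",
    "wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY",
    "enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe",
]


def add_column_statement(keyspace, column="page_language", cql_type="text"):
    """Build one ALTER TABLE statement for the "data" table (hypothetical helper)."""
    return f'ALTER TABLE "{keyspace}"."data" ADD "{column}" {cql_type};'


def build_statements(keyspaces=KEYSPACES):
    """Return one ALTER statement per keyspace, in the order given."""
    return [add_column_statement(ks) for ks in keyspaces]


if __name__ == "__main__":
    # Print the statements so they can be diffed against the ones in the task.
    for statement in build_statements():
        print(statement)
```

The script only prints text; it does not connect to Cassandra, so the statements still need to be reviewed and applied by hand.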

Event Timeline

Pchelolo triaged this task as High priority. Jun 13 2018, 9:47 AM
Pchelolo created this task.

Because RESTBase will write a new meta record, we should also set the correct compression and compaction settings.

Right now, the compaction/compression settings in production differ from what RESTBase will want to set:

RESTBase:

compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

Production:

compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
compression = {'chunk_length_in_kb': '32', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

The compaction options in RESTBase are controlled by this code, and the pattern for title_revision is not set, so it defaults to random_update.

The compression comes from the table options; however, I can't find out why RESTBase is using LZ4 instead of the default Snappy.
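
To make the mismatch between the two sets of options easier to see, here is a small sketch that compares them key by key. The option values are copied from the comment above; the `diff_options` helper is hypothetical and only illustrates the comparison, it is not how RESTBase itself evaluates schema changes.

```python
# Options RESTBase would want to set (copied from the comment above).
RESTBASE = {
    "compaction": {
        "class": "org.apache.cassandra.db.compaction.LeveledCompactionStrategy",
    },
    "compression": {
        "chunk_length_in_kb": "64",
        "class": "org.apache.cassandra.io.compress.LZ4Compressor",
    },
}

# Options currently in effect in production (copied from the comment above).
PRODUCTION = {
    "compaction": {
        "class": "org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy",
        "max_threshold": "32",
        "min_threshold": "4",
    },
    "compression": {
        "chunk_length_in_kb": "32",
        "class": "org.apache.cassandra.io.compress.LZ4Compressor",
    },
}


def diff_options(wanted, actual):
    """Map (option, key) to (wanted_value, actual_value) for every differing entry."""
    diffs = {}
    for option in sorted(set(wanted) | set(actual)):
        w = wanted.get(option, {})
        a = actual.get(option, {})
        for key in sorted(set(w) | set(a)):
            if w.get(key) != a.get(key):
                # None marks a key that is absent on that side.
                diffs[(option, key)] = (w.get(key), a.get(key))
    return diffs


if __name__ == "__main__":
    for (option, key), (wanted, actual) in diff_options(RESTBASE, PRODUCTION).items():
        print(f"{option}.{key}: RESTBase={wanted!r} production={actual!r}")
```

With the values above, the compressor class is the same on both sides; the differences are the compaction strategy class, the two size-tiered thresholds that exist only in production, and the compression chunk length.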

Pchelolo updated the task description. Jun 13 2018, 10:34 AM

The Cassandra default is LZ4, AFAIK.

Oh, I might have read some wrong or outdated docs.

These ALTERs LGTM. For posterity, here they are as YAML:

alter_others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe:
  statement: |
    ALTER TABLE "others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text
alter_commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH:
  statement: |
    ALTER TABLE "commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH"."data" ADD "page_language" text
alter_wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY:
  statement: |
    ALTER TABLE "wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY"."data" ADD "page_language" text
alter_enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe:
  statement: |
    ALTER TABLE "enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text

I can apply this at any time.

Mentioned in SAL (#wikimedia-operations) [2018-06-13T16:13:04Z] <urandom> ALTERing Cassandra schema - T197082

Mentioned in SAL (#wikimedia-operations) [2018-06-13T16:38:03Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@f521e7e]: Add page_language to title_revision table T197082

Mentioned in SAL (#wikimedia-operations) [2018-06-13T16:53:47Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@f521e7e]: Add page_language to title_revision table T197082 (duration: 15m 44s)

Pchelolo closed this task as Resolved. Jun 13 2018, 4:54 PM

Ok, schema updated, PR deployed. Resolving.

Vvjjkkii renamed this task from Cassandra schema migrations to add page_language to t4aaaaaaaa. Jul 1 2018, 1:04 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Eevans as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii edited subscribers, added: Eevans; removed: Aklapper.
mobrovac renamed this task from t4aaaaaaaa to Cassandra schema migrations to add page_language. Jul 1 2018, 10:20 AM
mobrovac closed this task as Resolved.
mobrovac assigned this task to Eevans.
mobrovac updated the task description.
mobrovac removed a subscriber: Eevans.