
Cassandra schema migrations to add page_language
Closed, ResolvedPublic

Description

To properly support language variants on multilanguage wikis, we need to record a page_language field in the title_revision table in Cassandra. (See https://github.com/wikimedia/restbase/pull/1004.)

For this to work, the following alterations should be made:

ALTER TABLE "others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text;
ALTER TABLE "commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH"."data" ADD "page_language" text;
ALTER TABLE "wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY"."data" ADD "page_language" text;
ALTER TABLE "enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text;

After that, RESTBase should be deployed with skip_schema_update: true so that the meta table matches the real Cassandra schema.

NOTE: Although I've tested the ALTER statements and the proposed update procedure locally, please give the statements one more good look.
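As an aid for that review, here is a minimal sketch that regenerates the four statements from the list of keyspace names, so the keyspace names only need to be checked once. The function name add_column_statement and the KEYSPACES/STATEMENTS constants are made up for this illustration; the script only builds strings and does not talk to Cassandra.

```python
# Keyspace names copied verbatim from the ALTER statements in the description.
KEYSPACES = [
    "others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe",
    "commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH",
    "wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY",
    "enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe",
]


def add_column_statement(keyspace, column="page_language", cql_type="text"):
    """Build the ALTER TABLE statement for the "data" table of one keyspace.

    Hypothetical helper: identifiers are double-quoted exactly as in the
    hand-written statements so that case is preserved.
    """
    return f'ALTER TABLE "{keyspace}"."data" ADD "{column}" {cql_type};'


# One statement per keyspace, in the same order as in the description.
STATEMENTS = [add_column_statement(ks) for ks in KEYSPACES]

if __name__ == "__main__":
    for statement in STATEMENTS:
        print(statement)
```

The printed lines can be compared character for character with the hand-written statements before anything is run against the cluster.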

Event Timeline

Pchelolo created this task.

Because RESTBase will write a new meta record, we should also set the correct compression and compaction settings.

Right now, here is the difference between the compaction/compression settings in production and what RESTBase will want to set:

RESTBase:

compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

Production:

compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
compression = {'chunk_length_in_kb': '32', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

The compaction options in RESTBase are controlled by this code, and the pattern for title_revision is not set so it defaults to random_update.

The compression comes from the table options; however, I can't find out why RESTBase is using LZ4 instead of the default Snappy.
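To make the mismatch easier to see, here is a small sketch that compares the two sets of options key by key. The dictionaries are copied from the settings quoted in this comment; the names RESTBASE, PRODUCTION, diff_options and DIFF are made up for this illustration and are not part of RESTBase.

```python
# Options RESTBase would want to set, as quoted in this comment.
RESTBASE = {
    "compaction": {
        "class": "org.apache.cassandra.db.compaction.LeveledCompactionStrategy",
    },
    "compression": {
        "chunk_length_in_kb": "64",
        "class": "org.apache.cassandra.io.compress.LZ4Compressor",
    },
}

# Options currently set in production, as quoted in this comment.
PRODUCTION = {
    "compaction": {
        "class": "org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy",
        "max_threshold": "32",
        "min_threshold": "4",
    },
    "compression": {
        "chunk_length_in_kb": "32",
        "class": "org.apache.cassandra.io.compress.LZ4Compressor",
    },
}


def diff_options(wanted, actual):
    """Return {option: {key: (wanted_value, actual_value)}} for differing keys.

    A key missing on one side is reported with None for that side.
    """
    diffs = {}
    for option in sorted(set(wanted) | set(actual)):
        w = wanted.get(option, {})
        a = actual.get(option, {})
        changed = {
            key: (w.get(key), a.get(key))
            for key in sorted(set(w) | set(a))
            if w.get(key) != a.get(key)
        }
        if changed:
            diffs[option] = changed
    return diffs


DIFF = diff_options(RESTBASE, PRODUCTION)

if __name__ == "__main__":
    for option, changed in DIFF.items():
        for key, (wanted, actual) in changed.items():
            print(f"{option}.{key}: RESTBase={wanted!r} production={actual!r}")
```

With the values above, the compaction class and thresholds differ, while for compression only chunk_length_in_kb differs; the compressor class is the same on both sides.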

The Cassandra default is LZ4, AFAIK.

Oh, I might have read some wrong or outdated docs.

These ALTERs LGTM; for posterity, here they are as YAML:

alter_others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe:
  statement: |
    ALTER TABLE "others_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text
alter_commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH:
  statement: |
    ALTER TABLE "commons_T_title__revisions3WsaB42Wia1E_eq_KmoYTH"."data" ADD "page_language" text
alter_wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY:
  statement: |
    ALTER TABLE "wikipedia_T_title__revisions3WsaB42Wia1E_eq_KmoY"."data" ADD "page_language" text
alter_enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe:
  statement: |
    ALTER TABLE "enwiki_T_title__revisions3WsaB42Wia1E_eq_KmoYTHe"."data" ADD "page_language" text

I can apply this at any time.

Mentioned in SAL (#wikimedia-operations) [2018-06-13T16:38:03Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@f521e7e]: Add page_language to title_revision table T197082

Mentioned in SAL (#wikimedia-operations) [2018-06-13T16:53:47Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@f521e7e]: Add page_language to title_revision table T197082 (duration: 15m 44s)

Ok, schema updated, PR deployed. Resolving.

Vvjjkkii renamed this task from "Cassandra schema migrations to add page_language" to "t4aaaaaaaa". (Jul 1 2018, 1:04 AM)
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Eevans as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii edited subscribers, added: Eevans; removed: Aklapper.
mobrovac renamed this task from "t4aaaaaaaa" to "Cassandra schema migrations to add page_language". (Jul 1 2018, 10:20 AM)
mobrovac closed this task as Resolved.
mobrovac assigned this task to Eevans.
mobrovac updated the task description.
mobrovac removed a subscriber: Eevans.