Lift Wing needs a role with corresponding grants for accessing the ml_cache keyspace.
See also: T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| cassandra: create revise_tone_task_generator Cassandra role | operations/puppet | production | +10 -0 |
| Status | Assigned | Task |
|---|---|---|
| Open | ppelberg | T392954 [FY25-26] WE 1.1: Increase constructive edits |
| Open | KStoller-WMF | T396162 [EPIC] Revise Tone: Structured Task (WE1.1.2, FY25-26) |
| Open | None | T408341 Q1 FY2025-26 Goal: Task generation engine for Revise Tone task |
| Resolved | DPogorzelski-WMF | T409414 Configure Lift Wing isvc Integration with Cassandra |
| Resolved | Eevans | T409850 Cassandra role & grants for Lift Wing isvc integration |
With respect to GRANTs, is it safe to assume that MODIFY is sufficient? There is no requirement to do reads here, is there?
Change #1203857 had a related patch set uploaded (by Eevans; author: Eevans):
[operations/puppet@production] cassandra: create ml_inference_service Cassandra role
When the service starts, Lift Wing will validate whether the target table exists, so we'll need SELECT as well. @BWojtowicz-WMF, is that correct?
In the current implementation, we validate on init that our connection is successful and that the table exists by running this query: SELECT table_name FROM system_schema.tables WHERE keyspace_name = %s AND table_name = %s. I think system_schema.tables is readable by default, so this validation probably doesn't require any additional permissions.
I'm not sure if this test is really necessary, but it should work regardless, yes.
I'll also put my Cassandra maintainer hat on long enough to tell you that these tables aren't guaranteed to be stable; it's not a public interface, and these system tables could change in future Cassandra versions (though in practice they rarely, if ever, do).
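For readers following along, the startup check discussed above could look roughly like the sketch below. This is not the actual Lift Wing code: the function name `table_exists` and the `FakeSession` stand-in are hypothetical, and the only thing taken from the thread is the query text. It assumes a session object with an `execute(query, parameters)` method, as the DataStax Python driver's `Session` provides.

```python
# Query quoted in the discussion above; the %s placeholders follow the
# positional-parameter style of the DataStax Python driver.
TABLE_EXISTS_QUERY = (
    "SELECT table_name FROM system_schema.tables "
    "WHERE keyspace_name = %s AND table_name = %s"
)


def table_exists(session, keyspace, table):
    """Return True if keyspace.table is listed in system_schema.tables.

    `session` is assumed to expose execute(query, parameters) and to
    return an iterable of rows (hypothetical helper, not Lift Wing code).
    """
    rows = session.execute(TABLE_EXISTS_QUERY, (keyspace, table))
    # Any returned row means the table is present in the schema.
    return any(True for _ in rows)


class FakeSession:
    """Hypothetical stand-in for a driver session, for illustration only."""

    def __init__(self, known_tables):
        self.known_tables = set(known_tables)

    def execute(self, query, parameters):
        keyspace, table = parameters
        if (keyspace, table) in self.known_tables:
            return [{"table_name": table}]
        return []
```

Because the check reads only from system_schema.tables, it does not touch ml_cache data itself. As noted above, that system table is not a stable public interface, so a check like this may need revisiting after a Cassandra upgrade.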
Change #1203857 merged by Eevans:
[operations/puppet@production] cassandra: create revise_tone_task_generator Cassandra role
Ok, a new role has been created: revise_tone_task_generator, and it has been given MODIFY permissions on ml_cache.page_paragraph_tone_scores. This is the case for both the cassandra-dev (staging), and aqs (production) clusters.
The password for revise_tone_task_generator has been added to hiera in the private git repository; it should never be copied elsewhere.
You can use the data-gateway helm chart as an example of referencing the password for deployments, as well as for connection information (it is configured for the same Cassandra clusters).
Let me know if you have any questions, or notice any issues!
cassandra@cqlsh> LIST ALL PERMISSIONS ON ml_cache.page_paragraph_tone_scores ;
role | username | resource | permission
----------------------------+----------------------------+---------------------------------------------+------------
[ ... ]
revise_tone_task_generator | revise_tone_task_generator | <table ml_cache.page_paragraph_tone_scores> | MODIFY
(20 rows)
cassandra@cqlsh> DESCRIBE ml_cache.page_paragraph_tone_scores ;
CREATE TABLE ml_cache.page_paragraph_tone_scores (
wiki_id text,
page_id bigint,
revision_id bigint,
model_version text,
idx int,
content text,
score float,
PRIMARY KEY ((wiki_id, page_id), revision_id, model_version, idx)
) WITH CLUSTERING ORDER BY (revision_id ASC, model_version ASC, idx ASC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND memtable = 'default'
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
cassandra@cqlsh>
cassandra@cqlsh> LIST ALL PERMISSIONS ON ml_cache.page_paragraph_tone_scores ;
role | username | resource | permission
----------------------------+----------------------------+---------------------------------------------+------------
[ ... ]
revise_tone_task_generator | revise_tone_task_generator | <table ml_cache.page_paragraph_tone_scores> | MODIFY
(12 rows)
cassandra@cqlsh> DESCRIBE KEYSPACE ml_cache;
CREATE KEYSPACE ml_cache WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3', 'codfw': '3'} AND durable_writes = true;
CREATE TABLE ml_cache.page_paragraph_tone_scores (
wiki_id text,
page_id bigint,
revision_id bigint,
model_version text,
idx int,
content text,
score float,
PRIMARY KEY ((wiki_id, page_id), revision_id, model_version, idx)
) WITH CLUSTERING ORDER BY (revision_id ASC, model_version ASC, idx ASC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND memtable = 'default'
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
cassandra@cqlsh>
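As an illustration of what the MODIFY grant covers, here is a minimal sketch of a write against the table shown above. The column names come from the DESCRIBE output; everything else (`ToneScore`, `write_score`, `RecordingSession`) is a hypothetical name for this example and not part of Lift Wing. It assumes a session object with an `execute(query, parameters)` method, as the DataStax Python driver's `Session` provides.

```python
from dataclasses import astuple, dataclass

# Column order taken from the DESCRIBE output above.
COLUMNS = (
    "wiki_id", "page_id", "revision_id",
    "model_version", "idx", "content", "score",
)

# INSERT is one of the statements that the MODIFY permission allows.
INSERT_QUERY = (
    "INSERT INTO ml_cache.page_paragraph_tone_scores "
    f"({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join(['%s'] * len(COLUMNS))})"
)


@dataclass(frozen=True)
class ToneScore:
    """One row of the table; field order matches COLUMNS (hypothetical type)."""
    wiki_id: str
    page_id: int
    revision_id: int
    model_version: str
    idx: int
    content: str
    score: float


def write_score(session, row):
    """Write one row; needs MODIFY on the table, but not SELECT."""
    session.execute(INSERT_QUERY, astuple(row))


class RecordingSession:
    """Hypothetical stand-in that records calls instead of contacting Cassandra."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters):
        self.calls.append((query, parameters))
```

A role that holds only MODIFY can run statements like this one, while a SELECT against ml_cache.page_paragraph_tone_scores would be rejected unless SELECT is also granted.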