Description
The suggestions table in the image_suggestions Cassandra keyspace requires an additional column to accommodate section-level image suggestions data, namely section_heading (string).
Details
Status | Assigned | Task
---|---|---
Resolved | None | T311814 [EPIC] Section-level image suggestions data pipeline
Resolved | mfossati | T320831 Section Level Image Suggestions - Data Persistence Request
Resolved | Eevans | T328670 Add section title column to image_suggestions.suggestions table schema
Event Timeline
What timing constraints —if any— do we have for this? For example, if the column were added today, are downstream implementations prepared to deal with the additional attribute (with or without a value present)? When do you need this done by?
Data transfer to Cassandra is the last step of the pipeline, and we're currently developing the initial ones; see T311829: [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code.
We've very roughly estimated that we'll reach T328672: [M] Populate Hive tables that will feed Cassandra in 32 days of work. Once we're there, we should be all set to populate the additional attribute.
More explicitly, this ticket is blocked by T328672: [M] Populate Hive tables that will feed Cassandra.
OK, @mfossati and I agree that this is no longer blocked.
> if the column were added today, are downstream implementations prepared to deal with the additional attribute (with or without a value present)?
Yes, they are (confirmed for the Growth team via Slack by @kostajh).
So @Eevans, we're good to add this column whenever you get a chance.
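To illustrate what "prepared to deal with the additional attribute (with or without a value present)" could look like on the client side, here is a minimal Python sketch. It is not the Growth team's actual code: the helper name `section_heading_of` and the dict-shaped row are hypothetical, and treating a blank heading the same as a missing one is an assumption.

```python
from typing import Any, Mapping, Optional


def section_heading_of(row: Mapping[str, Any]) -> Optional[str]:
    """Return the row's section heading, or None if it is absent, null, or blank.

    `row` is a hypothetical dict-like view of one suggestions row. Rows
    written before the column existed are expected to yield no value.
    """
    value = row.get("section_heading")
    if value is None:
        return None
    text = str(value).strip()
    # Assumption: an empty or whitespace-only heading is treated as "no section".
    return text or None


# Usage: a client can branch on the result without caring whether the
# column was missing from the row or present with a null value.
for row in ({"section_heading": "History"}, {"section_heading": None}, {}):
    heading = section_heading_of(row)
    print("no section heading" if heading is None else f"section heading: {heading}")
```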
Folks, on T328672, we are calling this column section_heading.
Let's make sure we use the same column name in Cassandra as well?
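A quick way to catch a naming mismatch between the two stores is to compare column lists before loading data. The sketch below is only illustrative: the function name `missing_in_cassandra` and the column lists are hypothetical, and `section_heading` is the only name taken from this thread.

```python
from typing import Iterable, List


def missing_in_cassandra(hive_columns: Iterable[str],
                         cassandra_columns: Iterable[str]) -> List[str]:
    """Return Hive column names that have no same-named Cassandra column."""
    cassandra = set(cassandra_columns)
    # Preserve the Hive column order so the report is easy to read.
    return [name for name in hive_columns if name not in cassandra]


# Hypothetical column lists; a real check would read them from both schemas.
hive = ["some_existing_column", "section_heading"]
cassandra = ["some_existing_column", "section_title"]
print(missing_in_cassandra(hive, cassandra))
```

An empty result means every Hive column has a Cassandra counterpart with the same name.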
Change 899774 had a related patch set uploaded (by Eevans; author: Eevans):
[generated-data-platform/datasets/image-suggestions@main] cassandra_schema.cql: document additional attribute
The parent ticket seemed to indicate three additional attributes: one for a textual section heading, one for a numeric index, and another for the page's Wikidata... so, to be 100% certain: we're adding a single (un-indexed) attribute named section_heading of type text, correct?
If so I will apply the following DDL to the production cluster (say tomorrow in the am, utc-500), at which point it will take effect immediately.
ALTER TABLE image_suggestions.suggestions ADD section_heading text;
> We're adding a single (un-indexed) attribute named section_heading of type text, correct?
Correct: section_heading is the only thing clients need, so we're keeping it as simple as possible for now.
> If so I will apply the following DDL to the production cluster (say tomorrow in the am, utc-500)
Perfect, thanks @Eevans
Mentioned in SAL (#wikimedia-operations) [2023-03-16T14:06:05Z] <urandom> ALTER-ing image_suggestions.suggestion table — T328670
Change 899774 merged by jenkins-bot:
[generated-data-platform/datasets/image-suggestions@main] cassandra_schema.cql: document additional attribute