Add section title column to image_suggestions.suggestions table schema
Closed, Resolved (Public)

Description

The suggestions table in the image_suggestions Cassandra keyspace requires an additional column to cater for Section-Level-Image-Suggestions data, namely section_heading (string).

Event Timeline

What timing constraints —if any— do we have for this? For example, if the column were added today, are downstream implementations prepared to deal with the additional attribute (with or without a value present)? When do you need this done by?

Data transfer to Cassandra is the last step of the pipeline, and we're currently developing the initial ones; see T311829: [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code.
We've very roughly estimated that reaching T328672: [M] Populate Hive tables that will feed Cassandra will take 32 days of work. Once we're there, we should be all set to populate the additional attribute.

OK, @mfossati and I agree that this is no longer blocked.

if the column were added today, are downstream implementations prepared to deal with the additional attribute (with or without a value present)?

Yes, they are (confirmed for the Growth team via Slack by @kostajh).

So, @Eevans, we're good to add this column whenever you get a chance.

Folks, on T328672, we are calling this column section_heading.

Let's make sure we use the same column name in Cassandra as well?

Good catch @xcollazo - I edited the task description

Change 899774 had a related patch set uploaded (by Eevans; author: Eevans):

[generated-data-platform/datasets/image-suggestions@main] cassandra_schema.cql: document additional attribute

https://gerrit.wikimedia.org/r/899774

The parent ticket seemed to indicate three additional attributes, one for a textual section heading, one for a numeric index, and another for the page's wikidata...so to be 100% certain: We're adding a single (un-indexed) attribute named section_heading of type text, correct?

If so, I will apply the following DDL to the production cluster (say, tomorrow morning, UTC-05:00), at which point it will take effect immediately.

ALTER TABLE image_suggestions.suggestions ADD section_heading text;
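A minimal sketch of how the change could be checked afterwards, assuming a cqlsh session connected to the cluster once the ALTER above has been applied. Only the keyspace, table, and column names come from this task; the sample size is arbitrary, and this is not necessarily how the change was verified in production.

```cql
-- Assumption: run from cqlsh after the ALTER TABLE statement has been applied.
-- cqlsh command: print the table definition, which should now list section_heading text.
DESCRIBE TABLE image_suggestions.suggestions;

-- Adding a column does not rewrite existing data, so rows written before
-- the change return null for section_heading until the pipeline populates it.
-- LIMIT 5 is an arbitrary sample size chosen for illustration.
SELECT section_heading FROM image_suggestions.suggestions LIMIT 5;
```

This is why clients need to tolerate the attribute both with and without a value: the column is visible as soon as the DDL is applied, but it stays null for existing rows.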

We're adding a single (un-indexed) attribute named section_heading of type text, correct?

Correct - section_heading is the only thing clients need, so we're keeping it as simple as possible for now

If so, I will apply the following DDL to the production cluster (say, tomorrow morning, UTC-05:00)

Perfect, thanks @Eevans

Mentioned in SAL (#wikimedia-operations) [2023-03-16T14:06:05Z] <urandom> ALTER-ing image_suggestions.suggestion table — T328670

Change 899774 merged by jenkins-bot:

[generated-data-platform/datasets/image-suggestions@main] cassandra_schema.cql: document additional attribute

https://gerrit.wikimedia.org/r/899774

Eevans claimed this task.