As done in Image-Suggestions, populate the following Hive tables with Section-Level-Image-Suggestions output:
- suggestions
- title_cache
- instanceof_cache
- search_index_full
- search_index_delta
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T311814 [EPIC] Section-level image suggestions data pipeline
Resolved | | Cparle | T311829 [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code
Resolved | | xcollazo | T328672 [M] Populate Hive tables that will feed Cassandra
Update on the suggestions part: I have altered the image_suggestions_suggestions table in my own Hive db (the hql query to do so is here) to hold section_heading, but I'm getting an error writing to the altered table when I run the pipeline script on stat1007.
Here's the error:
The column number of the existing table cormac.image_suggestions_suggestions(struct<page_id:bigint,id:string,image:string,origin_wiki:string,confidence:int,found_on:array<string>,kind:array<string>,page_rev:bigint,snapshot:string,wiki:string>) doesn't match the data schema(struct<page_id:bigint,id:string,image:string,origin_wiki:string,confidence:int,found_on:array<string>,kind:array<string>,page_rev:bigint,section_heading:string,snapshot:string,wiki:string>)
... so it looks like Spark is seeing the table without the new column.
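For anyone debugging a similar mismatch, here is a minimal sketch that diffs the two schema strings from the error above to confirm which column differs. The helper names (`parse_struct`, `schema_diff`) are hypothetical and not part of the pipeline; it only compares the strings and assumes the `struct<name:type,...>` format Spark printed here. It does not explain why Spark sees the old table definition.

```python
# Schema strings copied verbatim from the error message.
TABLE_SCHEMA = (
    "struct<page_id:bigint,id:string,image:string,origin_wiki:string,"
    "confidence:int,found_on:array<string>,kind:array<string>,"
    "page_rev:bigint,snapshot:string,wiki:string>"
)
DATA_SCHEMA = (
    "struct<page_id:bigint,id:string,image:string,origin_wiki:string,"
    "confidence:int,found_on:array<string>,kind:array<string>,"
    "page_rev:bigint,section_heading:string,snapshot:string,wiki:string>"
)


def parse_struct(schema):
    """Parse 'struct<name:type,...>' into a list of (name, type) pairs."""
    text = schema.strip()
    if not (text.startswith("struct<") and text.endswith(">")):
        raise ValueError("expected a struct<...> schema string")
    inner = text[len("struct<"):-1]
    fields, depth, current = [], 0, ""
    for ch in inner:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        # Split only on commas outside nested types such as array<string>.
        if ch == "," and depth == 0:
            fields.append(current)
            current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    return [tuple(field.split(":", 1)) for field in fields]


def schema_diff(table_schema, data_schema):
    """Report columns that differ between the table and the data being written."""
    table = dict(parse_struct(table_schema))
    data = dict(parse_struct(data_schema))
    return {
        "missing_in_table": [name for name in data if name not in table],
        "missing_in_data": [name for name in table if name not in data],
        "type_mismatch": [
            name for name in table if name in data and table[name] != data[name]
        ],
    }


print(schema_diff(TABLE_SCHEMA, DATA_SCHEMA))
# {'missing_in_table': ['section_heading'], 'missing_in_data': [], 'type_mismatch': []}
```

The only difference is section_heading, which matches the reading that the table definition Spark resolves predates the ALTER.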
Currently stuck on this - @JAllemandou or @xcollazo any ideas? If I don't get it figured out before the end of the day maybe someone else can look next week
Let's do a debug session if you have the time @Cparle.
> If I don't get it figured out before the end of the day maybe someone else can look next week
Happy to take over on Monday.
MR up for review at https://gitlab.wikimedia.org/repos/structured-data/image-suggestions/-/merge_requests/10.
Phew! This took a while!
This has been merged into branch https://gitlab.wikimedia.org/repos/structured-data/image-suggestions/-/tree/T311289-combined.
Will wait until we merge that branch into main to close this ticket.