[M] Populate Hive tables that will feed Cassandra
Closed, Resolved (Public)

Description

As done in Image-Suggestions, populate the following Hive tables with the Section-Level-Image-Suggestions output:

  • suggestions
  • title_cache
  • instanceof_cache
  • search_index_full
  • search_index_delta

Event Timeline

CBogen renamed this task from Populate Hive tables that will feed Cassandra to [M] Populate Hive tables that will feed Cassandra. Feb 8 2023, 5:52 PM
CBogen assigned this task to Cparle.

Update on the suggestions part: I have altered the image_suggestions_suggestions table in my own Hive db (the HQL query to do so is here) to hold section_heading, but I'm getting an error writing to the altered table when I run the pipeline script on stat1007.

Here's the error:

The column number of the existing table cormac.image_suggestions_suggestions(struct<page_id:bigint,id:string,image:string,origin_wiki:string,confidence:int,found_on:array<string>,kind:array<string>,page_rev:bigint,snapshot:string,wiki:string>) doesn't match the data schema(struct<page_id:bigint,id:string,image:string,origin_wiki:string,confidence:int,found_on:array<string>,kind:array<string>,page_rev:bigint,section_heading:string,snapshot:string,wiki:string>)

So it looks like Spark is seeing the table without the new column.
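
To make the mismatch easier to see, here is a minimal sketch in plain Python that compares the two struct strings copied from the error message above and reports which column is missing on the table side. The helper name `parse_struct` and the variable names are hypothetical and not part of the pipeline. It only parses the error text and does not query Hive or Spark.

```python
def parse_struct(schema: str) -> list[tuple[str, str]]:
    """Split a 'struct<name:type,...>' string into (name, type) pairs.

    Hypothetical helper for illustration only. Commas inside nested
    types such as array<string> are ignored by tracking bracket depth.
    """
    inner = schema[len("struct<"):-1]  # drop the outer struct<...>
    fields, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            fields.append(inner[start:i])
            start = i + 1
    fields.append(inner[start:])
    return [tuple(f.split(":", 1)) for f in fields]


# Both strings are copied verbatim from the error message.
table_schema = (
    "struct<page_id:bigint,id:string,image:string,origin_wiki:string,"
    "confidence:int,found_on:array<string>,kind:array<string>,"
    "page_rev:bigint,snapshot:string,wiki:string>"
)
data_schema = (
    "struct<page_id:bigint,id:string,image:string,origin_wiki:string,"
    "confidence:int,found_on:array<string>,kind:array<string>,"
    "page_rev:bigint,section_heading:string,snapshot:string,wiki:string>"
)

table_columns = [name for name, _ in parse_struct(table_schema)]
data_columns = [name for name, _ in parse_struct(data_schema)]

# Columns present in the data being written but absent from the table
# as Spark sees it.
missing_in_table = [c for c in data_columns if c not in table_columns]
print(missing_in_table)
```

The only difference between the two schemas is section_heading, which matches the reading that Spark sees the table definition from before the column was added.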

Currently stuck on this. @JAllemandou or @xcollazo, any ideas? If I don't get it figured out before the end of the day maybe someone else can look next week.

Let's do a debug session if you have the time @Cparle.

If I don't get it figured out before the end of the day maybe someone else can look next week.

Happy to take over on Monday.

This has been merged into branch https://gitlab.wikimedia.org/repos/structured-data/image-suggestions/-/tree/T311289-combined.

Will wait until we merge that branch into main to close this ticket.