The search index data is imported from image_suggestions_search_index_delta, which is created by diffing this week's snapshot against last week's snapshot in image_suggestions_search_index_full.
More precisely, the diff is between the current snapshot and the snapshot from the last successful run of the DAG. This means that if a DAG run fails *after* it has generated the search index data (which gets imported more or less straight away once the delta is ready), the next run will be diffed against the wrong data: the snapshot from the last successful run rather than the data that was actually imported.
The search team now dumps the contents of the search indices into the discovery database in Hive. If we use the table cirrus_index_without_content together with image_suggestions_search_index_full to create image_suggestions_search_index_delta, then I think our output data will be more robust.
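
To make the failure mode concrete, here is a minimal in-memory sketch in Python. It is only a toy model: the real tables live in Hive and the real diff is presumably done there, and the names `compute_delta`, `apply_delta`, and the `page_id -> tags` snapshot shape are all hypothetical, chosen for illustration. It shows a run whose delta is imported before the DAG fails, followed by a run that diffs against the last *successful* snapshot, and compares that with diffing against what is actually in the index.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot shape: page_id -> set of search index tags.
Snapshot = Dict[int, Set[str]]


def compute_delta(baseline: Snapshot, current: Snapshot) -> Tuple[Snapshot, Set[int]]:
    """Return (upserts, deletes) needed to turn `baseline` into `current`."""
    upserts = {
        page: tags for page, tags in current.items() if baseline.get(page) != tags
    }
    deletes = set(baseline) - set(current)
    return upserts, deletes


def apply_delta(index: Snapshot, upserts: Snapshot, deletes: Set[int]) -> Snapshot:
    """Return a copy of `index` with the delta applied."""
    result = {page: tags for page, tags in index.items() if page not in deletes}
    result.update(upserts)
    return result


# Three weekly snapshots (toy data).
week1: Snapshot = {1: {"a"}}
week2: Snapshot = {1: {"a"}, 2: {"b"}}
week3: Snapshot = {1: {"a"}}

# Week 2: the delta is generated and imported, then the DAG fails later on,
# so week 1 remains the "last successful run".
index = apply_delta(week1, *compute_delta(week1, week2))

# Week 3, current behaviour: diff against the last successful snapshot (week 1).
# Page 2 is in neither week 1 nor week 3, so no delete is emitted for it.
stale_index = apply_delta(index, *compute_delta(week1, week3))

# Week 3, proposed behaviour: diff against what is actually in the index.
fixed_index = apply_delta(index, *compute_delta(index, week3))
```

In this toy model `stale_index` still contains page 2 even though it is no longer in the week 3 snapshot, while `fixed_index` matches week 3 exactly. Diffing against the dumped index contents plays the role of the `index` baseline here.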