After running the DAG with the latest snapshots and importing the data into the search indices, far fewer articles than expected have the recommendation.image/exists|1 flag set in weighted_tags in their search index doc. For ptwiki we expect around 130k articles, but only ~2.9k have the flag.
Investigating, it looks like:
- the search index deltas in Hive produced by the SD team seem off
- the logs for importing the data into the search index also seem off: the ptwiki import finished in 1 second, which is far too fast
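
For reference, a minimal sketch of how the flagged-article count could be checked against the expectation. It assumes the search index docs are available as plain dicts with weighted_tags as a list of strings; the helper names (has_image_recommendation, count_flagged, coverage) and the sample docs are hypothetical and not part of the actual pipeline.

```python
from typing import Any, Iterable, Mapping

# Tag value as quoted in this task.
TAG = "recommendation.image/exists|1"


def has_image_recommendation(doc: Mapping[str, Any]) -> bool:
    """Return True if the doc's weighted_tags list contains the flag."""
    # Assumption: weighted_tags is a list of strings; treat missing/None as empty.
    return TAG in (doc.get("weighted_tags") or [])


def count_flagged(docs: Iterable[Mapping[str, Any]]) -> int:
    """Count docs that carry the image recommendation flag."""
    return sum(1 for doc in docs if has_image_recommendation(doc))


def coverage(actual: int, expected: int) -> float:
    """Fraction of the expected articles that actually have the flag."""
    return actual / expected


# Hypothetical sample docs, for illustration only.
sample_docs = [
    {"title": "A", "weighted_tags": ["recommendation.image/exists|1"]},
    {"title": "B", "weighted_tags": ["some.other/tag|5"]},
    {"title": "C"},  # no weighted_tags field at all
    {"title": "D", "weighted_tags": ["recommendation.image/exists|1", "some.other/tag|5"]},
]

flagged = count_flagged(sample_docs)
ptwiki_coverage = coverage(2_900, 130_000)  # numbers reported above for ptwiki
```

With the ptwiki numbers above, coverage comes out at roughly 2% of the expected articles, which fits the suspicion that most of the delta never reached the index.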