
Write search index data for image suggestions into a hive table rather than local hdfs files
Closed, Resolved (Public)

Description

We currently write the search index data to a local Parquet file, but the search team's data pipelines expect it to be in a Hive table.

This ticket is to change where the data is written. It also means we can probably remove the cleanup.py script we were using to clean up old Parquet files.
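
As a rough illustration of the change described above, the sketch below writes a DataFrame into a partitioned Hive table instead of a standalone Parquet path. It is not the code from the actual patch. The table name, the temporary view name, the `snapshot` partition column, and all function names are hypothetical, and it assumes the DataFrame's columns match the table's non-partition columns. Overwriting a single partition on each run is one reason a separate cleanup script for old files may no longer be needed.

```python
from datetime import date


def snapshot_value(day: date) -> str:
    """Return a zero-padded ISO date (YYYY-MM-DD) to use as the partition value."""
    return day.isoformat()


def insert_overwrite_sql(table: str, source_view: str, snapshot: str) -> str:
    """Build a statement that replaces one partition of a Hive table.

    The partition column name `snapshot` is an assumption for this sketch.
    """
    return (
        f"INSERT OVERWRITE TABLE {table} "
        f"PARTITION (snapshot='{snapshot}') "
        f"SELECT * FROM {source_view}"
    )


def write_to_hive(spark, df, table: str, day: date) -> None:
    """Write `df` into the given partition of `table` via Spark SQL.

    `spark` is expected to be a SparkSession with Hive support enabled and
    `df` a Spark DataFrame; neither is constructed here.
    """
    view = "search_index_tmp"  # hypothetical temporary view name
    df.createOrReplaceTempView(view)
    spark.sql(insert_overwrite_sql(table, view, snapshot_value(day)))
```

In a job, this would be called as something like `write_to_hive(spark, df, "some_db.search_index_delta", date(2022, 5, 23))`, where the table name is again a placeholder.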

Event Timeline

Change 797361 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[wikimedia/discovery/analytics@master] weekly import of image suggestions

https://gerrit.wikimedia.org/r/797361

Change 797361 merged by jenkins-bot:

[wikimedia/discovery/analytics@master] weekly import of image suggestions

https://gerrit.wikimedia.org/r/797361

Mentioned in SAL (#wikimedia-operations) [2022-05-23T18:05:56Z] <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@d1f4367]: T307983: weekly import of image suggestions

Mentioned in SAL (#wikimedia-operations) [2022-05-23T18:08:17Z] <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@d1f4367]: T307983: weekly import of image suggestions (duration: 02m 21s)

Change 797420 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[wikimedia/discovery/analytics@master] convert_to_esbulk: Zero-pad dates within @dailysnapshot

https://gerrit.wikimedia.org/r/797420

Change 797420 merged by jenkins-bot:

[wikimedia/discovery/analytics@master] convert_to_esbulk: Zero-pad dates within @dailysnapshot

https://gerrit.wikimedia.org/r/797420

Mentioned in SAL (#wikimedia-operations) [2022-05-23T19:17:09Z] <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@5a4803a]: T307983: zero-pad dates within @dailysnapshot

Mentioned in SAL (#wikimedia-operations) [2022-05-23T19:19:30Z] <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@5a4803a]: T307983: zero-pad dates within @dailysnapshot (duration: 02m 20s)