
Prune suggestions from previous iterations from Cassandra
Closed, Declined · Public

Description

At the moment, suggestions are always added to Cassandra and older suggestions are never removed.

Create a mechanism for removing suggestions from previous iterations so that clients don't receive outdated suggestions.


Implementation details

Suggestions are stored with an id that allows us to distinguish between suggestions from different runs of the data pipeline.

  • when generating a set of suggestions, store the id of the latest data-pipeline run in HDFS
  • create a script that reads the id of the latest data-pipeline run from HDFS and deletes all suggestions with a different id
  • schedule the new script to run once the transfer of suggestions data from Hive to Cassandra has completed
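The three steps above can be sketched as a small pruning job. This is a minimal sketch under stated assumptions, not the actual implementation: a local file stands in for the HDFS location, an in-memory list stands in for the Cassandra table, and the names `Suggestion`, `store_latest_run_id`, `read_latest_run_id`, `prune_suggestions` and `run_prune_job`, along with the row fields and run-id values, are hypothetical. The task does not specify the schema or the script's interface.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    # Hypothetical row shape: the real Cassandra schema is not given in the task.
    page_id: int
    image: str
    run_id: str  # id of the data-pipeline run that produced this suggestion


def store_latest_run_id(path: str, run_id: str) -> None:
    """Step 1: record the id of the latest pipeline run.

    A local file stands in for the HDFS location (assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(run_id)


def read_latest_run_id(path: str) -> str:
    """Step 2a: read back the id of the latest run."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


def prune_suggestions(
    rows: list[Suggestion], latest_run_id: str
) -> tuple[list[Suggestion], int]:
    """Step 2b: keep only suggestions from the latest run.

    Returns the surviving rows and the number of rows deleted. An in-memory
    list stands in for the Cassandra table (assumption)."""
    kept = [row for row in rows if row.run_id == latest_run_id]
    return kept, len(rows) - len(kept)


def run_prune_job(marker_path: str, rows: list[Suggestion]) -> list[Suggestion]:
    """Step 3: the job a scheduler would trigger once the Hive-to-Cassandra
    transfer has finished."""
    latest = read_latest_run_id(marker_path)
    kept, deleted = prune_suggestions(rows, latest)
    print(f"run {latest}: deleted {deleted} outdated suggestion(s)")
    return kept


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "latest_run_id")
        store_latest_run_id(marker, "run-2")  # hypothetical run id
        table = [
            Suggestion(1, "a.jpg", "run-1"),  # left over from an older run
            Suggestion(1, "b.jpg", "run-2"),
            Suggestion(2, "c.jpg", "run-2"),
        ]
        table = run_prune_job(marker, table)
        print([s.image for s in table])
```

Pruning on the stored run id rather than on row age means the job removes exactly the rows that the latest run did not produce. A production version would issue the deletes through a Cassandra client instead of filtering a list.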

Event Timeline

Cparle added a subscriber: Eevans.

Having spoken to @Eevans about this, I'm going to close this ticket. Because the script runs only once a week, there's no way to completely prevent out-of-date data from being served to users, and keeping the has-suggestion flags up to date in the wiki search indices should prevent the particular problem this ticket was meant to fix from ever reaching users.