At the moment, suggestions are always added to Cassandra, and older suggestions are never removed.
Create a mechanism that removes suggestions left over from previous iterations, so that clients don't receive outdated suggestions.
Implementation details
Suggestions are stored with an id that allows us to distinguish between suggestions from different runs of the data pipeline.
- when generating a set of suggestions, store the id of the latest data-pipeline run in HDFS
- create a script that reads the id of the latest data-pipeline run from HDFS and deletes all suggestions with a different id
- schedule the new script to run once the transfer of suggestions data from Hive to Cassandra has completed
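
The core logic of the cleanup script could be sketched as follows. This is only a minimal illustration, not the actual implementation: the names `Suggestion`, `find_stale` and `cleanup` are hypothetical, and access to HDFS and Cassandra is passed in as plain callables so that the sketch assumes nothing about the real schema, file paths or client libraries.

```python
from typing import Callable, Iterable, List, NamedTuple


class Suggestion(NamedTuple):
    key: str     # hypothetical primary key of a suggestion row
    run_id: str  # id of the data-pipeline run that produced the row


def find_stale(rows: Iterable[Suggestion], latest_run_id: str) -> List[Suggestion]:
    """Return every row whose run id differs from the latest run id."""
    return [row for row in rows if row.run_id != latest_run_id]


def cleanup(
    read_latest_run_id: Callable[[], str],           # assumed: reads the id stored in HDFS
    fetch_rows: Callable[[], Iterable[Suggestion]],  # assumed: lists suggestions in Cassandra
    delete_row: Callable[[str], None],               # assumed: deletes one suggestion by key
) -> int:
    """Delete all suggestions not produced by the latest run; return how many were deleted."""
    latest = read_latest_run_id().strip()
    if not latest:
        # Guard: with an empty id every row would look stale and be deleted.
        raise ValueError("latest run id is empty; refusing to delete anything")
    # Collect the stale rows first, then delete, so the source is not
    # modified while it is still being read.
    stale = find_stale(fetch_rows(), latest)
    for row in stale:
        delete_row(row.key)
    return len(stale)
```

Injecting the three callables keeps the selection logic testable without a cluster. The guard against an empty id is a design choice of this sketch: it avoids wiping the whole table if the id file in HDFS is missing or has not been written yet.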