Before a TTL was set on the instanceof_cache and title_cache tables in Cassandra, plenty of old data was written, and that data persists in the tables.
Most of this can be cleaned up in much the same way as the suggestions data was cleaned up in https://phabricator.wikimedia.org/T317364#8400180, i.e.:
- gather all wiki/page_id/rev_page combinations for all suggestions in image_suggestions_suggestions in Hive for all snapshots
- left_anti join them with all wiki/page_id/rev_page combinations in image_suggestions_suggestions in Hive from the latest snapshot
- write the results to a CSV
- use a Python script to read the CSV one row at a time and delete each row from Cassandra
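
A rough sketch of the last two steps, to make the approach concrete. This is not the script used in T317364. The key columns (`wiki`, `page_id`), their types, the CSV header and the `session` object are assumptions for illustration. `session` is anything with an `execute(query, params)` method, such as a session from the Python Cassandra driver, which uses `%s` placeholders for unprepared statements. `stale_keys` is only a pure-Python stand-in for the left_anti join, which in practice would run in Spark against Hive.

```python
import csv

# Hypothetical key columns and converters; the real primary key of
# instanceof_cache / title_cache may differ and should be checked first.
KEY_COLUMNS = {"wiki": str, "page_id": int}


def stale_keys(all_keys, latest_keys):
    """Stand-in for the left_anti join: keys seen in any snapshot
    that are absent from the latest snapshot, de-duplicated and sorted."""
    latest = set(latest_keys)
    return sorted({key for key in all_keys if key not in latest})


def delete_rows_from_csv(session, csv_path, table, key_columns=KEY_COLUMNS):
    """Read the CSV one row at a time and issue one DELETE per row.

    The CSV is assumed to have a header row naming the key columns.
    Returns the number of DELETE statements issued.
    """
    where = " AND ".join(f"{col} = %s" for col in key_columns)
    query = f"DELETE FROM {table} WHERE {where}"
    issued = 0
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # CSV values arrive as strings, so cast each to its column type.
            params = tuple(cast(row[col]) for col, cast in key_columns.items())
            session.execute(query, params)
            issued += 1
    return issued
```

Issuing one DELETE per row keeps the load on Cassandra low and makes it easy to resume from a given row if the script is interrupted.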