The image suggestions dataset models a one-to-many relationship between a page and suggestion data (keyed by a value that identifies the corresponding compute job). This allows new suggestion data to atomically replace legacy results. Unbounded growth was to be prevented using TTLs and/or batched range DELETEs during insertion. Neither was implemented, and the dataset has grown to 159GB (against a projected 15GB)¹.
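As a rough sketch of the intended "batched range DELETE during insertion" pattern, the snippet below uses an in-memory sqlite3 table with a hypothetical schema (`page_id`, `job_id`, `data` are illustrative names, not the real ones): each insert of a new job's rows is paired, in the same transaction, with a delete of all older jobs' rows for that page, so legacy results never accumulate.

```python
import sqlite3

# Hypothetical schema mirroring the described one-to-many model:
# each page maps to many suggestion rows, keyed by the compute job
# that produced them. sqlite3 stands in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suggestions (page_id INTEGER, job_id INTEGER, data TEXT)"
)

def insert_with_cleanup(conn, page_id, job_id, rows):
    """Insert a job's suggestions for a page, then delete rows left
    behind by earlier jobs so only the newest job's data remains."""
    with conn:  # one transaction: readers see old or new data, never both
        conn.executemany(
            "INSERT INTO suggestions (page_id, job_id, data) VALUES (?, ?, ?)",
            [(page_id, job_id, d) for d in rows],
        )
        # The batched "range delete": drop every older job's rows
        # for this page as part of the same write.
        conn.execute(
            "DELETE FROM suggestions WHERE page_id = ? AND job_id < ?",
            (page_id, job_id),
        )

insert_with_cleanup(conn, page_id=1, job_id=100, rows=["a", "b"])
insert_with_cleanup(conn, page_id=1, job_id=101, rows=["c"])

remaining = conn.execute(
    "SELECT job_id, data FROM suggestions WHERE page_id = 1"
).fetchall()
print(remaining)  # only job 101's row survives
```

Had either this per-insert cleanup or a TTL been in place, old jobs' rows would have been reclaimed continuously instead of accumulating.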
We need to clean up legacy results to reclaim the space (and reduce response size and latency at the gateway service), and put a retention mechanism in place going forward.
¹ The suggestions table only, and not accounting for replication