The old system used a separate table for images that were in the pipeline to be indexed, and that table also carried a status flag for failed and busy images. This proved impractical. A better approach is to track only the page_id / rc_id that the process has currently reached.
TODO
- remove the old imagehash.page table
- add a separate table of failed page_ids, so that pages which failed can be skipped for now and processed later.
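
The approach described above (a stored position plus a table of failed pages) could look roughly like the following. This is only a minimal sketch, not the project's implementation: it uses SQLite for self-containment, and the table names `checkpoint` and `failed_page`, the helper functions, and the `index_page` callback are all hypothetical names chosen for illustration.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per tracked position, one row per failed page.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS checkpoint (
            name    TEXT PRIMARY KEY,   -- e.g. 'page_id' or 'rc_id'
            last_id INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS failed_page (
            page_id INTEGER PRIMARY KEY,
            error   TEXT NOT NULL
        );
    """)


def get_checkpoint(conn, name):
    # Returns 0 when nothing has been processed yet.
    row = conn.execute(
        "SELECT last_id FROM checkpoint WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else 0


def set_checkpoint(conn, name, last_id):
    conn.execute(
        "INSERT OR REPLACE INTO checkpoint (name, last_id) VALUES (?, ?)",
        (name, last_id),
    )


def process_pages(conn, page_ids, index_page):
    # Process only ids above the stored position, in ascending order.
    last = get_checkpoint(conn, "page_id")
    for page_id in sorted(p for p in page_ids if p > last):
        try:
            index_page(page_id)
        except Exception as exc:
            # Record the failure and move on, so the page can be retried later.
            conn.execute(
                "INSERT OR REPLACE INTO failed_page (page_id, error) VALUES (?, ?)",
                (page_id, str(exc)),
            )
        set_checkpoint(conn, "page_id", page_id)
        conn.commit()
```

Because the position advances even when a page fails, a later retry pass would read from `failed_page` rather than rewinding the checkpoint.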