Implement some way to deal with failures (incomplete DB imports). Failures are expected to happen; we shouldn't assume each day that yesterday's import went perfectly, so we should investigate how to handle them.
Instead of trying once a day, we'll run this script a little more frequently. Right now an import failure, for whatever reason, is pretty catastrophic: the script stops, rolls back the batch, and then exits the process completely. That batch and everything after it never gets imported, so the next day, presumably, everything goes pear-shaped: entire chunks of IPs can be missing, and the database has no concept of that.
Additionally, we didn't factor in problems outside of our control: Kubernetes can (apparently) decide to kill a job midway, connections can be lost, and so on. These can also cause batches to fail for non-code reasons.
This proposal aims to address these problems:
- Allow batches to move forward even if one fails
- Run the script at a TBD interval (every few hours?), check for imports with failed batches, and attempt to rerun those batches. Batches are deterministic, and all the data needed to recreate them should be stored in import_status.
- Only allow a new full import if there aren't any errors in batches
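As a rough illustration of the three points above, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the `BatchStatus` record stands in for whatever a batch row in import_status actually looks like, the `params` field (offset/limit) is a guess at "the data needed to recreate a batch", and `process` is a placeholder for the real per-batch import and rollback logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

PENDING, DONE, FAILED = "pending", "done", "failed"


@dataclass
class BatchStatus:
    """Hypothetical stand-in for one batch row in import_status."""
    import_id: str
    batch_no: int
    params: Dict[str, int]  # assumed: everything needed to recreate the batch
    status: str = PENDING
    error: str = ""


def run_batches(batches: List[BatchStatus],
                process: Callable[[BatchStatus], None],
                only_status: str) -> None:
    """Process every batch currently in `only_status`.

    A failing batch is recorded as FAILED and the loop moves on,
    instead of aborting the whole import.
    """
    for batch in batches:
        if batch.status != only_status:
            continue
        try:
            process(batch)  # placeholder for the real import of one batch
            batch.status = DONE
            batch.error = ""
        except Exception as exc:  # deliberately broad: any failure marks the batch
            batch.status = FAILED
            batch.error = str(exc)


def can_start_new_import(batches: List[BatchStatus]) -> bool:
    """A new full import is allowed only when no batch is in error."""
    return not any(b.status == FAILED for b in batches)


def demo():
    """Simulate one import where batch 2 loses its connection once."""
    batches = [
        BatchStatus("2024-01-01", n, {"offset": n * 1000, "limit": 1000})
        for n in range(4)
    ]
    attempts: Dict[int, int] = {}

    def flaky_process(batch: BatchStatus) -> None:
        attempts[batch.batch_no] = attempts.get(batch.batch_no, 0) + 1
        if batch.batch_no == 2 and attempts[batch.batch_no] == 1:
            raise ConnectionError("connection lost")

    # Daily run: batch 2 fails, batch 3 is still imported.
    run_batches(batches, flaky_process, only_status=PENDING)
    after_first = [b.status for b in batches]
    blocked = not can_start_new_import(batches)

    # Later run (a few hours on): only the failed batch is retried.
    run_batches(batches, flaky_process, only_status=FAILED)
    after_retry = [b.status for b in batches]
    return after_first, blocked, after_retry, can_start_new_import(batches)
```

In a real version the status updates would be written to import_status inside the same transaction as the batch itself, so that a job killed midway leaves a consistent record of what still needs to be rerun.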
It also gives us a little flexibility in fixing emergent problems. We should still deal with them ASAP, but instead of everything going very badly within 24 hours, we'll slowly fall behind, which seems less disastrous.