We need a clean way to specify external data, and a script that loads it without rendering downtime. I have a mostly finished script that fits.
I've thrown https://github.com/kartotherian/meddo/blob/master/get-external-data.py at some unusual situations, including aborting mid-run, changing files underneath it, wiping DBs, `git clean -xfd`, etc. I haven't managed to make it fail, and it consistently updates when needed and doesn't update when not needed. Storing the data status in the DB alongside the loaded files works really well.
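To illustrate why keeping the status in the DB holds up against aborted runs and wiped databases, here is a minimal sketch of the idea. It is not the linked script: it uses `sqlite3` as a stand-in so it runs anywhere, and the `external_data` status table, the function names, and the single-column table layout are all assumptions made for the example.

```python
import sqlite3


def ensure_status_table(conn):
    # Hypothetical status table: one row per loaded data source.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS external_data ("
        "name TEXT PRIMARY KEY, last_modified TEXT NOT NULL)"
    )


def needs_update(conn, name, last_modified):
    # A reload is needed if the source was never loaded or its version changed.
    row = conn.execute(
        "SELECT last_modified FROM external_data WHERE name = ?", (name,)
    ).fetchone()
    return row is None or row[0] != last_modified


def load_source(conn, name, last_modified, rows):
    """Replace table `name` and record its version in a single transaction.

    `name` is interpolated into SQL, so it must come from trusted config.
    Returns True if data was loaded, False if it was already current.
    """
    if not needs_update(conn, name, last_modified):
        return False
    conn.execute("BEGIN")
    try:
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.execute(f'CREATE TABLE "{name}" (value TEXT)')
        conn.executemany(
            f'INSERT INTO "{name}" (value) VALUES (?)', [(r,) for r in rows]
        )
        # The status row is written in the same transaction as the data,
        # so the two can never disagree.
        conn.execute(
            "INSERT OR REPLACE INTO external_data (name, last_modified) "
            "VALUES (?, ?)",
            (name, last_modified),
        )
        conn.execute("COMMIT")
    except BaseException:
        # An abort mid-run leaves the old table and old status untouched.
        conn.execute("ROLLBACK")
        raise
    return True
```

The connection is assumed to be opened with `sqlite3.connect(path, isolation_level=None)` so the explicit `BEGIN`/`COMMIT` control the transaction. Because the status lives in the same database as the data, wiping the DB also wipes the status, so the next run reloads everything without needing any marker files on disk.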
I don't doubt there are failure conditions out there that I'm not covering, but it's working better than any other script I've seen that does the same thing, so I'm okay calling this resolved.