I think a good first attempt at improving on this model would be to subtract the hash vector extracted from the "parent" revision's text from the hash vector extracted from the current revision's text. This would give positive values for hashes that correspond to added segments and negative values for hashes that correspond to removed segments.
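A minimal sketch of that subtraction, assuming sklearn's `HashingVectorizer` is the extractor. The name `hash_delta`, the vector size, and the vectorizer settings are placeholders of mine, not something taken from hashing_vectorizer.ipynb. Note that the "positive means added, negative means removed" reading only holds with `alternate_sign=False` and `norm=None`, and hash collisions can still let an added and a removed segment cancel out.

```python
from sklearn.feature_extraction.text import HashingVectorizer

# Assumed settings: unsigned, unnormalized counts so the sign of the
# difference is meaningful. 2**10 buckets is an arbitrary placeholder.
N_FEATURES = 2**10
vectorizer = HashingVectorizer(
    n_features=N_FEATURES, alternate_sign=False, norm=None
)


def hash_delta(parent_text, current_text):
    """Return current-revision hash counts minus parent-revision hash counts.

    The result is a 1 x N_FEATURES sparse row: positive entries are buckets
    that gained tokens, negative entries are buckets that lost tokens.
    """
    vecs = vectorizer.transform([parent_text, current_text])
    return vecs[1] - vecs[0]


# Usage: one word removed ("brown") and one word added ("red").
delta = hash_delta("the quick brown fox", "the quick red fox")
print(delta.nonzero()[1], delta.data)
```

HashingVectorizer is stateless, so there is nothing to fit and the same instance can be applied to both revisions.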
I'd first try to get this working as a proof of concept, along the lines of hashing_vectorizer.ipynb. After that we can talk about engineering, and eventually hyperparameter tuning, to see how much fitness we can squeeze out of the strategy.
This task is done when an analysis shows that we can train/test a sklearn model for detecting damage using current features *and* hash vector features.
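As a rough illustration of what that analysis could look like, here is a sketch that stacks a stand-in for the current features next to the hash-difference features and runs one train/test split. Everything in it is an assumption for illustration: the helper names, the choice of `LogisticRegression`, the ROC-AUC metric, and especially the toy revisions and labels, which are made up so the code runs and say nothing about real model fitness.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

N_HASH = 2**10  # placeholder size
vectorizer = HashingVectorizer(n_features=N_HASH, alternate_sign=False, norm=None)


def combined_features(current_features, parent_texts, current_texts):
    """Stack existing (dense) features next to the hash-difference features."""
    delta = vectorizer.transform(current_texts) - vectorizer.transform(parent_texts)
    return hstack([csr_matrix(current_features), delta]).tocsr()


def train_and_score(X, y, seed=0):
    """Fit on half of the data, return ROC-AUC on the held-out half."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


# Toy, made-up revisions and labels (1 = damaging), only to exercise the code.
parent_texts = ["the article describes the history of the town"] * 8
current_texts = [
    "the article describes the history of the town lol nonsense",
    "the article describes the early history of the town",
    "nonsense nonsense the history of the town",
    "the article describes the history and geography of the town",
    "the article lol describes the town",
    "the article describes the documented history of the town",
    "lol the article describes nonsense",
    "the article describes the history of the town and its river",
]
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# Stand-in for the "current features": a single column with the length change.
current_features = np.array(
    [[len(c) - len(p)] for p, c in zip(parent_texts, current_texts)], dtype=float
)

X = combined_features(current_features, parent_texts, current_texts)
auc = train_and_score(X, y)
print(X.shape, auc)
```

For the real analysis the same split would presumably be scored twice, once with the current features alone and once with the combined matrix, so the contribution of the hash vector features is visible.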