We're currently using the vanilla pickle implementation, for no particular reason other than simplicity. There are some good alternatives available, and we should evaluate them for disk, memory, or startup time savings.
This patch implements joblib serialization, which has optimized support for large numpy arrays and inline compression: https://github.com/wiki-ai/revscoring/pull/408
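
To make the comparison concrete, here is a minimal sketch of a harness that measures on-disk size and load time for a given dump/load pair. It uses only the standard library, so gzip-compressed pickle stands in for a compressing serializer. The `measure` helper and the `fake_model` object are hypothetical and not part of revscoring, and memory usage is not measured here.

```python
import gzip
import os
import pickle
import tempfile
import time


def dump_pickle(obj, path):
    # Baseline: plain pickle, as we use today.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def dump_gzip_pickle(obj, path):
    # Stand-in for a serializer with inline compression.
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzip_pickle(path):
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


def measure(obj, dump, load):
    """Return (bytes_on_disk, load_seconds, loaded_obj) for one dump/load pair."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        dump(obj, path)
        size = os.path.getsize(path)
        start = time.perf_counter()
        loaded = load(path)
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size, elapsed, loaded


# Hypothetical stand-in for a trained model file; a real test should use
# an actual model.
fake_model = {"weights": [0.0] * 100_000, "name": "example"}

results = {
    "pickle": measure(fake_model, dump_pickle, load_pickle),
    "gzip+pickle": measure(fake_model, dump_gzip_pickle, load_gzip_pickle),
}
for name, (size, seconds, _) in results.items():
    print(f"{name}: {size} bytes, loaded in {seconds:.4f}s")
```

If joblib is installed, `measure(fake_model, joblib.dump, joblib.load)` should slot into the same harness, since both functions take the object and filename in the same positions. Numbers from a toy object like this say little about real models, so treat the output as a smoke test of the harness rather than as a result.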