To address the performance needs of our machine translation service, MinT, we need a dedicated cache store. Details are as follows:
- The cache will store machine translations that are expected to be requested repeatedly in the future.
- The cache key is going to be sourcelanguagecode-targetlanguagecode-hash(source content).
- The cache value is the machine-translated content. To save storage, this could be compressed too.
- A cached translation does not change as long as the MT model does not change, so it could be cached for a long time. However, its usefulness and its impact on cache store size need to be considered. A reasonable TTL would be 7 days.
- Data store size: the cache key would take 2 bytes per language code (4 bytes for the two codes), 2 bytes for the two delimiters, and 32 bytes for the raw SHA-256 digest = 38 bytes. Assuming 1000 chars per translation, gzip with 50% compression would take about 500 bytes. But since this would be Unicode, the single-byte-per-character assumption does not hold, so let us assume 2000 bytes per value. One million such records will take roughly 2 GB. Add datastore overheads too.
- The cache will be accessed from the deployed machine translation service.
- The traffic for the machine translation service is currently 2 req/second, and its future growth is hard to predict. Planning for 10 req/second may be safe. I don't think this much traffic is a big issue for a Redis-like cache system.
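
The key scheme, value compression, and size estimate described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`cache_key`, `pack`, `unpack`, `estimate_store_bytes`) are hypothetical and not part of MinT, and the key uses the hex form of the SHA-256 digest (64 characters) for readability, which makes the key 70 bytes instead of the 38 bytes estimated for a raw digest.

```python
import gzip
import hashlib

# 7-day TTL, as proposed in the list above.
TTL_SECONDS = 7 * 24 * 60 * 60


def cache_key(source_lang: str, target_lang: str, source_content: str) -> str:
    """Build sourcelanguagecode-targetlanguagecode-hash(source content).

    Assumption: the hash is SHA-256 over the UTF-8 bytes, hex encoded.
    """
    digest = hashlib.sha256(source_content.encode("utf-8")).hexdigest()
    return f"{source_lang}-{target_lang}-{digest}"


def pack(translation: str) -> bytes:
    """Compress a translation with gzip before storing it."""
    return gzip.compress(translation.encode("utf-8"))


def unpack(blob: bytes) -> str:
    """Restore a translation read back from the cache."""
    return gzip.decompress(blob).decode("utf-8")


def estimate_store_bytes(records: int, value_bytes: int = 2000,
                         key_bytes: int = 38) -> int:
    """Rough payload size, excluding datastore overheads.

    Defaults are the per-record figures assumed in the list above.
    """
    return records * (value_bytes + key_bytes)


if __name__ == "__main__":
    key = cache_key("en", "ml", "Hello, world")
    blob = pack("ഹലോ വേൾഡ്")
    print(key, len(blob), unpack(blob))
    print(estimate_store_bytes(1_000_000) / 1e9, "GB for one million records")
```

With a Redis-like store, the packed value would be written under this key with an expiry of `TTL_SECONDS` (for example, Redis `SET key value EX 604800`). The exact client and store are still open choices here.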