While deploying the new storage model to production, we temporarily made it issue requests for mobile content to both the old and the new storage, compare the results, and log a warning if the content doesn't match. The rate of mismatches is not very high; however, all the mismatches are due to one of the responses not reflecting the change that triggered the rerender: either some Wikidata properties are missing or not updated, different revisions of some template are used, or the whole page revision is different.
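For readers unfamiliar with the setup, the dual-read check described above could look roughly like the following. This is only a sketch: `compare_storage_content`, `fetch_old`, and `fetch_new` are hypothetical names, not the actual service code, and the real comparison may normalize the content before comparing.

```python
import logging

logger = logging.getLogger("storage_migration")


def compare_storage_content(title, fetch_old, fetch_new):
    """Fetch mobile content from both storages and warn on a mismatch.

    fetch_old and fetch_new are hypothetical callables that take a page
    title and return the stored content (for example a dict or a string).
    Returns True when both storages returned equal content.
    """
    old_content = fetch_old(title)
    new_content = fetch_new(title)
    if old_content != new_content:
        # Only the title is logged; the content itself may be large.
        logger.warning("Mobile content mismatch for %r", title)
        return False
    return True
```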
The current theory is that this is caused by DB replication lag: different MCS instances hit different slaves and get different results. Although the rate of content mismatches is low, the rate of MCS getting stale data must be much higher: if both MCS instances hit a lagging slave, there will be no mismatch, but the content will be stale in both renders.
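The reasoning above can be put into a toy model. The sketch below assumes (purely for illustration) that each of the two renders independently hits a lagging slave with the same probability; `outcome_probabilities` and its parameter are made-up names, and the numbers are not measurements.

```python
def outcome_probabilities(lagging_fraction):
    """Toy model of two independent renders reading from DB slaves.

    lagging_fraction is the assumed probability that a single render
    hits a slave that has not yet replicated the change.
    """
    p = lagging_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("lagging_fraction must be between 0 and 1")
    return {
        # Exactly one render is stale: the comparison logs a warning.
        "mismatch": 2 * p * (1 - p),
        # Both renders are stale: content is wrong, but nothing is logged.
        "both_stale": p * p,
        # Both renders are fresh: nothing to log.
        "both_fresh": (1 - p) * (1 - p),
    }
```

Under these assumptions, if most slaves are still lagging right after a change (say 90%), the undetected both-stale case comes out at about 0.81, against about 0.18 for a visible mismatch. The mismatch log only ever sees the case where exactly one render is stale.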
There are several possible solutions to this, but the most obvious are:
- Introduce some delay before starting rerenders. It's simple, but it doesn't guarantee correctness in any way; it just reduces the probability of errors.
- Make it hit the master. That greatly increases the load on the master MySQL server, probably beyond its capacity.
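The first option could be sketched as follows. This is a minimal illustration, not the actual rerender pipeline: `rerender_with_delay`, the `rerender` callable, and the 5-second default are all hypothetical.

```python
import time


def rerender_with_delay(rerender, title, delay_seconds=5.0, sleep=time.sleep):
    """Wait a fixed time, then trigger the rerender for a page title.

    rerender is a hypothetical callable that performs the rerender.
    sleep is injectable so the function can be exercised without waiting.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    # Give the slaves some time to catch up with the master. This is
    # best-effort only: a slave lagging longer than delay_seconds will
    # still return stale data.
    sleep(delay_seconds)
    return rerender(title)
```

The fixed delay only narrows the window in which a lagging slave can be hit, which is why this option reduces the error rate without guaranteeing correctness.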