There’s a concern that the sampling applied across languages/projects/platforms is not weighted properly. Check whether that is the case, then fix the weighting and/or document how the metric is computed, as needed.
For example, the recent fix to the iOS bug around May 11 seems to have had an unexpectedly large effect on User Engagement (https://discovery.wmflabs.org/metrics/#kpi_augmented_clickthroughs). Either the weighting of the sampling across platforms is wrong, or some people misunderstand what the metric measures. (E.g., an unweighted mean of rates gives a sense of the “average rate”, but that’s not what most people expect from a unified rate measure.)
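
To make the distinction concrete, here is a rough Python sketch comparing an unweighted mean of per-platform rates with a pooled (volume-weighted) rate. Everything in it is an assumption for illustration: the platform names, the (clicks, sessions) counts, and the function names are made up, and this is not how the dashboard is known to compute the KPI. It only shows how a change on one small platform can move the two measures very differently.

```python
# Hypothetical illustration only: the counts below are invented, not taken
# from the dashboard, and the function names are not from any existing code.

def unweighted_mean_rate(groups):
    """Average the per-group rates, giving every group equal weight."""
    rates = [clicks / sessions for clicks, sessions in groups.values()]
    return sum(rates) / len(rates)


def pooled_rate(groups):
    """Total clicks over total sessions, so big groups count for more."""
    total_clicks = sum(clicks for clicks, _ in groups.values())
    total_sessions = sum(sessions for _, sessions in groups.values())
    return total_clicks / total_sessions


# (clicks, sessions) per platform; made-up numbers.
before_fix = {"desktop": (3000, 10000), "android": (200, 1000), "ios": (5, 100)}
# Same data, but the small iOS sample now records far more clicks.
after_fix = {"desktop": (3000, 10000), "android": (200, 1000), "ios": (45, 100)}

for label, data in (("before", before_fix), ("after", after_fix)):
    print(label,
          "unweighted:", round(unweighted_mean_rate(data), 4),
          "pooled:", round(pooled_rate(data), 4))
```

With these invented counts, the unweighted mean moves by roughly 13 percentage points after the iOS change, while the pooled rate moves by well under one point. If the KPI behaves like the former, a large jump from a fix on a small platform would be expected rather than surprising, and that should be documented.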