The Data Lake [1] is where we're putting analytics-friendly data in Hadoop. The first data to land there comes from the MediaWiki History Reconstruction project. We have computed metrics that power this dashboard [2] and want to verify that the new data hasn't thrown those metrics off compared to their old counterparts in Vital Signs. The new numbers are close to the old ones, with some notable exceptions; our analysis is in this spreadsheet [3]. We know the reasons behind the differences and want to work with you (Research) to make sure they're acceptable for powering Wikistats 2.0.
[1] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake
[2] https://analytics.wikimedia.org/dashboards/standard-metrics/
[3] https://docs.google.com/spreadsheets/d/12nHxfp5cerKwAc1Q7W_DudSJ-ZhmynDK6VINzb857zE/edit#gid=1232097690