Different results with queries in labs versus production around September 21st.
The following query returns very different results in production than it does in labs:
Production:
SELECT count(log_user)
FROM enwiki.logging
/* exclude proxy registrations */
WHERE log_type = 'newusers'
/* only include self-created users, exclude attached and proxy-registered users */
AND log_action = 'create'
AND log_timestamp BETWEEN 20140921000000 AND 20140922000000;
Returns: 8027
Labs:
SELECT count(log_user)
FROM logging
/* exclude proxy registrations */
WHERE log_type = 'newusers'
/* only include self-created users, exclude attached and proxy-registered users */
AND log_action = 'create'
AND log_timestamp BETWEEN 20140921000000 AND 20140922000000;
Returns: 6842
Halfak did some digging and placed the missing rows in
analytics-store:staging.missing_labs_new_user_20140921
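It looks like there are 5 hours of the day (hours 08 through 12) where rows are missing. The hourly counts below sum to 1185, which matches the gap between the two results (8027 - 6842 = 1185).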
mysql:research@analytics-store.eqiad.wmnet [staging]> select LEFT(log_timestamp, 10) as hour, count(*) from missing_labs_new_user_20140921 GROUP BY 1;
+------------+----------+
| hour       | count(*) |
+------------+----------+
| 2014092108 |       83 |
| 2014092109 |      336 |
| 2014092110 |      304 |
| 2014092111 |      344 |
| 2014092112 |      118 |
+------------+----------+
5 rows in set (0.01 sec)
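For anyone who wants to reproduce the comparison, here is a minimal sketch of an anti-join that lists production rows with no counterpart in labs. It assumes the labs rows for that day have been copied into a table reachable from the same server as enwiki.logging. The table name staging.labs_logging_20140921 is hypothetical, and this is not necessarily how the missing-rows table above was built.

```sql
/* Sketch only: staging.labs_logging_20140921 is a hypothetical copy of the
   labs logging rows for the day, loaded next to the production replica. */
SELECT p.log_id, p.log_timestamp, p.log_user
FROM enwiki.logging AS p
/* log_id is the primary key of logging, so it identifies the same row on both sides */
LEFT JOIN staging.labs_logging_20140921 AS l
  ON l.log_id = p.log_id
WHERE p.log_type = 'newusers'
  AND p.log_action = 'create'
  AND p.log_timestamp BETWEEN 20140921000000 AND 20140922000000
  /* keep only production rows that found no match in the labs copy */
  AND l.log_id IS NULL;
```

Grouping the result by LEFT(log_timestamp, 10), as in the query above, would give the same kind of hourly breakdown.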
Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=72226