These problems have been raised elsewhere, for example in posts to the labs list by Sean and Marc-Andre. However, since they have not been completely resolved, Analytics-Engineering is making it explicit that they affect tools we maintain, such as Wikimetrics and Vital Signs, and tools others maintain, such as Quarry. This task is mostly a tracking task for Scrum-of-Scrums. The problems include:
- Missing data. We have found data that exists in the production databases but not in LabsDB, and the gap does not appear to be fixed over time. Furthermore, even if it is fixed, we have no way of finding out, so we cannot re-generate the affected metrics.
- Slow queries. We hammer the databases for a few hours every night with the Wikimetrics recurrent reports, so we are happy to help debug in case we are causing a problem. However, since about the end of September, queries that used to be fast have become slow. Queries against English Wikipedia are affected most, but other queries sometimes have problems too.
UPDATE: we will most likely work around these problems by moving some of our querying to the production database for now. I'll keep this task around, as it will be an issue again at some point.