With all the stuff to be tracked currently waiting to be implemented in [[interactive-sprint]], hourly crons have a real chance of eventually taking longer than an hour to complete. For reference, geo-tag-counts.php alone takes ~12 min.
Event Timeline
Isn't this due to the synchronous nature of your PHP script? Python, or even better Node, would be able to handle it much better. Or we could do it via a shell script: the PHP code would only deal with one DB, and we would run it with the `parallel` command. We would still need to sync up the query times.
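A minimal sketch of that per-DB approach, assuming the tracking script accepts a `--db` flag (the wiki names and the flag are illustrative; `xargs -P` is shown here as a widely available stand-in for GNU `parallel`):

```shell
# Run one copy of the tracking script per wiki database, up to 4 at a time.
# The leading `echo` makes this a dry run that prints the commands that
# would execute; drop it to actually run the PHP script.
printf '%s\n' enwiki dewiki frwiki |
  xargs -P 4 -I {} echo php geo-tag-counts.php --db {}
```

Note that this only parallelizes the client side; whether the DB servers can absorb several of these queries at once is a separate question.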
Nope: melting the DB server with multiple slow queries in parallel won't make anything faster. As a general rule, there should be only one slow query running at a time.
Aren't they in different DB clusters? In any case, you're right, we should start thinking about a better way to do this. One option is to use category tracking: MediaWiki already maintains the number of pages in each category as a precomputed lookup value.
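That lookup could be as simple as reading `cat_pages` from MediaWiki's `category` table, which the software keeps up to date itself. A sketch (the wiki database and tracking-category name below are hypothetical, and this obviously needs access to a live replica):

```shell
# Sketch only: MediaWiki maintains cat_pages in the category table, so the
# count is a single indexed row lookup rather than a slow scan.
# 'Pages_with_maps' is an illustrative tracking-category title.
mysql enwiki -e \
  "SELECT cat_pages FROM category WHERE cat_title = 'Pages_with_maps';"
```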
Moving off the sprint board - the Discovery team won't be able to finish this work at this time.