
Switch maps metrics from hourly to daily
Closed, Declined · Public

Description

With all the stuff to be tracked currently waiting to be implemented in [[interactive-sprint]], hourly crons have a real chance of eventually taking longer than an hour to complete. For reference, geo-tag-counts.php alone takes ~12 min.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Isn't it due to the synchronous nature of your PHP script? Python, or even better, Node, would be able to handle it much better. Or we could do it via a shell script: the PHP code would only deal with one DB, and we would run it with the parallel command. We would still need to sync up the query time.

Nope, by melting the DB server with multiple slow requests in parallel, you won't make it faster. As a general rule, there should be 1 slow query at a time.
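One way to reconcile parallelism with the "1 slow query at a time" rule is to apply the rule per DB server instead of globally. The Python sketch below assumes each wiki can be mapped to the server that hosts it. The names `run_grouped`, `jobs` and `run_query` are hypothetical and not part of the existing script. Wikis on the same server are queried one after another, and only different servers are worked on concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_grouped(jobs, run_query):
    """Run run_query(wiki) for every (server, wiki) pair in jobs.

    Wikis that share a server are processed sequentially, so each
    server sees at most one slow query at a time. Separate servers
    are processed in parallel, one worker thread per server.
    """
    by_server = defaultdict(list)
    for server, wiki in jobs:
        by_server[server].append(wiki)

    results = {}
    if not by_server:
        return results

    def drain(wikis):
        # Sequential loop: the next query starts only after the previous one ends.
        return [(wiki, run_query(wiki)) for wiki in wikis]

    with ThreadPoolExecutor(max_workers=len(by_server)) as pool:
        for pairs in pool.map(drain, by_server.values()):
            results.update(pairs)
    return results
```

If every wiki turns out to live on the same server, this degrades to a plain sequential loop, which matches the rule stated above.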

Aren't they in different db clusters? In any case, you are right, we should start thinking about a better way to do this. One option is to use category tracking because MediaWiki already keeps a count of pages in a category as a lookup value.
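As a rough illustration of the category idea: the MediaWiki Action API exposes the stored per-category counts through `prop=categoryinfo`, so reading a count is a cheap lookup, not a scan. The Python sketch below only builds the request URL and parses a response body. The default endpoint and the category name in the usage line are assumptions for illustration, and the helper names are hypothetical. Whether the tags in question populate a suitable category on every project is not established in this task.

```python
import json
from urllib.parse import urlencode

# Assumption: any MediaWiki api.php endpoint works here; this one is only an example.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_count_url(category, api_url=API_URL):
    """Build an Action API URL that asks for the stored counts of one category."""
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": category,
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urlencode(params)


def parse_page_count(response_text):
    """Return the 'pages' count from a categoryinfo response, or None if absent."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", [])
    if not pages:
        return None
    info = pages[0].get("categoryinfo")
    return info["pages"] if info else None
```

For example, `category_count_url("Category:Example maps tracking")` (a made-up category name) yields a URL whose JSON response can be passed to `parse_page_count`. A missing or empty category has no `categoryinfo` entry, in which case the helper returns `None`.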

In analytics/discovery-stats.

debt subscribed.

Moving off the sprint board - the Discovery team won't be able to finish this work at this time.

Mholloway added a project: Analytics.
Mholloway added subscribers: Gehel, Mholloway.

Per convo with @Gehel, it sounds like this concerns a script used to track usage of the Kartographer-specific tags across all projects. Maybe there is a more standard way of doing this? Tagging Analytics for comment.

For the current script, it does sound like daily is plenty.