
Decrease the requests from the iOS app to bohrium
Closed, Resolved, Public

Description

We saw bohrium fail to archive data for the iOS piwik dashboard again recently (March 12 and 13). @elukey said he will invalidate 2018-03-12/13 for iOS data in piwik to force a re-run of the archiver.

Meanwhile, we've all been aware that the volume of requests from the iOS app is a long-standing issue (T123640#2121263). According to @elukey on IRC today,

the infra on which bohrium/piwik runs now is more stable (the sre team fixed the underlying issue)

Can it handle the volume from the iOS app now?

The iOS team also wants to help alleviate the situation. Here are some thoughts:

  • We can increase the dispatch interval from 60 seconds to a higher number (e.g. 120 seconds) so that the tracker would dispatch events less frequently.
  • Instead of archiving the data on the fly, we can schedule the cron task once a day or even less frequently, since we normally don't need the real-time data.
  • The data has already been sampled at the event level at a 1:10 rate. This has limited our ability to gain insight into app user behavior on smaller wikis. We prefer not to downsample any further.

Event Timeline

fdans added a project: Analytics-Kanban.
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

We can increase the dispatch interval from 60 seconds to a higher number (e.g. 120 seconds) so that the tracker would dispatch events less frequently.

Let's please do this.

Instead of archiving the data on the fly, we can schedule the cron task once a day or even less frequently, since we normally don't need the real-time data.

This actually makes the problem worse: there is literally too much data to process, so lately we have been running the crons more frequently.

The data has already been sampled at the event level at a 1:10 rate. This has limited our ability to gain insight into app user behavior on smaller wikis. We prefer not to downsample any further.

I understand. Using piwik for analytics has tradeoffs such as this one: you are limited by volume, which means you might not be able to extract learnings for wikis that are not widely represented.
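To make the small-wiki tradeoff concrete, here is a rough sketch of how 1:10 event-level sampling affects the precision of an estimated event count. It assumes each event is kept independently with probability 1/rate (a binomial model); the function name and the example counts are hypothetical and are not taken from the actual iOS data.

```python
import math

def sampled_stats(true_events: int, rate: int = 10):
    """Return (expected sampled count, relative standard error) for 1:rate sampling.

    Assumes every event is kept independently with probability 1/rate,
    which is a simplification of whatever the app really does.
    """
    p = 1.0 / rate
    expected = true_events * p
    # Standard deviation of a binomial(true_events, p) count.
    std = math.sqrt(true_events * p * (1 - p))
    # Scaling the sampled count back up by `rate` leaves the relative error unchanged.
    rel_err = std / expected if expected else float("inf")
    return expected, rel_err

# Hypothetical daily event counts for a small wiki and a large one.
for events in (1_000, 100_000):
    expected, rel_err = sampled_stats(events)
    print(f"{events} events: ~{expected:.0f} sampled, relative error ~{rel_err:.1%}")
```

Under this model the relative error shrinks with the square root of the volume, so a wiki with 100 times fewer events sees roughly 10 times more noise in its estimates. Downsampling further would make that worse for the smaller wikis first.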

@chelsyx the infrastructure that runs the bohrium host (and hence piwik) is now much more stable, and we hope we have solved the issues that were causing the host to freeze frequently and fail to archive data. If I have understood correctly, the last remaining step is on your side: working on a wider dispatch interval. Is that right? Are there any pending actions for Analytics?

Per our latest conversation, a wider dispatch interval will not change the number of events sent, so on our end we are in the same place. What clogs piwik is the amount of data, so short of changing sampling rates I do not think any other measure will reduce data loss.
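A minimal simulation of that point, assuming the tracker queues events and sends everything queued as one batch per dispatch tick. The function and the event pattern are illustrative only; this is not the tracker's actual code.

```python
def simulate_dispatch(event_times, interval, window):
    """Return (batches sent, events sent) for one client over `window` seconds.

    Hypothetical model: events queue up locally and the whole queue is
    flushed every `interval` seconds. `window` should be a multiple of
    `interval`, otherwise trailing events stay queued and are not counted.
    """
    events = sorted(event_times)
    batches = 0
    sent = 0
    i = 0
    for tick in range(interval, window + 1, interval):
        queued = 0
        # Collect every event that happened up to this dispatch tick.
        while i < len(events) and events[i] <= tick:
            queued += 1
            i += 1
        if queued:
            batches += 1
            sent += queued
    return batches, sent

# One hypothetical client logging an event every 5 seconds for an hour.
events = list(range(0, 3600, 5))
print(simulate_dispatch(events, 60, 3600))   # 60-second dispatch interval
print(simulate_dispatch(events, 120, 3600))  # 120-second dispatch interval
```

In this model, doubling the interval halves the number of batches, but the number of events that piwik has to store and archive is identical in both runs.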

Thanks @elukey for the information about bohrium infrastructure!
As @Nuria said, a wider dispatch interval won't solve the problem. I think we can close this ticket now, as the iOS team has started to work on EventLogging instrumentation. Once that is finished, we will stop using piwik.

Milimetric claimed this task.

Since this is being worked on, I'm resolving it from our end, and we'll re-open if there's any problem with the switch to EventLogging.