
iOS traffic data is not available on Piwik since Feb 20, 2018
Closed, Resolved | Public | 5 Estimated Story Points

Description

We don't see any iOS traffic data on Piwik since Feb 20, 2018, although real-time visitor data is still available.

ios_piwik.png (170×533 px, 26 KB)

Event Timeline

Also, some other sites have had suspiciously low counts over the last couple of days.

There are plenty of hits:

root@bohrium:/var/log/apache2# zcat other_vhosts_access.log.2.gz | grep -i iOs | wc -l
1104068
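
To see whether hits actually dropped around a given day, the same check can be repeated across several rotated logs. This is a minimal sketch, not something run in this thread: `count_hits` is a hypothetical helper, and it matches the pattern anywhere in the line, exactly like the `grep -i` above.

```shell
# Hypothetical helper: count case-insensitive matches of a pattern in each
# gzipped log file, printing "<file> <count>" per file.
# Usage: count_hits PATTERN FILE.gz [FILE.gz ...]
count_hits() {
    local pattern="$1"
    shift
    local f
    for f in "$@"; do
        # grep -c prints the number of matching lines; it exits 1 when the
        # count is 0, so "|| true" keeps the loop going in that case.
        printf '%s %s\n' "$f" "$(gzip -dc -- "$f" | grep -ci -- "$pattern" || true)"
    done
}
```

For example, `count_hits ios other_vhosts_access.log.*.gz` would print one count per rotated file.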

I wonder whether the archive cron (which I think runs once a day) failed to run a couple of times because of machine failures, and now cannot complete because of the backlog of data. Maybe we need to set up one daily cron per site rather than a single cron that processes all sites.
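
The per-site idea could look roughly like the sketch below. It only prints the commands (a dry run) so they can be reviewed or pasted into separate cron entries. Assumptions: the `core:archive` command accepts `--force-idsites` on the installed Piwik version (check with `core:archive --help`), and `per_site_archive_cmds` is a hypothetical helper name. The console path and the `www-data` user are taken from the commands in this task.

```shell
# Hypothetical helper: print one archive command per site id instead of a
# single run that covers all sites.
# ASSUMPTION: core:archive supports --force-idsites on this Piwik version.
per_site_archive_cmds() {
    local console="/usr/share/piwik/console"   # path as used in this task
    local id
    for id in "$@"; do
        printf 'sudo -u www-data %s core:archive --force-idsites=%s\n' "$console" "$id"
    done
}

# Dry run for three site ids; nothing is executed, only printed.
per_site_archive_cmds 1 2 3
```

Splitting by site means one slow site (such as iOS, id 3) would not block archiving for the others, at the cost of more cron entries to maintain.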

Executed archiving by hand and got the following for website "3", which is iOS:
INFO [2018-03-01 04:37:22] - Will invalidate archived reports for 2018-02-28 for following websites ids: 3 on

Executed the following:

elukey@bohrium:/var/log/piwik$
elukey@bohrium:/var/log/piwik$ for el in {20..28}; do sudo -u www-data /usr/share/piwik/console core:invalidate-report-data --dates=2018-02-$el --sites=3; done
Invalidating day periods in 2018-02-20 [segment = ]...
Invalidating week periods in 2018-02-20 [segment = ]...
Invalidating month periods in 2018-02-20 [segment = ]...
Invalidating year periods in 2018-02-20 [segment = ]...
Invalidating day periods in 2018-02-21 [segment = ]...
Invalidating week periods in 2018-02-21 [segment = ]...
Invalidating month periods in 2018-02-21 [segment = ]...
Invalidating year periods in 2018-02-21 [segment = ]...
Invalidating day periods in 2018-02-22 [segment = ]...
Invalidating week periods in 2018-02-22 [segment = ]...
Invalidating month periods in 2018-02-22 [segment = ]...
Invalidating year periods in 2018-02-22 [segment = ]...
Invalidating day periods in 2018-02-23 [segment = ]...
Invalidating week periods in 2018-02-23 [segment = ]...
Invalidating month periods in 2018-02-23 [segment = ]...
Invalidating year periods in 2018-02-23 [segment = ]...
Invalidating day periods in 2018-02-24 [segment = ]...
Invalidating week periods in 2018-02-24 [segment = ]...
Invalidating month periods in 2018-02-24 [segment = ]...
Invalidating year periods in 2018-02-24 [segment = ]...
Invalidating day periods in 2018-02-25 [segment = ]...
Invalidating week periods in 2018-02-25 [segment = ]...
Invalidating month periods in 2018-02-25 [segment = ]...
Invalidating year periods in 2018-02-25 [segment = ]...
Invalidating day periods in 2018-02-26 [segment = ]...
Invalidating week periods in 2018-02-26 [segment = ]...
Invalidating month periods in 2018-02-26 [segment = ]...
Invalidating year periods in 2018-02-26 [segment = ]...
Invalidating day periods in 2018-02-27 [segment = ]...
Invalidating week periods in 2018-02-27 [segment = ]...
Invalidating month periods in 2018-02-27 [segment = ]...
Invalidating year periods in 2018-02-27 [segment = ]...
Invalidating day periods in 2018-02-28 [segment = ]...
Invalidating week periods in 2018-02-28 [segment = ]...
Invalidating month periods in 2018-02-28 [segment = ]...
Invalidating year periods in 2018-02-28 [segment = ]...
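
The `{20..28}` loop above only works while the range stays inside one month. A sketch of a variant that walks arbitrary date ranges follows. It assumes GNU `date` (the `-d` option); `date_range` is a hypothetical helper, and the `echo` keeps the loop a dry run.

```shell
# Hypothetical helper: print every date from START to END inclusive,
# one per line, in YYYY-MM-DD form. Requires GNU date (-d).
date_range() {
    local d="$1"
    local end="$2"
    # Compare dates numerically after stripping the dashes (20180220 etc.).
    while [ "$(printf '%s' "$d" | tr -d '-')" -le "$(printf '%s' "$end" | tr -d '-')" ]; do
        printf '%s\n' "$d"
        d=$(date -u -d "$d + 1 day" +%F)
    done
}

# Dry run: print the invalidation command for each day; remove "echo" to execute.
for day in $(date_range 2018-02-27 2018-03-02); do
    echo sudo -u www-data /usr/share/piwik/console core:invalidate-report-data --dates="$day" --sites=3
done
```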

https://matomo.org/faq/how-to/faq_59/

Side note: now Piwik is called Matomo :)

Good news: Nuria was right, the archiver was adding missing data after invalidation.

Bad news: the extra IO (probably) triggered the Ganeti bug that causes the underlying host to freeze (bohrium, on which Piwik runs, is a Ganeti virtual machine), so I only got data up to the 22nd:

Screen Shot 2018-03-01 at 9.29.30 AM.png (494×2 px, 78 KB)

Super thanks @elukey. @chelsyx data should be back, but please keep in mind that Piwik is not a very reliable data store, nor does it have the availability guarantees of EventLogging. It works well for small sites, but the iOS data is already pushing its limits.

Interesting stat: it took ~5h to archive the iOS data for the past week :D

Archived website id = 3, 4 API requests, Time elapsed: 19351.548s [2/12 done]
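
As a quick check of the "~5h" figure, the elapsed seconds from the log line convert as follows (`fmt_elapsed` is a hypothetical helper that drops the fractional part):

```shell
# Hypothetical helper: convert a seconds value to "Hh Mm Ss",
# discarding any fractional seconds.
fmt_elapsed() {
    local secs="${1%.*}"   # 19351.548 -> 19351
    printf '%dh %dm %ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

fmt_elapsed 19351.548   # prints "5h 22m 31s"
```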

Thanks @Nuria and @elukey !

@JMinor, considering the unreliability of Piwik, we should figure out a way with the Analytics team to back up the data, in case an outage like this happens again.

Data is backed up now. Piwik is a tool meant to be used for low-traffic sites; otherwise it just cannot handle the load. In this case the iOS data is getting too large. I think @JMinor is aware of this.

@Nuria I was talking about the outage on Nov 23, 2017, for which the data cannot be recovered. Is there any risk that this kind of outage will happen again?

Yes, per @Nuria this is a low priority system. I appreciate that gaps in data are problematic for analysis, but given that we plan to move away from it over this year, I think we're sufficiently backed up for now.

Thanks @elukey and @Nuria for the reboot/resurrection.

I invalidated the rest of the sites and ran the archiver, so all sites should be good now.

Nuria set the point value for this task to 5. Mar 2 2018, 5:50 AM
Nuria moved this task from In Progress to Done on the Analytics-Kanban board.
Nuria closed this task as Resolved.