Page MenuHomePhabricator

As a data analyst, I want to be able to calculate retention using a standard EL event
Closed, ResolvedPublic2 Estimated Story Points

Description

Add EventLogger ping for days installed.

We need to start sending this event once per day on app foregrounding. This will allow us to calculate retention in our own data volunteers and not be reliant on Apple's metric.

This is the event logging schema we should be populating:
https://meta.wikimedia.org/wiki/Schema:MobileWikiAppDailyStats

Event Timeline

JMinor raised the priority of this task from to Medium.
JMinor updated the task description. (Show Details)
JMinor added subscribers: JMinor, Tbayer.
Fjalapeno set Security to None.
Fjalapeno edited a custom field.
Fjalapeno raised the priority of this task from Medium to Needs Triage.Feb 18 2016, 4:34 PM
BGerstle-WMF subscribed.

@JMinor skipping PM signoff so you can verify logs are sent from latest beta

@Tbayer @JKatzWMF can you help me verify this data is now populating? How can I check?

As for "is this thing on": yes, it seems so, since Feb 18:

mysql:research@analytics-store.eqiad.wmnet [log]> SELECT LEFT(timestamp, 8) AS date, COUNT(*) FROM log.MobileWikiAppDailyStats_12637385 WHERE userAgent LIKE '%iPhone%' AND LEFT(timestamp, 6) = '201602' GROUP BY date ORDER BY date;
+----------+----------+
| date     | COUNT(*) |
+----------+----------+
| 20160218 |       23 |
| 20160219 |      137 |
| 20160220 |      135 |
| 20160221 |      126 |
| 20160222 |      111 |
+----------+----------+

And just to check for myself that that condition ("userAgent LIKE '%iPhone%' ") should catch all iOS app events:

mysql:research@analytics-store.eqiad.wmnet [log]> SELECT DISTINCT userAgent FROM log.MobileWikiAppDailyStats_12637385 WHERE LEFT(timestamp, 6) = '201602' AND userAgent NOT LIKE '%Android%' AND userAgent NOT LIKE '%Googlebot%';
+----------------------------------------------------+
| userAgent                                          |
+----------------------------------------------------+
| "WikipediaApp/5.0.0.714 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.714 (iPhone OS 9.3; Phone)"    |
| "WikipediaApp/5.0.0.565 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.716 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.714 (iPhone OS 9.0.2; Phone)"  |
| "WikipediaApp/5.0.0.708 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.714 (iPhone OS 8.2; Tablet)"   |
| "WikipediaApp/5.0.0.706 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.710 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.714 (iPhone OS 9.3; Tablet)"   |
| "WikipediaApp/5.0.0.565 (iPhone OS 8.4; Phone)"    |
| "WikipediaApp/5.0.0.565 (iPhone OS 9.2; Phone)"    |
| "WikipediaApp/5.0.0.715 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.714 (iPhone OS 9.2; Phone)"    |
| "WikipediaApp/5.0.0.707 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.710 (iPhone OS 8.2; Tablet)"   |
| "WikipediaApp/5.0.0.714 (iPhone OS 9.2.1; Tablet)" |
| "WikipediaApp/5.0.0.714 (iPhone OS 9.0.1; Phone)"  |
| "WikipediaApp/5.0.0.717 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.712 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.565 (iPhone OS 9.1; Phone)"    |
| "WikipediaApp/5.0.0.718 (iPhone OS 9.2.1; Phone)"  |
| "WikipediaApp/5.0.0.711 (iPhone OS 9.2.1; Phone)"  |
+----------------------------------------------------+
23 rows in set (6.39 sec)

And here is the exact query used to generate the analogous stats for Android (from the Readership metrics report):

SELECT LEFT(timestamp, 8) AS date, SUM(IF(event_appInstallAgeDays = 0, 1, 0)) AS
day0_active, SUM(IF(event_appInstallAgeDays = 7, 1, 0)) AS day7_active FROM
log.MobileWikiAppDailyStats_12637385 WHERE userAgent LIKE '%­r­%' AND userAgent NOT
LIKE '%Googlebot%' GROUP BY date ORDER BY DATE;

(with the retention rate calculated as day7_active divided by day0_active from seven days earlier, of course)

Provided that event_appInstallAgeDays has been implemented in the same way for iOS (i.e. matches the schema description), we can simply replace the userAgent condition with "userAgent LIKE '%iPhone% " as above and are done.

Thanks @Tbayer. Looks like we can resolve this one.

....

Provided that event_appInstallAgeDays has been implemented in the same way for iOS (i.e. matches the schema description), we can simply replace the userAgent condition with "userAgent LIKE '%iPhone% " as above and are done.

Hmm, it looks like we should have checked the "provided that" part too :(

First, it turns out that for the iOS app the table is recording negative values (or 0) for appInstallAgeDays, which is defined in the schema as "The number of days since the app was first installed" (the Android values are always greater than or equal to zero).
Fine, I assumed instead that appInstallAgeDays = -7 means that the app was installed 7 days before the timestamp date. But even when modified like that, the above reference query yields results that don't make sense. E.g. on March 13 there were 12972 events with install ago 0 (i.e. installed on that day), but on March 20, the number of devices which told us that they were active on that day and had been installed 7 days before was 17739. That's a day-7 retention rate of more than 100% (137% to be exact), which is obviously impossible ;)

Perhaps the definition of install age needs to be clarified here, or something went wrong with the requirement that the app sends the event only once per day, or ...

@Fjalapeno or @BGerstle-WMF, do you have any insights on what's going on?

mysql:research@s1-analytics-slave.eqiad.wmnet [(none)]> SELECT LEFT(timestamp, 8) AS date, SUM(IF(event_appInstallAgeDays = 0, 1, 0)) AS day0_active, SUM(IF(event_appInstallAgeDays = -7, 1, 0)) AS day7_active FROM log.MobileWikiAppDailyStats_12637385 WHERE userAgent LIKE '%iPhone%' AND userAgent NOT LIKE '%Googlebot%' GROUP BY date ORDER BY DATE;
+----------+-------------+-------------+
| date     | day0_active | day7_active |
+----------+-------------+-------------+
| 20160218 |          23 |           0 |
| 20160219 |          88 |           0 |
| 20160220 |         113 |           0 |
| 20160221 |          58 |           0 |
| 20160222 |          54 |           0 |
| 20160223 |          74 |           0 |
| 20160224 |          24 |           0 |
| 20160225 |          30 |           0 |
| 20160226 |          23 |           1 |
| 20160227 |          22 |         104 |
| 20160228 |          11 |          40 |
| 20160229 |           6 |          19 |
| 20160301 |          10 |          28 |
| 20160302 |          20 |           4 |
| 20160303 |          36 |          14 |
| 20160304 |          19 |           1 |
| 20160305 |           7 |           7 |
| 20160306 |          13 |           4 |
| 20160307 |           3 |          14 |
| 20160308 |           2 |          20 |
| 20160309 |          13 |          12 |
| 20160310 |         866 |          30 |
| 20160311 |       14928 |          16 |
| 20160312 |       19095 |          13 |
| 20160313 |       12972 |          14 |
| 20160314 |        6578 |           6 |
| 20160315 |        4395 |           1 |
| 20160316 |        3277 |           8 |
| 20160317 |        2772 |        3018 |
| 20160318 |        2430 |       24547 |
| 20160319 |        2225 |       26174 |
| 20160320 |        2517 |       17739 |
| 20160321 |        1849 |        8814 |
| 20160322 |        1370 |        5199 |
| 20160323 |        1519 |        3912 |
| 20160324 |        1354 |        3167 |
| 20160325 |        1513 |        2895 |
| 20160326 |        1462 |        3136 |
| 20160327 |        1563 |        3103 |
| 20160328 |        1338 |        2367 |
| 20160329 |        1091 |        1691 |
| 20160330 |        1083 |        1685 |
| 20160331 |        1013 |        1481 |
| 20160401 |        1048 |        1310 |
| 20160402 |        1098 |        1444 |
| 20160403 |        1157 |        1508 |
| 20160404 |         777 |        1031 |
+----------+-------------+-------------+
47 rows in set (14.56 sec)
JMinor triaged this task as High priority.

Specifically lets:

  • Numbers should be positive "days since install"
  • Investigate why we have a 100% retention rate on the date @Tbayer highlighted

Just checked the source.

So 2 different issues:

  1. It is sending negative numbers instead. And it is how you expect… -7 means seven days since install.
  1. As far as why some data seems non-sensical… I'm not sure. It appears that if there is a nil date, then it bails. otherwise it sends an integer for days since install. I pasted the code in below:
- (void)logAppNumberOfDaysSinceInstall {
    if (![self shouldLogInstallDays]) {
        return;
    }

    NSDate* date = [[NSUserDefaults standardUserDefaults] wmf_appInstallDate];
    NSParameterAssert(date);
    if (!date) {
        return;
    }

    NSInteger days = [[NSDate date] distanceInDaysToDate:date];
    [self log:@{kAppInstallAgeKey: @(days)}];
    [[NSUserDefaults standardUserDefaults] wmf_setDateLastDailyLoggingStatsSent:date];
}

The only thing that could be wrong is the app install date, but that seems ok too. This is the code that sets it. it only allows it to be set once:

public func wmf_setAppInstallDateIfNil(date: NSDate) {
    let previous = self.wmf_appInstallDate()
    
    if previous == nil {
        self.setObject(date, forKey: "AppInstallDate")
        self.synchronize()
    }
}

one last thing to look at is distanceInDaysToDate. This is straight Obj-C - not sure if anyone else can follow. But looking at it, it seems straight forward as well. It sues system components to make the calculation.

- (NSInteger)distanceInDaysToDate:(NSDate*)anotherDate {
    NSCalendar* gregorianCalendar = [[NSCalendar alloc] initWithCalendarIdentifier:NSCalendarIdentifierGregorian];
    NSDateComponents* components  = [gregorianCalendar components:NSCalendarUnitDay fromDate:self toDate:anotherDate options:0];
    return components.day;
}

Does anybody notice a glaring issue in the code above? If not, I am not sure why that is wrong.

Could the negative numbers perhaps be throwing the calculation off?

One other possible result is that maybe events aren't being sent all the time? The iOS implementation of event logging has no queuing system. So any events that fail - fail forever - they are never resent. Maybe that accounts for this discrepancy?

One last thought is date calculations are fraught with peril… maybe the distanceFromDate calculation is working from the users time zone? And maybe you are looking at UTC?

Those are my only ideas…

This tasks is only tagged with a release. It should probably be tagged with the appropriate backlog, too.

JMinor changed the task status from Open to Stalled.Apr 14 2016, 5:06 PM
JMinor claimed this task.
JMinor added a subscriber: Fjalapeno.

We still weren't able to find any code-level reason for the "aberrant" install durations.

I'm going to keep this card open, but put it into the next release board and stall it for now.

@JMinor what do we want to do with this? More testing?

JMinor changed the task status from Stalled to Open.Jun 21 2016, 9:11 PM

Yeah, I wanted to retest the data using @Tbayer 's query and check that negatives were no longer there, and see if we continue to have cohorts with higher than 100% retention.

Moving forward to PM signoff, but leaving open and assigned to me until I can run this query and check results.

JMinor lowered the priority of this task from High to Medium.Jul 28 2016, 12:01 AM
JMinor removed a project: iOS-app-v5.0.5-Fever.

@Tbayer I just chatted with @JMinor about this. Can you jump in and check the quality of the data now that the change was made? In particular, it would be good to quantify or qualify the extent of the issue. Is it a 1% impact? Is there an obvious pattern to the issue?

@Tbayer I just chatted with @JMinor about this. Can you jump in and check the quality of the data now that the change was made?

OK, did one quick check (see below). BTW what change has been made? Did we alter any of the code parts that @Fjalapeno identified above (T126693#2182063)?

In particular, it would be good to quantify or qualify the extent of the issue. Is it a 1% impact? Is there an obvious pattern to the issue?

Without launching a fuller investigation, I re-ran the above query from April, and the issue is still present in the same form. E.g. on July 31 the schema registered 545 activations (activity on day 0), and on August 7 it saw 724 day-7 events, corresponding to a day-7 retention of 133%.

SELECT LEFT(timestamp, 8) AS date, 
SUM(IF(event_appInstallAgeDays = 0, 1, 0)) AS day0_active, 
SUM(IF(event_appInstallAgeDays = -7, 1, 0)) AS day7_active 
FROM log.MobileWikiAppDailyStats_12637385 
WHERE userAgent LIKE '%iPhone%' AND userAgent NOT LIKE '%Googlebot%' 
GROUP BY date ORDER BY DATE;

+----------+-------------+-------------+
| date     | day0_active | day7_active |
+----------+-------------+-------------+
| 20160218 |          23 |           0 |
| 20160219 |          88 |           0 |
| 20160220 |         113 |           0 |
| 20160221 |          58 |           0 |
| 20160222 |          54 |           0 |
| 20160223 |          74 |           0 |
| 20160224 |          24 |           0 |
| 20160225 |          30 |           0 |
| 20160226 |          23 |           1 |
| 20160227 |          22 |         104 |
| 20160228 |          11 |          40 |
| 20160229 |           6 |          19 |
| 20160301 |          10 |          28 |
| 20160302 |          20 |           4 |
| 20160303 |          36 |          14 |
| 20160304 |          19 |           1 |
| 20160305 |           7 |           7 |
| 20160306 |          13 |           4 |
| 20160307 |           3 |          14 |
| 20160308 |           2 |          20 |
| 20160309 |          13 |          12 |
| 20160310 |         866 |          30 |
| 20160311 |       14928 |          16 |
| 20160312 |       19095 |          13 |
| 20160313 |       12972 |          14 |
| 20160314 |        6578 |           6 |
| 20160315 |        4395 |           1 |
| 20160316 |        3277 |           8 |
| 20160317 |        2772 |        3018 |
| 20160318 |        2430 |       24547 |
| 20160319 |        2225 |       26174 |
| 20160320 |        2517 |       17739 |
| 20160321 |        1849 |        8814 |
| 20160322 |        1370 |        5199 |
| 20160323 |        1519 |        3912 |
| 20160324 |        1354 |        3167 |
| 20160325 |        1513 |        2895 |
| 20160326 |        1462 |        3136 |
| 20160327 |        1563 |        3103 |
| 20160328 |        1338 |        2367 |
| 20160329 |        1091 |        1691 |
| 20160330 |        1083 |        1685 |
| 20160331 |        1013 |        1481 |
| 20160401 |        1048 |        1310 |
| 20160402 |        1101 |        1444 |
| 20160403 |        1158 |        1508 |
| 20160404 |         857 |        1160 |
| 20160405 |         908 |        1078 |
| 20160406 |         775 |        1120 |
| 20160407 |         792 |        1109 |
| 20160408 |         692 |        1015 |
| 20160409 |         653 |        1106 |
| 20160410 |         969 |        1159 |
| 20160411 |         755 |         837 |
| 20160412 |         663 |         831 |
| 20160413 |         684 |         790 |
| 20160414 |         677 |         621 |
| 20160415 |         544 |         671 |
| 20160416 |         700 |         730 |
| 20160417 |         724 |         929 |
| 20160418 |         673 |         759 |
| 20160419 |         503 |         585 |
| 20160420 |         617 |         517 |
| 20160421 |         593 |         465 |
| 20160422 |         723 |         563 |
| 20160423 |         635 |         583 |
| 20160424 |         740 |         866 |
| 20160425 |         551 |         665 |
| 20160426 |         522 |         490 |
| 20160427 |         613 |         515 |
| 20160428 |         635 |         483 |
| 20160429 |         537 |         532 |
| 20160430 |         538 |         526 |
| 20160501 |         748 |         831 |
| 20160502 |         640 |         705 |
| 20160503 |         626 |         589 |
| 20160504 |         620 |         516 |
| 20160505 |         691 |         572 |
| 20160506 |         523 |         460 |
| 20160507 |         541 |         534 |
| 20160508 |         622 |         765 |
| 20160509 |         608 |         506 |
| 20160510 |         565 |         470 |
| 20160511 |         558 |         541 |
| 20160512 |         561 |         626 |
| 20160513 |         600 |         553 |
| 20160514 |         512 |         572 |
| 20160515 |         719 |         894 |
| 20160516 |         706 |         633 |
| 20160517 |         610 |         580 |
| 20160518 |         541 |         461 |
| 20160519 |         573 |         524 |
| 20160520 |         496 |         422 |
| 20160521 |         540 |         468 |
| 20160522 |         560 |         659 |
| 20160523 |         688 |         659 |
| 20160524 |         605 |         515 |
| 20160525 |         575 |         578 |
| 20160526 |         546 |         560 |
| 20160527 |         529 |         355 |
| 20160528 |         526 |         430 |
| 20160529 |         457 |         817 |
| 20160530 |         604 |         482 |
| 20160531 |         594 |         388 |
| 20160601 |         511 |         394 |
| 20160602 |         562 |         407 |
| 20160603 |         423 |         479 |
| 20160604 |         411 |         608 |
| 20160605 |         583 |         558 |
| 20160606 |         520 |         476 |
| 20160607 |         486 |         384 |
| 20160608 |         582 |         416 |
| 20160609 |         450 |         395 |
| 20160610 |         433 |         362 |
| 20160611 |         472 |         528 |
| 20160612 |         542 |         607 |
| 20160613 |         504 |         475 |
| 20160614 |         459 |         430 |
| 20160615 |         547 |         395 |
| 20160616 |         465 |         398 |
| 20160617 |         374 |         273 |
| 20160618 |         513 |         457 |
| 20160619 |         567 |         584 |
| 20160620 |         414 |         396 |
| 20160621 |         465 |         405 |
| 20160622 |         488 |         388 |
| 20160623 |         519 |         406 |
| 20160624 |         400 |         415 |
| 20160625 |         581 |         385 |
| 20160626 |         655 |         505 |
| 20160627 |         541 |         392 |
| 20160628 |         525 |         425 |
| 20160629 |         517 |         378 |
| 20160630 |         452 |         425 |
| 20160701 |         462 |         384 |
| 20160702 |         470 |         364 |
| 20160703 |         515 |         564 |
| 20160704 |         528 |         405 |
| 20160705 |         556 |         453 |
| 20160706 |         520 |         425 |
| 20160707 |         510 |         402 |
| 20160708 |         478 |         376 |
| 20160709 |         436 |         435 |
| 20160710 |         449 |         481 |
| 20160711 |         519 |         379 |
| 20160712 |         387 |         455 |
| 20160713 |         451 |         359 |
| 20160714 |         367 |         326 |
| 20160715 |         481 |         351 |
| 20160716 |         469 |         361 |
| 20160717 |         484 |         556 |
| 20160718 |         510 |         545 |
| 20160719 |         423 |         373 |
| 20160720 |         469 |         452 |
| 20160721 |         428 |         283 |
| 20160722 |         421 |         435 |
| 20160723 |         449 |         357 |
| 20160724 |         446 |         469 |
| 20160725 |         444 |         369 |
| 20160726 |         438 |         390 |
| 20160727 |         512 |         349 |
| 20160728 |         598 |         392 |
| 20160729 |         617 |         425 |
| 20160730 |         565 |         414 |
| 20160731 |         545 |         551 |
| 20160801 |         487 |         428 |
| 20160802 |         524 |         361 |
| 20160803 |         520 |         448 |
| 20160804 |         390 |         396 |
| 20160805 |         445 |         571 |
| 20160806 |         479 |         488 |
| 20160807 |         428 |         724 |
| 20160808 |         443 |         477 |
| 20160809 |         540 |         491 |
| 20160810 |         506 |         371 |
| 20160811 |         582 |         384 |
| 20160812 |         478 |         400 |
| 20160813 |         507 |         383 |
| 20160814 |         552 |         530 |
| 20160815 |         559 |         435 |
| 20160816 |         427 |         535 |
| 20160817 |         496 |         411 |
| 20160818 |         266 |         175 |
+----------+-------------+-------------+
183 rows in set (28.30 sec)

@Tbayer the code changes to fix negative numbers was released in 5.0.5. The code @Fjalapeno posted is the code in the current app, and we confirmed it looks logically correct and there is no obvious cause to this.

Obviously there is still an issue, but it will require much more technical investigation to discover.

Hi @JMinor, now I'm a bit confused.

@Tbayer the code changes to fix negative numbers was released in 5.0.5.

Actually the app is still sending negative numbers as before. I just double-checked and even after the release of 5.0.5 (July 28) there has been no event with event_appInstallAgeDays = 7 (see below). As mentioned before, this is a minor issue, mainly about compatibility with Android data in the same schema, and it can be compensated for (in the query posted earlier today, I continued to use event_appInstallAgeDays = -7 instead). But it seems we are not entirely clear about which code is live.

The code @Fjalapeno posted is the code in the current app,

@Fjalapeno posted the code one day after this bug was brought up. So I take it that there were actually no changes since then.

and we confirmed it looks logically correct and there is no obvious cause to this.
Obviously there is still an issue, but it will require much more technical investigation to discover.

OK, so it seems that this is a hard nut to crack and that we are basically still where we were in April. Let me know if I can help in any way with the investigation on the data analysis side.

SELECT LEFT(timestamp, 8) AS date, 
SUM(IF(event_appInstallAgeDays = 0, 1, 0)) AS day0_active, 
SUM(IF(event_appInstallAgeDays = 7, 1, 0)) AS day7_active 
FROM log.MobileWikiAppDailyStats_12637385 
WHERE userAgent LIKE '%iPhone%' AND userAgent NOT LIKE '%Googlebot%' 
AND LEFT(timestamp, 8) >= '201607'
GROUP BY date ORDER BY DATE;
+----------+-------------+-------------+
| date     | day0_active | day7_active |
+----------+-------------+-------------+
| 20160701 |         462 |           0 |
| 20160702 |         470 |           0 |
| 20160703 |         515 |           0 |
| 20160704 |         528 |           0 |
| 20160705 |         556 |           0 |
| 20160706 |         520 |           0 |
| 20160707 |         510 |           0 |
| 20160708 |         478 |           0 |
| 20160709 |         436 |           0 |
| 20160710 |         449 |           0 |
| 20160711 |         519 |           0 |
| 20160712 |         387 |           0 |
| 20160713 |         451 |           0 |
| 20160714 |         367 |           0 |
| 20160715 |         481 |           0 |
| 20160716 |         469 |           0 |
| 20160717 |         484 |           0 |
| 20160718 |         510 |           0 |
| 20160719 |         423 |           0 |
| 20160720 |         469 |           0 |
| 20160721 |         428 |           0 |
| 20160722 |         421 |           0 |
| 20160723 |         449 |           0 |
| 20160724 |         446 |           0 |
| 20160725 |         444 |           0 |
| 20160726 |         438 |           0 |
| 20160727 |         512 |           0 |
| 20160728 |         598 |           0 |
| 20160729 |         617 |           0 |
| 20160730 |         565 |           0 |
| 20160731 |         545 |           0 |
| 20160801 |         487 |           0 |
| 20160802 |         524 |           0 |
| 20160803 |         520 |           0 |
| 20160804 |         390 |           0 |
| 20160805 |         445 |           0 |
| 20160806 |         479 |           0 |
| 20160807 |         428 |           0 |
| 20160808 |         443 |           0 |
| 20160809 |         540 |           0 |
| 20160810 |         506 |           0 |
| 20160811 |         582 |           0 |
| 20160812 |         478 |           0 |
| 20160813 |         507 |           0 |
| 20160814 |         552 |           0 |
| 20160815 |         559 |           0 |
| 20160816 |         427 |           0 |
| 20160817 |         496 |           0 |
| 20160818 |         338 |           0 |
+----------+-------------+-------------+
49 rows in set (1 min 2.13 sec)

Just checked the source.

...

  1. As far as why some data seems non-sensical… I'm not sure. It appears that if there is a nil date, then it bails. otherwise it sends an integer for days since install. I pasted the code in below:
- (void)logAppNumberOfDaysSinceInstall {
    if (![self shouldLogInstallDays]) {
        return;
    }

    NSDate* date = [[NSUserDefaults standardUserDefaults] wmf_appInstallDate];
    NSParameterAssert(date);
    if (!date) {
        return;
    }

    NSInteger days = [[NSDate date] distanceInDaysToDate:date];
    [self log:@{kAppInstallAgeKey: @(days)}];
    [[NSUserDefaults standardUserDefaults] wmf_setDateLastDailyLoggingStatsSent:date];
}

The only thing that could be wrong is the app install date, but that seems ok too. This is the code that sets it. it only allows it to be set once:

public func wmf_setAppInstallDateIfNil(date: NSDate) {
    let previous = self.wmf_appInstallDate()
    
    if previous == nil {
        self.setObject(date, forKey: "AppInstallDate")
        self.synchronize()
    }
}

one last thing to look at is distanceInDaysToDate. This is straight Obj-C - not sure if anyone else can follow. But looking at it, it seems straight forward as well. It sues system components to make the calculation.

- (NSInteger)distanceInDaysToDate:(NSDate*)anotherDate {
    NSCalendar* gregorianCalendar = [[NSCalendar alloc] initWithCalendarIdentifier:NSCalendarIdentifierGregorian];
    NSDateComponents* components  = [gregorianCalendar components:NSCalendarUnitDay fromDate:self toDate:anotherDate options:0];
    return components.day;
}

Does anybody notice a glaring issue in the code above? If not, I am not sure why that is wrong.

Could the negative numbers perhaps be throwing the calculation off?

No, I don't see how.

One other possible result is that maybe events aren't being sent all the time? The iOS implementation of event logging has no queuing system. So any events that fail - fail forever - they are never resent. Maybe that accounts for this discrepancy?

But in case this was an issue, would there be any reason to assume that it affects the day0 and day7 events differently?
(The >100% day7 retention problem indicates that either day0 events are failing to be sent much more frequently than day7 events fail, or that there are extraneous day7 events being sent. One possible next step might be to check if there are duplicates among the day7 events, I could do that sometime later once I have a bit more time.)

https://github.com/wikimedia/wikipedia-ios/pull/945

The app was using the install date instead of the last log date as the check for only logging once per day. Thus, on install day, only one event. on any other day, multiple events any time the user opens the app.

I also fixed the negative number issue.

Another potential problem - the app is using different logic to determine when to log than what it uses for day count. It seems it should just check 'what day count did i send last time?' and send only if it's different. Before I fix this, is the desired behavior to count calendar days or absolute time? For example, if the user installs 9PM on Monday, and opens the app on 12PM Tuesday, is that still day 0 or is it day 1?

I updated the PR with the assumption that we're counting calendar days - so 9PM Monday to 12 PM Tuesday is 1 day.

Thanks Joe. Leaving this open for @Tbayer and I to confirm data looks clean once the public beta goes out.

https://github.com/wikimedia/wikipedia-ios/pull/945

The app was using the install date instead of the last log date as the check for only logging once per day. Thus, on install day, only one event. on any other day, multiple events any time the user opens the app.

I also fixed the negative number issue.

Awesome, thanks for finding and fixing these!

Another potential problem - the app is using different logic to determine when to log than what it uses for day count. It seems it should just check 'what day count did i send last time?' and send only if it's different. Before I fix this, is the desired behavior to count calendar days or absolute time? For example, if the user installs 9PM on Monday, and opens the app on 12PM Tuesday, is that still day 0 or is it day 1?

Late to the party here, but I wanted to mention that the Android app (i.e. the existing data in this schema) appears to use absolute time, which seems the slightly more logical choice to me. It's not a big deal though - especially for the day-7 retention that we are currently focusing on, the difference should be negligible.

Now that this has been out in production since November 21 with 5.3.x, here's a first check of the incoming data. It looks plausible so far, and would correspond to 22.3% D 7 retention for installs from November 22, 19.7% for November 23 and 19.9% for November 24.

SELECT LEFT(timestamp, 8) AS date,
SUM(IF(event_appInstallAgeDays = 0, 1, 0)) AS day0_active,
SUM(IF(event_appInstallAgeDays = 7, 1, 0)) AS day7_active
FROM log.MobileWikiAppDailyStats_12637385
WHERE 
timestamp >= '20161122%' AND
userAgent LIKE '"WikipediaApp/5.3.%' AND
(userAgent LIKE '%iPhone%'
OR userAgent LIKE '%iOS%')
GROUP BY date ORDER BY date;
+----------+-------------+-------------+
| date     | day0_active | day7_active |
+----------+-------------+-------------+
| 20161122 |        1267 |         114 |
| 20161123 |        1213 |         149 |
| 20161124 |        1133 |         154 |
| 20161125 |        1022 |         168 |
| 20161126 |        1349 |         231 |
| 20161127 |        1404 |         293 |
| 20161128 |        1166 |         240 |
| 20161129 |        1082 |         282 |
| 20161130 |        1098 |         239 |
| 20161201 |        1044 |         225 |
| 20161202 |         161 |          41 |
+----------+-------------+-------------+
11 rows in set (1 min 39.15 sec)

(The nonzero day7_active count until November 28 should capture installs that were downloaded as version 5.2.x before or on November 21, and then were updated within a week to 5.3.x.)

To wrap this up here (@JMinor and I already looked at the data two weeks ago and concluded it looks good so far), here's a current chart:

7-day retention of Wikipedia iOS app users (Nov 2016-Feb 2017).png (455×1 px, 36 KB)

Data source:

SELECT LEFT(timestamp, 8) AS date,
SUM(IF(event_appInstallAgeDays = 0, 1, 0)) AS day0_active,
SUM(IF(event_appInstallAgeDays = 7, 1, 0)) AS day7_active
FROM log.MobileWikiAppDailyStats_12637385
WHERE userAgent LIKE '%iPhone%'
OR userAgent LIKE '%iOS%'
GROUP BY date ORDER BY DATE;