Provide weekly app session metrics separately for Android and iOS, and move to 7 day counts [13 pts]
Closed, Resolved (Public)

Description

Combining the other task for adding 7 day counts into this one.

This is what we'll do:

Change the Spark code to provide the Android/iOS split and generate this new data every 7 days.
Keep running the old job as is (30-day counts, without the split).
We may generate the new data in a new file if needed.
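
The split described above can be illustrated with a small sketch. This is not the actual Spark job: it assumes each session has already been reduced to a platform label and a length in seconds, and the names `Session` and `median_session_length`, as well as the sample values, are made up for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, NamedTuple


class Session(NamedTuple):
    platform: str        # "iOS" or "Android" (hypothetical labels)
    length_secs: float   # session length in seconds


def median_session_length(sessions: Iterable[Session]) -> Dict[str, float]:
    """Return the median session length overall ("all") and per platform."""
    by_platform = defaultdict(list)
    for s in sessions:
        by_platform["all"].append(s.length_secs)       # global (unsplit) metric
        by_platform[s.platform].append(s.length_secs)  # platform-specific metric
    return {platform: median(lengths) for platform, lengths in by_platform.items()}


# Made-up sample data, only to show the shape of the result.
sample = [
    Session("Android", 60), Session("Android", 120), Session("Android", 300),
    Session("iOS", 30), Session("iOS", 90),
]
result = median_session_length(sample)
```

The same grouping would apply to the other two metric types; only the aggregation function would change.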

Initial thoughts:

As discussed by email, we need platform-specific versions of the app session metrics that are currently being made available on Hive (wmf.mobile_apps_session_metrics) and on Hue.
Context: T86535 (initial task with methodology for calculating the number), T97876#1409884 (implementation details)

Since we have already collected quite a bit of historical data at this point for the aggregated (iOS & Android) metric, we should keep generating it as before, and add the platform-specific data separately.
There are various options on how to modify the format of the existing table for that. One possibility would be to add new values for the "type" column, which currently is either "PageviewsPerSession", "SessionLength", or "SessionsPerUser". Like this:

Now: SessionsPerUser
In the future: SessionsPerUser, SessionsPerUser_iOS, SessionsPerUser_Android

Or one could add a new "platform" column with value either "iOS", "Android", or "all" (the first two would be consistent with the unique app users table, the third would tag the rows containing the overall data as calculated currently, and would need to be backfilled in the existing rows.)
Either of these two options would mean that the job will add nine instead of three rows every week.
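
To make the two options concrete, here is a rough sketch of the rows one weekly run would add under each. It only shows the columns discussed here ("type" and the proposed "platform"); the function names are hypothetical and the real table has further columns that are omitted.

```python
from itertools import product

METRIC_TYPES = ["PageviewsPerSession", "SessionLength", "SessionsPerUser"]
PLATFORMS = ["all", "iOS", "Android"]


def rows_with_type_suffix():
    """Option 1: encode the platform as a suffix of the "type" value."""
    return [
        {"type": t if p == "all" else f"{t}_{p}"}
        for t, p in product(METRIC_TYPES, PLATFORMS)
    ]


def rows_with_platform_column():
    """Option 2: keep "type" unchanged and add a "platform" column."""
    return [{"type": t, "platform": p} for t, p in product(METRIC_TYPES, PLATFORMS)]


option_1 = rows_with_type_suffix()
option_2 = rows_with_platform_column()
```

Both lists contain 3 × 3 = 9 rows. With option 1 existing queries on the unsuffixed type values keep working unchanged, while option 2 requires backfilling "all" into existing rows but keeps the platform filterable on its own.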

The data should be backfilled as far as possible, to enable historical comparisons and a better understanding of the rise in median session length over the last half a year.

Event Timeline

Tbayer set the priority of this task to Needs Triage.
Tbayer updated the task description. (Show Details)
Tbayer added a project: Analytics.
Tbayer added subscribers: Tbayer, JKatzWMF.

Since we have already collected quite a bit of historical data at this point for the aggregated (iOS & Android) metric, we should keep generating it as before, and add the platform-specific data separately.

What is the rationale for this? If you have platform-specific data, the aggregated numbers do not seem to provide much value.

@Tbayer: FYI, if you have a developer on your team willing to work with us on these changes, they will get done sooner.

I absolutely agree, but that's hypothetical, as we don't have platform-specific data for the months since May. Or are you saying that it could be generated retroactively?
The point of continuing to record the same data is to enable historical trends and comparisons (as I did for the median session lengths in last week's readership metrics report; otherwise one would need to wait another half year for that).

It can be generated retroactively for the last couple of months.

Thanks, great to know! We should do that for the new platform-specific metrics at least - I have added that to the task description.

Does that go back to May though? (e.g. we reported this data in the Reading team's Q4 quarterly review already, that's one of the comparison points)

Hi @Nuria, do you need anything more from us to move this forward, i.e. to prioritize it against your other initiatives and set a rough timeline? I don't think we should use Reading engineers on this, given that it was written by @madhuvishy.

Madhu just answered this question: we can backfill two months' worth of data, but not more. So I think we should keep generating the data in the existing format, as per the task description, until the new platform-specific metrics cover a timespan long enough for monitoring trends. We should still backfill the new metrics with these two months; I understand from Madhu that the same was done in June/July when this job was started.

We can backfill two months' worth of data, but not more.

Please note that these are not calendar months, though.
Right now we can only backfill the month of October (as of today we likely only have data going back to Sep 9th).

The current job isn't about calendar months; it runs weekly covering the past 30 days. As for the backfilling of the new weekly platform-specific data, it is not too important whether that will add 7 or 8 weeks retroactively.
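
The "7 or 8 weeks" arithmetic can be sketched as follows. The helper name `weekly_backfill_windows` is made up, the year 2015 for "Sep 9th" is an assumption, and the two as-of dates are hypothetical; the real job's scheduling may differ.

```python
from datetime import date, timedelta
from typing import List, Tuple


def weekly_backfill_windows(
    earliest: date, as_of: date, days: int = 7
) -> List[Tuple[date, date]]:
    """List the complete [start, end) windows of `days` days, oldest first,
    that fit between `earliest` and `as_of`, counting back from `as_of`."""
    windows = []
    end = as_of
    while end - timedelta(days=days) >= earliest:
        start = end - timedelta(days=days)
        windows.append((start, end))
        end = start
    return list(reversed(windows))


earliest_data = date(2015, 9, 9)  # "Sep 9th" from the comment above; year assumed

# Two hypothetical run dates, a few days apart.
windows_a = weekly_backfill_windows(earliest_data, date(2015, 11, 1))
windows_b = weekly_backfill_windows(earliest_data, date(2015, 11, 4))
```

Depending on the run date, the same retained data yields 7 or 8 complete weekly windows, whereas a 30-day window (`days=30`) fits only once in either case.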

Milimetric lowered the priority of this task from High to Medium. (Dec 3 2015, 6:17 PM)
Milimetric raised the priority of this task from Medium to High.
madhuvishy renamed this task from "Provide weekly app session metrics separately for Android and iOS" to "Provide weekly app session metrics separately for Android and iOS, and move to 7 day counts." (Jan 11 2016, 6:48 PM)
madhuvishy updated the task description. (Show Details)
madhuvishy renamed this task from "Provide weekly app session metrics separately for Android and iOS, and move to 7 day counts." to "Provide weekly app session metrics separately for Android and iOS, and move to 7 day counts [13 pts]". (Jan 11 2016, 6:53 PM)

Change 264292 had a related patch set uploaded (by Mforns):
Divide app session metrics job into global and split

https://gerrit.wikimedia.org/r/264292

Change 264297 had a related patch set uploaded (by Mforns):
Add split-by-os argument to AppSessionMetrics job

https://gerrit.wikimedia.org/r/264297

Rebased both patches after the mobile->text revert.
So, it's ready for CR. Cheers!

Change 264297 merged by Madhuvishy:
Add split-by-os argument to AppSessionMetrics job

https://gerrit.wikimedia.org/r/264297

Change 264292 merged by Madhuvishy:
Divide app session metrics job into global and split

https://gerrit.wikimedia.org/r/264292

Change 267996 had a related patch set uploaded (by Mforns):
Correct app session metrics README file

https://gerrit.wikimedia.org/r/267996

Change 267996 merged by Ottomata:
Correct app session metrics README and jar version

https://gerrit.wikimedia.org/r/267996

Epilogue: It occurred to me that (besides this Phabricator task and the published code) this table was never publicly documented. I have started a page here, feel free to edit: https://wikitech.wikimedia.org/wiki/Analytics/Data/mobile_apps_session_metrics