Combining the other task (adding 7-day counts) into this one.
This is what we'll do:
Change the Spark code to support the Android/iOS split and generate this new data every 7 days.
Keep running the old job as is, without the split, every 30 days.
We may generate the new data in a new file if needed.
Initial thoughts:
As discussed by email, we need platform-specific versions of the app session metrics that are currently available on Hive (wmf.mobile_apps_session_metrics) and on Hue.
Context: T86535 (initial task with methodology for calculating the number), T97876#1409884 (implementation details)
Since we have already collected quite a bit of historical data at this point for the aggregated (iOS & Android) metric, we should keep generating it as before, and add the platform-specific data separately.
There are various options on how to modify the format of the existing table for that. One possibility would be to add new values for the "type" column, which currently is either "PageviewsPerSession", "SessionLength", or "SessionsPerUser". Like this:
Now: | SessionsPerUser |
In the future: | SessionsPerUser, SessionsPerUser_iOS, SessionsPerUser_Android |
Or one could add a new "platform" column whose value is "iOS", "Android", or "all". The first two would be consistent with the unique app users table; the third would tag the rows containing the overall data as currently calculated, and would need to be backfilled in the existing rows.
Either of these two options would mean that the job will add nine instead of three rows every week.
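To make the two layouts concrete, here is a rough Python sketch of the rows a single run would emit under each option. It is only an illustration, not the Spark job itself: the function names, the dict-based row shape, and the single "value" field are assumptions made for the example (the real table has its own columns), and the numbers passed in are placeholders.

```python
# The three metric names come from the existing "type" column.
METRIC_TYPES = ["PageviewsPerSession", "SessionLength", "SessionsPerUser"]
# "all" stands for the combined iOS & Android data computed today.
PLATFORMS = ["all", "iOS", "Android"]


def rows_with_type_suffix(values):
    """Option 1: encode the platform in the "type" column.

    `values` maps (metric, platform) to a number. The "value" key is a
    placeholder for whatever statistics the real table stores.
    """
    rows = []
    for metric in METRIC_TYPES:
        for platform in PLATFORMS:
            # Overall rows keep the existing name; others get a suffix.
            type_name = metric if platform == "all" else f"{metric}_{platform}"
            rows.append({"type": type_name, "value": values[(metric, platform)]})
    return rows


def rows_with_platform_column(values):
    """Option 2: keep "type" unchanged and add a "platform" column."""
    rows = []
    for metric in METRIC_TYPES:
        for platform in PLATFORMS:
            rows.append(
                {"type": metric, "platform": platform, "value": values[(metric, platform)]}
            )
    return rows


# Placeholder input: every (metric, platform) pair mapped to 0.0.
placeholder = {(m, p): 0.0 for m in METRIC_TYPES for p in PLATFORMS}
option_1 = rows_with_type_suffix(placeholder)
option_2 = rows_with_platform_column(placeholder)
```

Both functions return nine rows per run (three metrics times three platform values). Under option 1 the existing rows need no change, while option 2 requires tagging the existing rows with "all".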
The data should be backfilled as far back as possible, to enable historical comparisons and a better understanding of the rise in median session length over the last half year.
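As a sketch of how the backfill could be driven, the following Python helper lists consecutive 7-day periods, walking backwards from a given end date until the earliest date for which data exists. The function name and the dates used are hypothetical; how far back the source data actually reaches is not stated here and would have to be checked.

```python
from datetime import date, timedelta


def backfill_periods(earliest, latest_end, days=7):
    """Return (start, end) date pairs, newest first, with `end` exclusive.

    Only full periods are returned: a period is skipped if it would
    start before `earliest`.
    """
    periods = []
    end = latest_end
    while end - timedelta(days=days) >= earliest:
        start = end - timedelta(days=days)
        periods.append((start, end))
        end = start
    return periods


# Hypothetical dates, for illustration only.
example = backfill_periods(date(2015, 1, 1), date(2015, 1, 22))
```

Each pair could then be passed to the job as the reporting period for one backfill run.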