
Mobile PMs has reports on session-related metrics from Wikipedia Apps {hawk}
Closed, Resolved (Public)

Description

Mobile has a request for an ongoing reporting structure covering session length, session counts per user, and events-per-session for the mobile app.

The methodology:

  1. Extract the UUID and timestamp from all pageviews within an N-day period for a sample of App UUIDs
  2. Convert the timestamp to a numeric value.
  3. For each user, sort the converted timestamps and sessionise, ending a session whenever there is a > 1800-second gap between two of a user's pageviews.
  4. Session counts: the number of sessions per user.
  5. Session length: for each session, the time between the first and last event. If there is only one event in a session, it should not be reported.
  6. Events-per-session: the number of pageviews in a session.
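
The steps above can be sketched roughly as follows in Python. This is a minimal illustration, not the production job: it assumes timestamps have already been converted to epoch seconds (step 2), and the names `sessionise` and `session_metrics` are hypothetical.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 1800  # inactivity threshold from step 3


def sessionise(pageviews, gap=SESSION_GAP_SECONDS):
    """Group (uuid, epoch_seconds) pairs into per-user sessions.

    Returns {uuid: [[t1, t2, ...], ...]}, one inner list per session.
    """
    by_user = defaultdict(list)
    for uuid, ts in pageviews:
        by_user[uuid].append(ts)

    sessions = {}
    for uuid, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > gap:
                # A gap strictly greater than the threshold ends the session.
                user_sessions.append([cur])
            else:
                user_sessions[-1].append(cur)
        sessions[uuid] = user_sessions
    return sessions


def session_metrics(sessions):
    """Return (sessions per user, session lengths, events per session)."""
    counts = {uuid: len(s) for uuid, s in sessions.items()}
    # Step 5: single-event sessions have no measurable length and are skipped.
    lengths = [s[-1] - s[0]
               for user in sessions.values() for s in user if len(s) > 1]
    events = [len(s) for user in sessions.values() for s in user]
    return counts, lengths, events
```

For example, a user with pageviews at 0, 600 and 5000 seconds gets two sessions: one of length 600 with two events, and a single-event session that contributes no length.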

For each metric, the geometric mean, arithmetic mean, minima, maxima and quantiles should be generated. A must-have is a way to provide this to the mobile apps team; a nice-to-have is a public report for transparency purposes. I'm around if people want example C++ implementations or to ask questions.
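
A minimal sketch of the per-metric summary, using Python's standard `statistics` module (Python 3.8+). The function name `summarise` and the choice of quartiles as "the quantiles" are assumptions; the task does not say which quantiles are wanted. The input is assumed to be a list of strictly positive numbers, since the geometric mean is not meaningful otherwise.

```python
import statistics


def summarise(values):
    """Summary statistics for a list of strictly positive numbers."""
    # n=4 yields the three quartile cut points (default 'exclusive' method).
    q25, median, q75 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "arithmetic_mean": statistics.mean(values),
        "geometric_mean": statistics.geometric_mean(values),
        "q25": q25,
        "median": median,
        "q75": q75,
    }
```

The same function could be applied to each of the three metrics in turn: session counts per user, session lengths, and events-per-session.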

Event Timeline

Ironholds raised the priority of this task from to Needs Triage.
Ironholds updated the task description.
Ironholds added a subscriber: Ironholds.
kevinator triaged this task as Normal priority. Jan 12 2015, 3:44 PM
kevinator set Security to None.
Nuria added a subscriber: Nuria. Jan 27 2015, 7:43 PM

Extract the UUID and timestamp from all pageviews within an N-day period for a sample of App UUIDs

What is the time period exactly?

What is the frequency of this report? Daily? Weekly? (Anything more frequent than weekly seems like too much.)

Good q: Deskana?

The high level requirement is that the data be comparable with the old ad-hoc reports. So whatever time period was used for the ad-hoc reports, it should be reused for the automated reports. Was it a backwards looking 30 day window?

In terms of frequency of the reports, weekly is fine. Monthly would be too infrequent. Daily would be nice to have, but is by no means necessary.

kevinator raised the priority of this task from Normal to High. Mar 11 2015, 8:45 PM
Nuria added a comment. Mar 12 2015, 4:00 PM

Daily reports probably wouldn't work well for a global user base; anything more frequent than weekly is unlikely to be very meaningful (Oliver, correct me if I am wrong).

Nuria added a subscriber: DarTar. Mar 12 2015, 4:13 PM

@DarTar: Could you elaborate a bit on why we need a geometric mean for this data, and what it represents? Thanks.

kevinator renamed this task from Mobile product managers should have reports on session-related metrics from the Wikipedia Apps to Mobile PMs has reports on session-related metrics from Wikipedia Apps. Mar 12 2015, 5:19 PM

I spoke to Ironholds on IRC:

Run the report weekly on a 30 day window
30 days looking backward is what the previous ad hoc reports used

Geometric Mean is required

For example, say you get ten 400-second sessions and one 40,000-second session.
The arithmetic mean is 4,000 seconds, except nobody, zero people, had a session length of 4,000 seconds.
An arithmetic mean points us to a place on the density curve where nobody lives.
The distribution of session length and several other variables is, at best, log-normal.
In this example the geometric mean is ~600 seconds (about 608).
mforns claimed this task. Mar 19 2015, 2:15 PM

BTW the reports Oliver generated are here:
http://datasets.wikimedia.org/aggregate-datasets/apps/

The new automated report will append data to these files.

@Deskana we're assuming you don't need this data backfilled. We couldn't anyway, the cluster only has 60 days of rolling data and none of that has the fields we need yet (see blocking tasks).

BTW the reports Oliver generated are here:
http://datasets.wikimedia.org/aggregate-datasets/apps/
The new automated report will append data to these files.

That's fine, but can you be sure to put some kind of visual indicator where the data stops being generated by the old method and starts being generated by the new method, so that we don't compare incomparable numbers? :-)

BTW the reports Oliver generated are here:
http://datasets.wikimedia.org/aggregate-datasets/apps/
The new automated report will append data to these files.

And does this mean that the uniques counting will also start to be appended to these files? The current setup is very suboptimal, I have to ssh into stat1002 and try to download the files. I am not good with computer. :-)

Nuria added a comment. Edited Mar 23 2015, 11:23 PM

And does this mean that the uniques counting will also start to be appended to these files? The current setup is very suboptimal, I have to ssh into stat1002 and try to download the files. I am not good with computer. :-)

Not for now, sorry. And I will confirm with Kevin, because I don't think our plan was to append session data to these files either.

Old uniques data should be deleted, as we know that in some cases it is off by 20%.

Old uniques data should be deleted, as we know that in some cases it is off by 20%.

Please do not do that. The mobile apps team relies on this data for its quarterly review.

Please do not do that. The mobile apps team relies on this data for its quarterly review.

Got it. Please be aware of the precision of one dataset and the other one.

Please do not do that. The mobile apps team relies on this data for its quarterly review.

Got it. Please be aware of the precision of one dataset and the other one.

Absolutely! I'll be sure to point out that the error makes the numbers generated by the two methodologies not be directly comparable. Thanks.

Change 199935 had a related patch set uploaded (by Mforns):
[WIP] Add Apps session metrics job

https://gerrit.wikimedia.org/r/199935

Nuria claimed this task. Apr 7 2015, 3:45 PM
Nuria added a comment. Edited Apr 14 2015, 1:38 AM

I am not sure how the geometric mean was calculated before: the product seems too large to compute directly and would need a numerical approximation (you cannot multiply session lengths across as many sessions as we have and expect the result to fit in a given numeric type).

We will include quantile calculations for the median but not geometric mean.
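
On the overflow concern: the geometric mean equals the exponential of the mean of the logarithms, so the product never has to be formed. A minimal Python sketch, assuming strictly positive session lengths; the function name `geometric_mean_log` is hypothetical, and this is not necessarily how the earlier ad-hoc reports computed it.

```python
import math


def geometric_mean_log(values):
    """Geometric mean computed as exp(mean(log(v))); every v must be > 0."""
    if not values:
        raise ValueError("need at least one value")
    # Summing logs keeps the running total small, so nothing overflows
    # even when the raw product would exceed any fixed-width type.
    log_sum = math.fsum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))
```

Because only a sum of logs and a count are needed, the two partial values could also be accumulated separately per partition and combined at the end.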

Nuria added a comment. Apr 18 2015, 4:28 AM

The ball is in the mobile team's court to document the uuid field here: https://wikitech.wikimedia.org/wiki/X-Analytics

Is it uuid or wmfuuid?

kevinator renamed this task from Mobile PMs has reports on session-related metrics from Wikipedia Apps to Mobile PMs has reports on session-related metrics from Wikipedia Apps {hawk}. May 1 2015, 12:09 AM

The documentation above has been added. It's wmfuuid.

remaining work:

Change 199935 merged by Joal:
Add Apps session metrics job

https://gerrit.wikimedia.org/r/199935

Epilogue: It occurred to me that (besides this Phabricator task and the published code) this table was never publicly documented. I have started a page here, feel free to edit: https://wikitech.wikimedia.org/wiki/Analytics/Data/mobile_apps_session_metrics