[EPIC] Count unique iOS & Android users precisely and in a privacy conscious manner that does not require opt in to send data
Closed, Declined (Public)

Description

Background

At this time our counts of unique users for iOS and Android are underestimates. We use tokens to identify app installs, and only users who opt in send that data. The number of users sending data is very small on iOS, so our number is likely far off from the true number of users of the app.

If we used a more privacy-conscious method to calculate uniques, we would not require an opt-in, and our estimate of uniques would be much more precise.

Proposal

(Originally by @Nuria) Rather than using appinstallids to calculate uniques, let's use a variation of the last access method (https://blog.wikimedia.org/2016/03/30/unique-devices-dataset/). This would require the iOS and Android apps to send events outside the existing analytics funnels. After speaking to the mobile apps PMs, it sounds like this approach is okay.

How would this work:

  1. When the app is installed, it stores the install date in device storage, in a table (or similar) that has just one field: APP_LAST_ACCESS. In this example the value of the field is 2018-10-01. No event is sent (none is needed at this point, since we track installs in the respective app stores).
  2. Time passes and the user comes back to the app for the second time after install. The user has not used the app for a couple of days, so it is now October 5th.
  3. When the user engages with the app, the first thing the app does is check whether the current date equals the date stored in the APP_LAST_ACCESS field. In this case the date is different, so the app sends an event with the following fields (note there is no appInstallId or token of any sort):
    • user_agent
    • timestamp: the current time, 2018-10-05
    • the date the app was last used, in this case 2018-10-01
  4. The app updates APP_LAST_ACCESS to 2018-10-05.
  5. The user continues using the app until the app goes to sleep or is closed.

When the user engages with the app again, steps 3 and 4 are repeated. We only want to send the event the first time the app is active in the foreground on any given day. If the app is "asleep" (in the background) or does something in the background (e.g. reading list sync), that does not count as active usage. We specifically want to know -- on any given day -- how many users actually looked at the app.
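
The steps above can be sketched roughly as follows. This is only an illustration in Python, not app code: the class name `LastAccessTracker`, the `send_event` callback, and the event field name `last_access` are all hypothetical, and real apps would persist APP_LAST_ACCESS in device storage rather than in memory.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Optional


@dataclass
class LastAccessTracker:
    """Hypothetical client-side helper; names are illustrative only."""

    send_event: Callable[[dict], None]      # stand-in for the app's event sender
    user_agent: str
    app_last_access: Optional[date] = None  # stand-in for the stored APP_LAST_ACCESS

    def on_install(self, today: date) -> None:
        # Step 1: record the install date; no event is sent.
        self.app_last_access = today

    def on_foreground(self, now: datetime) -> bool:
        """Call only when the app becomes active in the foreground.

        Returns True if an event was sent.
        """
        today = now.date()
        if self.app_last_access is None:
            # No stored date yet: treat like an install and send nothing.
            self.app_last_access = today
            return False
        if self.app_last_access == today:
            # Already counted today (or installed today): nothing to send.
            return False
        # Step 3: dates differ, so send an event without any install ID or token.
        self.send_event({
            "user_agent": self.user_agent,
            "timestamp": now.isoformat(),
            "last_access": self.app_last_access.isoformat(),
        })
        # Step 4: update the stored date.
        self.app_last_access = today
        return True
```

In this sketch the stored date is updated only after `send_event` returns. What to do when sending fails, or when the app is killed between sending and updating, is left open here; that is part of the sequencing problem the proposal mentions.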

On the server side, every record for the day whose last-access date differs from the current date signals a user who engaged with the app that day. The harder engineering problem is making sure the check, send, and update-date sequence happens properly.
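
A minimal sketch of that server-side count, in Python for illustration. It assumes events arrive as dictionaries with an ISO `timestamp` and a `last_access` date; both field names and the function name `daily_active_counts` are assumptions, not an existing schema or job.

```python
from collections import Counter
from datetime import date


def daily_active_counts(events):
    """Count, per day, the events whose last-access date differs from the event date.

    Each such event stands for one user who opened the app that day.
    """
    counts = Counter()
    for event in events:
        event_day = date.fromisoformat(event["timestamp"][:10])
        last_access = date.fromisoformat(event["last_access"])
        if last_access != event_day:
            counts[event_day] += 1
    return counts
```

Because the client only sends an event when the dates differ, the filter mostly acts as a sanity check against events that were sent when they should not have been.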

Event Timeline

Nuria renamed this task from Calculate precisely number of unqiue users for IOS and Android in a privacy Conscious manner that does not require opt in to Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in. Aug 23 2018, 8:41 PM
Nuria renamed this task from Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data. Aug 23 2018, 8:43 PM
Nuria updated the task description.

Thanks @Nuria! This is going to be very helpful to the iOS app team!

I just noticed the apps already send WMF-Last-Access and WMF-Last-Access-Global in the X-Analytics header. Do you know when that started? Can we use that for this task?

Query:

-- Sample the X-Analytics header of mobile app requests for one hour.
SELECT x_analytics_map
FROM webrequest
WHERE access_method = 'mobile app'
  AND year = 2018 AND month = 9 AND day = 17 AND hour = 1
  AND webrequest_source = 'text'
LIMIT 100;

Hi @Nuria, @NHarateh_WMF told me that the iOS app can handle cookies -- WMF-Last-Access and WMF-Last-Access-Global in the X-Analytics header are the cookies we get from the domains we access. @mpopov can confirm whether this is the case in Android.

Please let me know if there is anything else you need to move this forward.

The cookie you would need is WMF-Last-Access-Global, and it has been there since "inception". However, you cannot trust that the app uses it the way a browser client would. For example, it is not uncommon for any cookies set on a webview opened by the app to be deleted once the webview is closed. It is also not uncommon for cookies to be completely deleted when an app starts (if they are stored at all). Last but not least, cookies for a webview and for the app might not be shared. All these issues render the method ineffective, and that is why I proposed another methodology that is probably much easier for the app to manage. Now, if we know for certain that the app manages cookies as a browser would (no deletion of non-session cookies, sharing between webviews and app, etc.), then the last access method with a couple of small modifications would work.

I am more familiar with these problems on Android, but I imagine they are pretty similar on iOS. For a very high-level explanation of these issues see: https://medium.com/@elye.project/a-tale-on-android-cookies-store-management-b04832ca18c6.

You can see that the data is not sequential, as you would expect it to be if cookies for the app were centralized; see the results of the following query. It should produce one series per app-install-id in which the date of last access is the same as or later than in the prior record. At most one request per series should carry a date older than the date the request was issued.

-- One row per request, keyed by app install ID, with the last-access cookie value.
SELECT
  COALESCE(x_analytics_map['wmfuuid'], parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) AS uuid,
  x_analytics_map['WMF-Last-Access-Global'] AS last_access,
  dt,
  uri_host
FROM webrequest
WHERE year = 2018 AND month = 9 AND day = 1
  AND COALESCE(x_analytics_map['wmfuuid'], parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) IS NOT NULL
ORDER BY uuid, dt, last_access, uri_host
LIMIT 1000000;

Some examples of cookie management that does not match what we would expect (it looks like upload is not setting cookies, which makes sense):

uuid last_access dt uri_host
00000ce8-4583-47e8-ab10-b4b8107c836f NULL 2018-09-01T13:16:56 upload.wikimedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f NULL 2018-09-01T13:16:56 upload.wikimedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f NULL 2018-09-01T13:16:56 upload.wikimedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f 01-Sep-2018 2018-09-01T13:16:56 en.m.wikipedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f 01-Sep-2018 2018-09-01T13:16:56 en.wikipedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f 01-Sep-2018 2018-09-01T13:16:56 en.wikipedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f 01-Sep-2018 2018-09-01T13:16:56 en.wikipedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f 31-Aug-2018 2018-09-01T13:16:56 en.wikipedia.org -> an old date should appear only once in this series, even if requests are simultaneous
00000ce8-4583-47e8-ab10-b4b8107c836f 31-Aug-2018 2018-09-01T13:16:56 en.wikipedia.org -> an old date should appear only once in this series, even if requests are simultaneous
00000ce8-4583-47e8-ab10-b4b8107c836f 01-Sep-2018 2018-09-01T13:16:57 en.m.wikipedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f 01-Sep-2018 2018-09-01T13:16:58 en.m.wikipedia.org
00000ce8-4583-47e8-ab10-b4b8107c836f 01-Sep-2018 2018-09-01T13:16:59 en.wikipedia.org

00000e5a-98d1-49bc-868f-cf5f2f30c4ff NULL 2018-09-01T03:11:28 meta.wikimedia.org
00000e5a-98d1-49bc-868f-cf5f2f30c4ff 27-Aug-2018 2018-09-01T03:11:28 es.wikipedia.org
00000e5a-98d1-49bc-868f-cf5f2f30c4ff 27-Aug-2018 2018-09-01T03:11:28 es.wikipedia.org -> same cookie as the prior request
00000e5a-98d1-49bc-868f-cf5f2f30c4ff NULL 2018-09-01T03:11:29 upload.wikimedia.org
00000e5a-98d1-49bc-868f-cf5f2f30c4ff 01-Sep-2018 2018-09-01T03:11:29 es.m.wikipedia.org

The data indicates that cookies are not centralized in the app. I only looked at it briefly; please take a closer look if you can.
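
The expectation stated above (at most one request per install series carrying a last-access date older than the request date) could be checked with something like the following Python sketch. The function names are hypothetical, rows with a NULL cookie are skipped on the assumption that hosts such as upload do not set it, and the sample rows are copied from the second series in the output above.

```python
from datetime import datetime


def count_stale_requests(rows):
    """Count requests whose last-access cookie date is older than the request date.

    rows: iterable of (last_access, dt), where last_access is a string like
    '27-Aug-2018' or None, and dt is an ISO timestamp string.
    """
    stale = 0
    for last_access, dt in rows:
        if last_access is None:
            continue  # host did not set or receive the cookie
        cookie_day = datetime.strptime(last_access, "%d-%b-%Y").date()
        request_day = datetime.fromisoformat(dt).date()
        if cookie_day < request_day:
            stale += 1
    return stale


def series_is_consistent(rows):
    # With centralized cookie handling, an old date should appear at most once.
    return count_stale_requests(rows) <= 1


# Rows for install 00000e5a-98d1-49bc-868f-cf5f2f30c4ff from the output above.
series = [
    (None, "2018-09-01T03:11:28"),
    ("27-Aug-2018", "2018-09-01T03:11:28"),
    ("27-Aug-2018", "2018-09-01T03:11:28"),
    (None, "2018-09-01T03:11:29"),
    ("01-Sep-2018", "2018-09-01T03:11:29"),
]
print(count_stale_requests(series))   # 2
print(series_is_consistent(series))   # False
```

The check only counts stale dates and does not rely on row order, since requests within the same second are ordered by the query's sort keys rather than by when they were actually issued.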

If we can get the app to centralize cookie management, I think a variation of the last access method would work (it would need some tweaks to deal with app data, due to background requests and other app-specific issues). That said, it seems a lot easier for the app to send events in the manner I described above than to centralize cookie management.

JMinor triaged this task as Medium priority. Sep 24 2018, 6:48 PM
JMinor moved this task from Needs Triage to Product Backlog on the Wikipedia-iOS-App-Backlog board.
mpopov renamed this task from Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data to [EPIC] Count unique iOS & Android users precisely and in a privacy conscious manner that does not require opt in to send data. Jan 23 2019, 9:35 PM
mpopov edited projects, added Wikipedia-Android-App-Backlog, Epic; removed Analytics.
mpopov updated the task description.
mpopov removed subscribers: NHarateh_WMF, Nuria.

@JMinor @Charlotte: after speaking with @kzimmerman we decided I should manage this project and that it has priority (at least on our team). I'll follow up with you on next steps.

A comment to emphasize the last point: it seems a lot easier for the app to send events in the manner I described above than to centralize cookie management the way a browser would. I can work with the Android devs to describe further how to handle these events, as needed.

JMinor raised the priority of this task from Medium to Needs Triage. Feb 21 2019, 1:32 AM
JMinor moved this task from Product Backlog to Tracking on the Wikipedia-iOS-App-Backlog board.

@Nuria Is this unique count data currently available to query for iOS and Android? Where are the tables stored and what is the best way to query to get DAU/MAU per platform? (I can't tell if this ticket was completed. I need to pull the most accurate data we have for Quarterly Insights, which led me here. How far back historically does this data go?).

This ticket was never started, so the existing numbers are partial. The data is in these tables in the wmf database:

mobile_apps_uniques_daily
mobile_apps_uniques_monthly

One thing to remember: the apps are currently included in the unique-devices numbers. When we move app uniques to another counting method, we should take care to remove them from the unique-devices count.

@JAllemandou do you know which access site the apps are counted under: desktop or mobile site? Or are they not part of that categorization?

@SNowick_WMF I think we are missing a couple of things here. The data for mobile apps uniques is in mobile_apps_uniques_daily; there are no calculations for mobile apps uniques anywhere else. @JAllemandou's comment is about the fact that users who, for example, open a webview in a mobile app to the mobile site are counted together with users who access the mobile site directly, because that data is not explicitly excluded.

@Nuria has it correct. Some more details on my thoughts:

  • In the current unique-devices count (the main one, not the specific mobile-app one) there is no split on access-method. The per-domain split can be used (en.m.wikipedia.org is more likely to be hit by mobile web, etc.), but this is not explicitly done.
  • With the last change we made on unique-devices, the mobile app-install-id is part of the actor-signature we use to define actors, therefore mobile-apps are counted as devices in the main number, and if we define another way to count them we shouldn't add that number to the already computed one.

I hope that's clearer :)

but this is not explicitly done.

Right, we should explicitly exclude agent-type="mobile-app"

LGoto triaged this task as Medium priority. Aug 17 2020, 4:27 PM
LGoto moved this task from Current Quarter to Epics on the Product-Analytics board.
LGoto moved this task from Epics to Backlog on the Product-Analytics board.