We would like to learn about the actual usage of the Synced Reading Lists feature by users, not just whether they have it enabled or not (which we can get with EventLogging albeit only on Android, not iOS). For example, are people actually syncing or just enabling syncing? Are people getting use out of it by syncing to multiple devices or are they just sending data to the cloud without syncing to another device? Are users of the beta version who have both iOS & Android apps syncing across platforms?
To answer these questions we can look at request logs in [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest | `wmf.webrequest` ]], but unfortunately there's no information that we can use to link requests from the same user across apps.
# Proposal 1 (Original)
To that end, I propose that API calls made by the apps to /api/rest_v1/data/lists/ include a user-identifier (e.g. `wmfsyncid`) in the [[ https://wikitech.wikimedia.org/wiki/X-Analytics | X-Analytics ]] header. Almost like a cross-platform `wmfuuid`! :) It's fine if the identifier is a hashed username (as long as both apps yield the same hash of the same username) since we're not interested in identifying specific users or looking at specific users' reading lists, just their usage of the syncing feature. **Again**, this would //only// be for sync-related requests, no other requests.
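As a rough illustration of the hashed-username idea, here is a minimal sketch in Python (the apps themselves would implement this natively). Everything in it is an assumption made for illustration: the keyed SHA-256 hash, the `SYNC_ID_KEY` value, the whitespace-only normalization, and the helper names. The only requirement taken from the proposal is that both apps produce the same hash for the same username.

```python
import hashlib
import hmac

# Hypothetical shared key; both apps would need to ship the same value.
SYNC_ID_KEY = b"example-shared-key"


def wmfsyncid(username: str, key: bytes = SYNC_ID_KEY) -> str:
    """Derive a stable identifier from a username without sending the username itself."""
    # Assumed normalization: trim whitespace so both platforms hash identical bytes.
    normalized = username.strip().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def x_analytics_value(username: str) -> str:
    """Format the identifier as a key=value pair for the X-Analytics header."""
    return f"wmfsyncid={wmfsyncid(username)}"
```

A keyed hash is used here only because a plain hash of a username can be reversed by hashing a list of known usernames; whether that matters for this use case is a separate decision.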
My strong preference is for the identifiers to be sent with //**all**// sync-related requests and not just when the user has opted-in to in-app analytics (like how `wmfuuid` behaves) so that we can get actual usage numbers, not biased estimates. But more importantly, I expect the # of people who are opted-in on every device they own to be incredibly small if not downright zero.
# Proposal 2 (New alternative)
> Alternatively, the Reading Infrastructure team could generate their own backend logging with user-identifying info but I'm pretty sure this is the simplest approach. I cannot think of any other way to assess the success or usage of this feature in a meaningful way without doing this, but I'm open to ideas and suggestions.
In a meeting with @Fjalapeno, he suggested registering devices on the backend and not having the apps send any additional, user-identifying info. The backend would generate a unique ID for the user that can be used to link multiple app install IDs. For each registered device, the backend would insert/update:
- 🔑 Cross-device ID (stays the same)
- 🔑 App install ID (stays the same)
- OS family (stays the same)
- App version (will change as user updates app)
- OS major.minor version (will change as user updates OS)
- Timestamp of the last sync (will change with usage)
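The insert/update behaviour listed above could be sketched roughly as follows. This is an in-memory Python stand-in, not the actual backend: the class names, the `user_key` argument, and the use of a random UUID for the cross-device ID are all assumptions for illustration.

```python
import uuid
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    cross_device_id: str   # stays the same
    app_install_id: str    # stays the same
    os_family: str         # stays the same
    app_version: str       # changes as the user updates the app
    os_version: str        # major.minor, changes as the user updates the OS
    last_sync: str         # changes with usage


class DeviceRegistry:
    """Hypothetical in-memory stand-in for the backend's device table."""

    def __init__(self):
        self._cross_ids = {}  # user key -> cross-device ID
        self._devices = {}    # (cross-device ID, app install ID) -> DeviceRecord

    def register(self, user_key, app_install_id, os_family,
                 app_version, os_version, timestamp):
        # Generate the cross-device ID once per user, then reuse it.
        cross_id = self._cross_ids.setdefault(user_key, uuid.uuid4().hex)
        key = (cross_id, app_install_id)
        record = self._devices.get(key)
        if record is None:
            # First time this install is seen: insert a new row.
            record = DeviceRecord(cross_id, app_install_id, os_family,
                                  app_version, os_version, timestamp)
            self._devices[key] = record
        else:
            # Known install: update only the fields that can change.
            record.app_version = app_version
            record.os_version = os_version
            record.last_sync = timestamp
        return record

    def devices_for(self, user_key):
        cross_id = self._cross_ids.get(user_key)
        return [r for (cid, _), r in self._devices.items() if cid == cross_id]
```

With a structure like this, the number of records sharing a cross-device ID is the number of devices a user has synced, without the apps sending any extra identifying information.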
The following diagram illustrates this:
Adding another proposal per my conversation with @Fjalapeno
# Proposal 3 (new new alternative)
We use EventLogging to send data like (user_id, reading_list_id, platform, reading_list_transaction_id) that quantifies, per user_id, access to a reading list across platforms (a "platform" being a distinct install of the Wikipedia app). This assumes that logging in is needed to access your reading list across platforms (which I think is how this feature works), and also that the service backend can provide transaction IDs per reading list.
This requires instrumentation on Android and leaves iOS uninstrumented. Even so, with this scheme you can infer both whether users use more than one Android install (e.g. phone and tablet) and whether they use reading lists across Android and iOS.
Using reading_list_transaction_id, we can learn about iOS/Android cross-device usage with instrumentation on Android alone and without additional IDs. For a given reading list, a sequence of transactions in EventLogging like 1, 2, 3, 4 would tell us the user is only using an instrumented (Android) platform.
A sequence like 1, 3, 5 of reading_list_transaction_id would tell us that the user has done some transactions (2 and 4) on the reading list on a platform we have not instrumented (iOS).
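The gap check described here can be sketched in a few lines of Python. It assumes transaction IDs are consecutive integers per reading list, which is stricter than anything the backend is known to guarantee; the function name is made up for illustration.

```python
def missing_transaction_ids(observed):
    """Return the IDs absent between the smallest and largest observed ID.

    Assumes IDs are consecutive integers per reading list, so any gap
    corresponds to a transaction made on an uninstrumented platform.
    """
    seen = set(observed)
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


print(missing_transaction_ids([1, 2, 3, 4]))  # [] -> only instrumented installs
print(missing_transaction_ids([1, 3, 5]))     # [2, 4] -> activity elsewhere
```

Note that this only detects gaps inside the observed range; transactions made on another platform before the first or after the last logged ID would go unnoticed.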
Events sent to EL would look like:
| user_id | reading_list_id | platform | reading_list_transaction_id |
| 001 | 001 | Android-phone-1 | 01 |
| 001 | 001 | Android-tablet-2 | 02 |
| 001 | 002 | Android-phone-1 | 03 |
| 002 | 001 | Android-phone-1 | 01 |
| 002 | 001 | Android-phone-1 | 03 |
| 003 | 001 | Android-phone-1 | 01 |
| 003 | 001 | Android-phone-1 | 02 |
User 001 has an Android phone and a tablet and syncs reading lists between them.
User 002 has an Android phone and some other device to which they sync the list (transaction 02 is missing from the log, so it happened on a platform we have not instrumented).
User 003 uses only one device to interact with reading lists.
This schema requires no additional identifiers (other than distinct, somewhat sequential transaction IDs generated server-side), and I think it provides all the information required.
# Proposal 4 (New³ alternative)
@chelsyx @Nuria and I (@mpopov) met to discuss this and have come up with the following:
Instead of maintaining a table (updating records, dropping data older than 90 days), the backend just sends RLS usage updates as events to EventLogging with the following schema:
| Property | Type | Required |
| crossDeviceID | string | true |
| appInstallID | string | true |
| wmf_app_version | string | true |
| os_family | string | true |
| os_version (major.minor format) | string | true |
Note that we don't need a last sync timestamp since we can just refer to the timestamp in EventLogging.
The event can be sent by the backend every time it responds to a request from an app. Devices that haven't been synced in more than 90 days would disappear from the EventLogging table automatically.
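To show how events with this schema could answer the original questions, here is a minimal analysis sketch in Python. The field names come from the schema above; the `summarize` helper and the sample values are hypothetical, and a real analysis would more likely be a query against the EventLogging table.

```python
from collections import defaultdict

# Required properties from the proposed schema.
REQUIRED = ("crossDeviceID", "appInstallID", "wmf_app_version",
            "os_family", "os_version")


def summarize(events):
    """Per crossDeviceID: number of distinct installs and whether they span OS families."""
    installs = defaultdict(set)
    families = defaultdict(set)
    for event in events:
        # Skip events missing any required property.
        if any(not event.get(field) for field in REQUIRED):
            continue
        installs[event["crossDeviceID"]].add(event["appInstallID"])
        families[event["crossDeviceID"]].add(event["os_family"])
    return {
        uid: {"devices": len(installs[uid]),
              "cross_platform": len(families[uid]) > 1}
        for uid in installs
    }


# Made-up sample events for illustration only.
events = [
    {"crossDeviceID": "u1", "appInstallID": "a1", "wmf_app_version": "2.7.1",
     "os_family": "Android", "os_version": "8.0"},
    {"crossDeviceID": "u1", "appInstallID": "i1", "wmf_app_version": "5.8.0",
     "os_family": "iOS", "os_version": "11.2"},
    {"crossDeviceID": "u2", "appInstallID": "a2", "wmf_app_version": "2.7.1",
     "os_family": "Android", "os_version": "7.1"},
]
summary = summarize(events)
```

Here `u1` would count as syncing across two devices and across platforms, while `u2` is sending data to the backend from a single device.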