We would like to learn about the actual usage of the Synced Reading Lists feature by users, not just whether they have it enabled or not (which we can get with EventLogging albeit only on Android, not iOS). For example, are people actually syncing or just enabling syncing? Are people getting use out of it by syncing to multiple devices or are they just sending data to the cloud without syncing to another device? Are users of the beta version who have both iOS & Android apps syncing across platforms?
To answer these questions we can look at request logs in [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest | `wmf.webrequest` ]], but unfortunately there's no information that we can use to link requests from the same user across apps.
# Proposal 1 (Original)
To that end, I propose that API calls made by the apps to /api/rest_v1/data/lists/ include a user-identifier (e.g. `wmfsyncid`) in the [[ https://wikitech.wikimedia.org/wiki/X-Analytics | X-Analytics ]] header. Almost like a cross-platform `wmfuuid`! :) It's fine if the identifier is a hashed username (as long as both apps yield the same hash of the same username) since we're not interested in identifying specific users or looking at specific users' reading lists, just their usage of the syncing feature. **Again**, this would //only// be for sync-related requests, no other requests.
My strong preference is for the identifiers to be sent with //**all**// sync-related requests and not just when the user has opted-in to in-app analytics (like how `wmfuuid` behaves) so that we can get actual usage numbers, not biased estimates. But more importantly, I expect the # of people who are opted-in on every device they own to be incredibly small if not downright zero.
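As a minimal sketch of the hashed-identifier idea in Proposal 1: both apps would need to derive byte-identical hashes from the same username. The function names, the choice of SHA-256, and the normalization step below are assumptions for illustration, not something the apps currently do. Only the `wmfsyncid` key comes from the proposal above. The header helper assumes the usual `key=value;key=value` layout of X-Analytics.

```python
import hashlib
import unicodedata


def sync_id(username: str) -> str:
    """Hypothetical helper: derive an opaque, platform-independent ID from a username."""
    # Assumption: both apps normalize the same way before hashing,
    # otherwise the same user would produce different IDs per platform.
    normalized = unicodedata.normalize("NFC", username.strip())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def x_analytics_header(existing: str, username: str) -> str:
    """Append wmfsyncid to an existing X-Analytics value (semicolon-separated pairs)."""
    pair = f"wmfsyncid={sync_id(username)}"
    return f"{existing};{pair}" if existing else pair


if __name__ == "__main__":
    # Would only be attached to requests under /api/rest_v1/data/lists/
    print(x_analytics_header("https=1", "Example User"))
```

An unsalted hash of a username can be reversed with a dictionary of known usernames, so a real implementation might prefer a keyed hash with a secret shared by both apps; that is a design choice this sketch leaves open.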
# Proposal 2 (New alternative)
> Alternatively, the Reading Infrastructure team could generate their own backend logging with user-identifying info but I'm pretty sure this is the simplest approach. I cannot think of any other way to assess the success or usage of this feature in a meaningful way without doing this, but I'm open to ideas and suggestions.
In a meeting with @Fjalapeno, he suggested registering devices on the backend and not having the apps send any additional, user-identifying info. The backend would generate a unique ID for the user that can be used to link multiple app install IDs. For each registered device, the backend would insert/update:
- 🔑 Cross-device ID (stays the same)
- 🔑 App install ID (stays the same)
- OS family (stays the same)
- App version (will change as user updates app)
- OS major.minor version (will change as user updates OS)
- Timestamp of the last sync (will change with usage)
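The per-device record listed above could be modelled roughly as follows. This is only a sketch: `DeviceRecord`, `DeviceRegistry`, and `user_key` are hypothetical names, and the in-memory dictionaries stand in for whatever storage the backend would actually use.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DeviceRecord:
    cross_device_id: str   # key: stays the same for a user
    app_install_id: str    # key: stays the same for an install
    os_family: str         # stays the same
    app_version: str       # changes as the user updates the app
    os_version: str        # major.minor, changes as the user updates the OS
    last_sync: datetime    # changes with usage


class DeviceRegistry:
    def __init__(self):
        self._user_ids = {}  # user_key -> cross-device ID generated by the backend
        self._devices = {}   # (cross_device_id, app_install_id) -> DeviceRecord

    def register_sync(self, user_key, app_install_id, os_family,
                      app_version, os_version, now=None):
        """Insert the device on first sync, otherwise update its mutable fields."""
        now = now or datetime.now(timezone.utc)
        if user_key not in self._user_ids:
            self._user_ids[user_key] = uuid.uuid4().hex
        cross_id = self._user_ids[user_key]
        key = (cross_id, app_install_id)
        record = self._devices.get(key)
        if record is None:
            record = DeviceRecord(cross_id, app_install_id, os_family,
                                  app_version, os_version, now)
            self._devices[key] = record
        else:
            record.app_version = app_version
            record.os_version = os_version
            record.last_sync = now
        return record

    def devices_for(self, user_key):
        """All registered devices linked to one user via the cross-device ID."""
        cross_id = self._user_ids.get(user_key)
        return [r for (cid, _), r in self._devices.items() if cid == cross_id]
```

With a structure like this, counting users with more than one registered device, or with more than one OS family, answers the multi-device and cross-platform questions without the apps sending any extra identifier.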
The following diagram illustrates this:
Adding another proposal per my conversation with @Fjalapeno
# Proposal 3
We use EventLogging to send data like (user_id, readinglist_id, platform) that quantifies, per user_id, access to a reading list across platforms (a platform here being a distinct install of the Wikipedia app). This assumes that logging in is required to access your reading list across platforms (which I think is how this feature works).
This requires instrumentation on Android and desktop but leaves out the iOS case. Note that with this scheme you can infer whether users use more than one platform: if all actions on a reading list are done via one platform alone, you can safely infer that users mostly do not sync across devices.
You can also learn about the iOS case using instrumentation on Android alone if you provide a reading list transaction ID with every transaction. A sequence of transactions in EventLogging like 1, 2, 3, 4 would tell you the user is only using the instrumented (Android) platform. A sequence like 1, 3, 5 would tell you the user has done some transactions (2 and 4) on the reading list on a platform we have not instrumented (iOS).
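The gap inference could be sketched like this, assuming transaction IDs are consecutive integers per reading list. The function name is hypothetical.

```python
def missing_transaction_ids(observed):
    """Return IDs absent between the lowest and highest logged transaction IDs.

    Assumption: IDs are consecutive integers per reading list, so any gap
    corresponds to a transaction made on a platform that does not log.
    """
    seen = set(observed)
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


if __name__ == "__main__":
    print(missing_transaction_ids([1, 2, 3, 4]))  # only the instrumented platform
    print(missing_transaction_ids([1, 3, 5]))     # 2 and 4 happened elsewhere
```

Two limits of this approach: transactions made on the uninstrumented platform before the first or after the last logged event stay invisible until another instrumented transaction arrives, and a gap cannot be told apart from an event that was simply lost in logging.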