[[ https://blog.wikimedia.org/2018/05/25/synced-reading-lists/ | Reading List Syncing (RLS) ]] is a really cool feature/service which required time & effort from multiple teams to make happen. The biggest thing is that articles saved for offline reading on the user's Android device(s) can now be synced to their iOS device(s) too.
We would like to learn about the actual usage of the RLS feature, not just whether users have it enabled or not, which we can get with EventLogging (EL), albeit only on Android, not iOS. Specifically, the people working on this feature and the people making resourcing decisions have questions like:
- How many people are actually syncing rather than just enabling syncing?
- How many people are getting actual use out of it by syncing to multiple devices, and how many are just sending data to their account in the cloud without syncing to another device?
- How many users are syncing reading lists across their iOS //and// Android devices?
The answers to these questions would inform resourcing decisions regarding the future of this service and taking on similar cross-platform initiatives (for example, if we learn that I'm one of only 5 people in total who sync across platforms).
After several discussions about privacy and workload implications, we (@Fjalapeno, @JMinor, and I) have arrived at the following solution (formerly Proposal 4, for those keeping track…):
## Cross-device identifier
On the RLS Backend, we would generate a unique identifier `cross_device_id` for each user, and that ID would be used to associate app installs together. It will be up to the Reading Infrastructure team to decide between these two methods:
1. For every user who syncs, generate a random ID and put it into a mapping table that would be used to look up the randomly generated ID by username
2. Use a deterministic hashing function that takes a username and returns the hashed version
In either case, we should be able to find out `cross_device_id` given a username. Method 1 has the disadvantage that we would also be able to find out the username given a `cross_device_id`, which is not the case for Method 2. On the other hand, looking up the pre-computed ID via Method 1 is probably much faster (and involves less computation) than hashing the username on demand every time via Method 2. That said, since `app_install_id`s (see notes below) will need to be hashed anyway, we might as well go with Method 2.
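To make the trade-off between the two methods concrete, here is a minimal Python sketch of both. It is only an illustration under assumptions: the function names, the in-memory dict standing in for the mapping table, and the choice of HMAC-SHA256 with a backend-only key as the deterministic hashing function are all hypothetical, since the proposal does not name a specific function or storage.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the RLS Backend (an assumption, not part of the proposal).
SECRET_KEY = b"replace-with-a-backend-only-secret"

# Method 1: a random ID stored in a mapping table.
# A plain dict stands in for the real table here.
_id_by_username = {}


def cross_device_id_lookup(username: str) -> str:
    """Return the stored random ID for this username, creating it on first use."""
    if username not in _id_by_username:
        _id_by_username[username] = secrets.token_hex(16)
    return _id_by_username[username]


# Method 2: a deterministic keyed hash, so no table is needed.
def cross_device_id_hashed(username: str) -> str:
    """Return the same ID for the same username every time, computed on demand."""
    digest = hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

With Method 1 the dict can be inverted to recover a username from an ID, which is the disadvantage described above. With Method 2 nothing is stored, at the cost of recomputing the hash on every sync.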
## EventLogging Schema
When a user syncs their reading lists, RLS Backend would send an event to EL with the following information:
1. The `cross_device_id` (see section above)
2. Salted & hashed `app_install_id` (see notes below)
3. If possible, RLS Backend should set the User-Agent (UA) to the UA it received from the app (see notes below)
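As a rough illustration of the three items above, the following Python sketch assembles such an event. Everything named here is an assumption for the sake of the example: the field names, the salt value, the use of SHA-256 over salt plus ID, and the idea of returning the forwarded User-Agent as a header dict are hypothetical and not taken from an actual schema.

```python
import hashlib

# Hypothetical salt known only to the RLS Backend (an assumption).
INSTALL_ID_SALT = b"replace-with-a-backend-only-salt"


def hash_app_install_id(app_install_id: str) -> str:
    """Salt the install ID before hashing so the result cannot be reproduced elsewhere."""
    salted = INSTALL_ID_SALT + app_install_id.encode("utf-8")
    return hashlib.sha256(salted).hexdigest()


def build_sync_event(cross_device_id: str, app_install_id: str, app_user_agent: str):
    """Return the event payload and the headers the backend would send with it."""
    event = {
        "cross_device_id": cross_device_id,
        "app_install_id_hashed": hash_app_install_id(app_install_id),
    }
    # Forward the app's own User-Agent rather than the backend's.
    headers = {"User-Agent": app_user_agent}
    return event, headers


# Example call with made-up values.
event, headers = build_sync_event("abc123", "install-1", "ExampleApp/1.0 (Android 8.1.0)")
```

The raw `app_install_id` never appears in the payload, and someone hashing it without the salt gets a different value.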
**Some notes:**
- Since the events include User-Agent strings from the apps, we can use them to figure out whether people are syncing across platforms
- Due to the 90 day data retention policy and the auto-purging put in place by Analytics Engineering, devices that haven't been synced in more than 90 days would just disappear from the table
- We hash `app_install_id` because these events would end up somewhere we could join them with behavioral data sent by the mobile apps, which we DON'T want
- The reason to salt `app_install_id` on the RLS Backend side before hashing it is to prevent someone (e.g. a data analyst) from just applying the same hashing function to the `app_install_id` in those other tables and then joining by hashed `app_install_id`
- Ideally the RLS Backend would forward the UAs from the apps because Analytics Engineering parses the UAs and puts them into nicely queryable structured data. Otherwise, RLS Backend would need to parse the UA itself and send: app version, OS family, and OS major.minor version
## Prioritization
This work is low priority. Having these analytics would be nice at some point, but the stakeholders aren't itching to have those questions answered and there is more important work that needs to be done first.