We would like to learn about actual usage of the Reading Lists Syncing (RLS) feature, not just whether users have it enabled. The latter we can already get from EventLogging (EL), albeit only on Android, not iOS. Specifically, the people working on this feature and the people making resourcing decisions have questions like:
- How many people are actually syncing, as opposed to just enabling syncing?
- How many people are getting actual use out of it by syncing to multiple devices, and how many are just sending data to their account in the cloud without syncing to another device?
- How many users are syncing reading lists across both their iOS and Android devices?
After several discussions about privacy and workload implications, we've arrived at the following solution (formerly Proposal 4, for those keeping track).
## Cross-device identifier
On the RLS Backend, we would generate a unique identifier `crossDeviceID` for each user and that ID would be used to associate app installs together. It will be up to the Reading Infrastructure team to decide between these two methods:
1. For every user who syncs, generate a random ID and put it into a mapping table that would be used to look up the randomly generated ID by username
2. Use a deterministic hashing function that takes a username and returns the hashed version
In either case, we should be able to find the `crossDeviceID` given a username. Method 1 has the disadvantage that the mapping table would also let us find the username given a `crossDeviceID`, which is not the case for Method 2. On the other hand, looking up the pre-computed ID via Method 1 is probably faster (and involves less computation) than hashing the username on demand every time via Method 2. However, since `appInstallID`s (see notes below) will need to be hashed anyway, we might as well go with Method 2.
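To make the trade-off concrete, here is a minimal Python sketch of both methods. Everything in it is an assumption for illustration: the function names are hypothetical, a plain dictionary stands in for the mapping table, and SHA-256 is just one possible choice of deterministic hash. The Reading Infrastructure team may well pick something different.

```python
import hashlib
import secrets

# Method 1 (sketch): random ID kept in a mapping table.
# A dict stands in for the real table; it maps username -> crossDeviceID,
# which is also what makes the reverse lookup (ID -> username) possible.
_id_table: dict[str, str] = {}

def cross_device_id_random(username: str) -> str:
    """Return the stored random ID for username, creating it on first sync."""
    if username not in _id_table:
        _id_table[username] = secrets.token_hex(16)  # 32 hex characters
    return _id_table[username]

# Method 2 (sketch): deterministic hash of the username, nothing stored.
# SHA-256 is an assumed choice, not something the proposal specifies.
def cross_device_id_hashed(username: str) -> str:
    """Return the same ID for the same username on every call."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()
```

Both functions return a stable ID per username. The difference is that Method 1 has to persist the table, while Method 2 recomputes the ID on each sync.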
When a user syncs their reading lists, RLS Backend would send an event to EL with the following information:
1. The `crossDeviceID` (see section above)
2. Salted & hashed `appInstallID` (see notes below)
3. If possible, RLS Backend should set the UserAgent (UA) to the UA it received from the app (see notes below)
Notes:
- Due to the 90-day data retention policy and the auto-purging put in place by Analytics Engineering, devices that haven't synced in more than 90 days would simply disappear from the table
- We hash `appInstallID` because these events would end up somewhere where they could be joined with behavioral data sent by the mobile apps, which we DON'T want
- We salt `appInstallID` on the RLS Backend side before hashing it to prevent someone (e.g. a data analyst) from just applying the same hashing function to the `appInstallID` in those other tables and then joining by hashed `appInstallID`
- Ideally the RLS Backend would forward the UAs from the apps, because Analytics Engineering parses the UAs and puts them into nicely queryable structured data. Otherwise, RLS Backend would need to parse the UA itself and send: app version, OS family, and OS major.minor version
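The event described above could be assembled roughly as follows. This is only a sketch under stated assumptions: the salt value, the function names, the `userAgent` key, and the choice of SHA-256 are all hypothetical, and in practice the UA would more likely be forwarded as a request header than as an event field. The last function illustrates how the collected events might be used to tell single-device from multi-device syncing.

```python
import hashlib
from collections import defaultdict

# Hypothetical secret salt, known only to RLS Backend (placeholder value).
SALT = b"secret-salt-kept-on-rls-backend"

def hash_app_install_id(app_install_id: str, salt: bytes = SALT) -> str:
    """Salt, then hash, so the result cannot be reproduced without the salt."""
    return hashlib.sha256(salt + app_install_id.encode("utf-8")).hexdigest()

def build_sync_event(cross_device_id: str, app_install_id: str, user_agent: str) -> dict:
    """Assemble the three pieces of information listed above into one event."""
    return {
        "crossDeviceID": cross_device_id,
        "appInstallID": hash_app_install_id(app_install_id),
        "userAgent": user_agent,  # assumed field name; forwarded unchanged from the app
    }

def devices_per_user(events: list[dict]) -> dict[str, int]:
    """Count distinct hashed app installs seen for each crossDeviceID."""
    installs = defaultdict(set)
    for event in events:
        installs[event["crossDeviceID"]].add(event["appInstallID"])
    return {user: len(ids) for user, ids in installs.items()}
```

A `crossDeviceID` with a count of 1 would correspond to a user sending data to their account without syncing to another device, while a count of 2 or more would indicate syncing across devices.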