Detecting in-app browser traffic
Closed, ResolvedPublic

Description

In service of the parent task, there's something that would help @nshahquinn-wmf with his analysis. We're not sure of the best way to detect traffic to our site that comes through other apps' in-app browsers (e.g. someone is in their TikTok app, they click a link to Wikipedia, and the article opens in TikTok's in-app browser, as opposed to Safari or Chrome). Though this question is not about the Wikipedia iOS or Android apps, @JMinor suggested that people who work on our apps would have a sense of how this might work in other apps (and suggested that @SNowick_WMF might know the answer).

The question is: what could we be looking for in the referral URL, or in the user-agent, or elsewhere to see whether traffic came from TikTok?

Thank you for any insight you can give (over the next week or so)!

Event Timeline

Chiming in with my quick summary of what I know:

Data that might be useful for this sort of question:

  • Referrals: can identify a pageview where a reader jumped from an external site -> Wikimedia. This works well in mobile/desktop web browsers, which share the domain the click came from (e.g. www.tiktok.com). However, many third-party sites we're interested in are actually apps and not websites. Handling of referrers when readers jump from an app to Wikipedia on mobile phones is very app-dependent, so this data can miss a lot of referrals (we've estimated that we only see ~50% of YouTube referrals in referral data: T289268#7586323)
  • wprov: when we work with platforms to improve referral tracking, we often ask them to add wprov parameters to any clicks through to Wikimedia sites. This helps to solve the lost referral data though has some challenges too (T252227).
  • User-agent: tells us information about the device (hardware; OS; browser) being used to view a Wikipedia article. User-agents are just strings and so really can be anything. To parse them into much more interpretable / consistent features (user_agent_map), we depend on an open-source project called ua-parser. ua-parser does not yet have logic to identify when a user-agent is associated with the TikTok in-app browser. I've explored this a bit and the best way to identify TikTok's in-app browser in the meantime seems to be the presence of the phrase BytedanceWebview in the user-agent. All to say, we should ask TikTok to add their info to ua-parser so it's officially tracked (that's good for everyone), but in the meantime we can still identify which pageviews to Wikipedia came via TikTok's in-app browser pretty well.
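
To make the three signals in the list above concrete, here is a minimal Python sketch that pulls each one out of a single request. The helper names, the sample user-agent string, and the wprov value `abc1` are made up for illustration; only the BytedanceWebview marker and the www.tiktok.com referrer domain come from this thread, and the match is case-sensitive on the assumption that the marker always appears with that exact casing.

```python
from urllib.parse import parse_qs, urlparse

# Substring reported in this thread as identifying TikTok's in-app browser.
TIKTOK_UA_MARKER = "BytedanceWebview"


def referrer_host(referrer):
    """Return the hostname of a referrer URL, or None if there is no referrer."""
    if not referrer:
        return None
    return urlparse(referrer).hostname


def wprov_value(page_url):
    """Return the first wprov query parameter on the requested URL, or None."""
    values = parse_qs(urlparse(page_url).query).get("wprov")
    return values[0] if values else None


def is_tiktok_webview(user_agent):
    """True if the user-agent string contains the BytedanceWebview marker."""
    return TIKTOK_UA_MARKER in (user_agent or "")


# Hypothetical request; the user-agent below is NOT a real TikTok string.
sample_ua = "Mozilla/5.0 (Linux; Android 12) ExampleApp BytedanceWebview/abc"
print(referrer_host("https://www.tiktok.com/"))
print(wprov_value("https://en.wikipedia.org/wiki/Cat?wprov=abc1"))
print(is_tiktok_webview(sample_ua))
```

A plain substring check is used rather than a full parser because ua-parser has no rule for this browser yet; once one exists, the parsed browser family would be the more reliable field to use.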

The question is: what could we be looking for in the referral URL, or in the user-agent, or elsewhere to see whether traffic came from TikTok?

Answering what I think is true for TikTok, which is different from other sites because a) it forces users to stay within TikTok's in-app browser rather than e.g. switching to Chrome mobile, and b) its in-app browser actually seems to do a good job of passing along referrers correctly to us. We can probably know the following things with a good degree of confidence as long as those two things remain true:

  • How many Wikimedia pageviews happen within the TikTok in-app browser (based on user-agent)
  • How many Wikimedia sessions / pageviews start from TikTok (via user-agent / referrals)

Without implementing wprov logic, however, we can't know which parts of TikTok the user came from etc. (just that the traffic is happening via TikTok).
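
As a rough sketch of how those two counts could be derived, the Python below buckets pageview records by user-agent and referrer. The record layout (`user_agent` and `referrer` keys), the bucket names, and the sample rows are assumptions for illustration and do not reflect the actual pageview schema.

```python
from collections import Counter
from urllib.parse import urlparse


def classify(pageview):
    """Bucket one pageview by the TikTok signals discussed in this thread."""
    ua = pageview.get("user_agent") or ""
    host = urlparse(pageview.get("referrer") or "").hostname or ""
    in_app = "BytedanceWebview" in ua
    from_tiktok = host == "tiktok.com" or host.endswith(".tiktok.com")
    if in_app and from_tiktok:
        return "in_app_entry"    # session starts from TikTok, inside its browser
    if in_app:
        return "in_app_other"    # later pageview inside the in-app browser
    if from_tiktok:
        return "referral_only"   # e.g. a click from tiktok.com in a desktop browser
    return "other"


# Made-up records; the user-agent strings are placeholders, not real ones.
pageviews = [
    {"user_agent": "ExampleApp BytedanceWebview/abc", "referrer": "https://www.tiktok.com/"},
    {"user_agent": "ExampleApp BytedanceWebview/abc", "referrer": "https://en.wikipedia.org/wiki/Cat"},
    {"user_agent": "Mozilla/5.0 Firefox/115.0", "referrer": "https://www.tiktok.com/"},
    {"user_agent": "Mozilla/5.0 Firefox/115.0", "referrer": None},
]
counts = Counter(classify(pv) for pv in pageviews)
in_app_total = counts["in_app_entry"] + counts["in_app_other"]
print(counts, in_app_total)
```

Here `in_app_total` corresponds to the first bullet (pageviews within the in-app browser), while `in_app_entry` plus `referral_only` approximates the second (pageviews that start from TikTok).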

Assumptions that I think are true but might skew data:

  • The only people using Bytedance's browser are doing so via the TikTok app. Bytedance might have other apps with a similar user-agent, or it might even be possible to download their browser separately from TikTok.
  • TikTok still forces in-app link clicks through its own in-app browser (if not, we'd have to rely more heavily on referral or wprov data to capture a more complete picture of TikTok-referred traffic)
  • TikTok still passes referral data correctly (if not, and if in-app browser traffic were also incomplete, we might have gaps in our data or have to push for the wprov strategy)
LGoto triaged this task as Medium priority.
LGoto removed a project: iOS-app-Bugs.

Thank you @Isaac for all of this information. I took a look at Android en.wikipedia pageview browser data and don't see any values that are Bytedance-related (Turnilo link), but that's expected because, as you stated, ua-parser isn't looking for that BytedanceWebview value: it's not specified in regexes.yaml. We can see results for Instagram, which also uses an in-app browser, because Instagram is listed in ua-parser's regexes.yaml. We can probably assume that any TikTok views we are getting are being classified as generic Other, along with any other source not in ua-parser.
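
The fallback to Other described above can be illustrated with a toy first-match classifier in Python. This is not ua-parser's code and these are not its real rules; the pattern list, function name, and sample user-agent are invented to show why an unlisted in-app browser ends up in the generic bucket until a rule is added.

```python
import re

# Toy rule list standing in for a regex-based parser; NOT the real ua-parser rules.
PATTERNS = [("Instagram", re.compile(r"Instagram"))]


def browser_family(user_agent, patterns=PATTERNS):
    """Return the family of the first matching pattern, or 'Other' if none match."""
    for family, pattern in patterns:
        if pattern.search(user_agent):
            return family
    return "Other"


# Placeholder user-agent, not a real TikTok string.
ua = "Mozilla/5.0 (Linux; Android 12) ExampleApp BytedanceWebview/abc"

before = browser_family(ua)
# Hypothetical extra rule keyed on the marker mentioned in this thread.
extended = PATTERNS + [("TikTok", re.compile(r"BytedanceWebview"))]
after = browser_family(ua, extended)
print(before, after)
```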

Thank you @Isaac, we think you resolved this question for us. @nshahquinn-wmf is going to review this in the context of the work he's doing, so moving to Needs Review.

I've filed T325611: Add TikTok's in-app browser to ua-parser library and consolidated a bunch of the information we have about referrers (the large majority of it from @Isaac! 😄) onto Research:Referrer on Meta-Wiki.