TikTok was recently added as an external media site for our referer data (T309769) -- i.e. pageviews referred from some form of tiktok.com (code). We recently realized that this heuristic misses the majority of TikTok traffic because that traffic lacks a referer. There is a potential patch, though I'm not strongly advocating for it myself; I still felt it was valuable to file this ticket to document the issue.
Why we know we're missing traffic
A recent report from TikTok let us know that one of their features is generating tens of thousands of pageviews to Wikipedia daily. If you look at our dashboards (private Turnilo example), however, you'll see that we are only seeing <10,000 in total. In the past, referrals from tiktok.com essentially matched pageviews from the TikTok user agent (see T324376 for background), but that no longer appears to be the case. For one day (2023-05-06), I saw ~5,000 referrals from tiktok.com but ~30,000 pageviews with no referer from the TikTok browser:
```sql
select referer, count(1) as num_views
from pageview_actor
where is_pageview
  and year = 2023 and month = 5 and day = 6
  and user_agent like '%BytedanceWebview%'
  and agent_type = 'user'
group by referer
order by num_views desc
limit 100;
```
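To put a number on the gap, the query's output can be reduced to one coverage figure: the share of TikTok-browser pageviews that carry a tiktok.com referer. This is a minimal sketch, not existing tooling: `referer_coverage` and `is_tiktok_referer` are hypothetical helpers, and it assumes that a missing referer shows up as "-", an empty string, or NULL, and that any host equal to or ending in tiktok.com counts as TikTok.

```python
from urllib.parse import urlsplit

# Assumption: a missing referer appears as "-", "" or None (NULL) in the query output.
MISSING_REFERER_VALUES = {"-", "", None}


def is_tiktok_referer(referer):
    """Return True if the referer's host is tiktok.com or one of its subdomains."""
    if referer in MISSING_REFERER_VALUES:
        return False
    host = (urlsplit(referer).hostname or "").lower()
    return host == "tiktok.com" or host.endswith(".tiktok.com")


def referer_coverage(rows):
    """Hypothetical helper: summarize (referer, num_views) rows from the query.

    Returns (tiktok_views, missing_views, coverage), where coverage is the share
    of these two groups that the referer-based heuristic attributes to TikTok.
    Rows with any other referer are ignored.
    """
    tiktok_views = missing_views = 0
    for referer, num_views in rows:
        if is_tiktok_referer(referer):
            tiktok_views += num_views
        elif referer in MISSING_REFERER_VALUES:
            missing_views += num_views
    total = tiktok_views + missing_views
    coverage = tiktok_views / total if total else 0.0
    return tiktok_views, missing_views, coverage


# Illustrative rows built from the approximate figures above, not real query output.
rows = [("https://www.tiktok.com/", 5000), ("-", 30000)]
print(referer_coverage(rows))
```

Assuming both approximate figures for 2023-05-06 come from this query, coverage is 5,000 / 35,000 = 1/7 ≈ 0.14, i.e. the referer heuristic sees roughly one in seven TikTok-browser pageviews.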
For most websites, this is not fixable: if they drop the referer, the traffic is simply invisible to us. We only become aware of the gap when we have a second, independent source of information about referrals: either wprov parameters (details), as is the case with YouTube (T289268#7586323), or, as here, the user agent, because TikTok seems to require users to open links in its in-app web browser.
Approaches:
- Do nothing -- this is the accepted, if unfortunate, state for many third-party sites that have apps.
- Implement a patch to take advantage of the user-agent information. This is very specific to TikTok, though, and is not guaranteed to keep working should TikTok ease its restrictions on opening links in third-party browsers. Related patch to make this user agent easier to track: T325611
- Work long-term on getting TikTok to add a wprov parameter to its links, while adjusting our pipelines to retain wprov data long-term and make it easier to query. See T252227; we could also consider opening a new task about either supplementing referer_data in webrequests via wprov parameters or creating a separate table to track them, similar to referrer_daily.
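Approaches 2 and 3 both amount to adding fallbacks around the current referer check. The following is a minimal sketch of one possible ordering, not the refinery implementation: `classify_tiktok_pageview` and `is_tiktok_host` are hypothetical, `uri_query` is assumed to be the raw URL query string, the wprov value `tiktok1` is a made-up placeholder (TikTok has no wprov value today), and treating "-", empty, or NULL as a missing referer is an assumption. The `BytedanceWebview` token is the one used in the query above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical placeholder: TikTok has no agreed wprov value; this only shows
# where such a value would be checked (approach 3).
HYPOTHETICAL_TIKTOK_WPROV = "tiktok1"

# User-agent token used in the query above to pick out TikTok's in-app browser.
TIKTOK_UA_TOKEN = "BytedanceWebview"

# Assumption: a missing referer appears as "-", "" or None (NULL).
MISSING_REFERER_VALUES = {"-", "", None}


def is_tiktok_host(url):
    """Return True if the URL's host is tiktok.com or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "tiktok.com" or host.endswith(".tiktok.com")


def classify_tiktok_pageview(uri_query, referer, user_agent):
    """Hypothetical classifier: how a pageview could be attributed to TikTok.

    Returns "wprov", "referer", "user_agent", or None, checking the signals
    from most to least explicit.
    """
    # Approach 3: an explicit wprov parameter in the query string (e.g. "?wprov=...").
    params = parse_qs(uri_query.lstrip("?")) if uri_query else {}
    if HYPOTHETICAL_TIKTOK_WPROV in params.get("wprov", []):
        return "wprov"
    # Current heuristic: a tiktok.com referer.
    if referer not in MISSING_REFERER_VALUES:
        # Any other referer keeps its existing classification.
        return "referer" if is_tiktok_host(referer) else None
    # Approach 2: no referer, but the request came from TikTok's in-app browser.
    if user_agent and TIKTOK_UA_TOKEN in user_agent:
        return "user_agent"
    return None


print(classify_tiktok_pageview("?wprov=tiktok1", "-", "Mozilla/5.0"))
print(classify_tiktok_pageview("", "-", "Mozilla/5.0 (Linux) BytedanceWebview"))
```

The user-agent fallback only fires when the referer is missing, so a pageview from TikTok's browser with some other referer keeps whatever classification it already had. As noted under approach 2, this fallback would quietly undercount if TikTok stopped forcing links into its in-app browser.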