
[REQUEST] Impact of TikTok's Jumps feature on Wikipedia traffic
Closed, Resolved (Public)

Description

Name for main point of contact and contact preference: Maryana Pinchuk (contact on Phab via @Maryana , Slack, or email is all fine!)

What teams or departments is this for?: Partnerships and Product

What are the details of your request? Include relevant timelines or deadlines

What impact has a recent TikTok feature release (“Jumps”: https://newsroom.tiktok.com/en-us/tiktok-jump-enriching-the-tiktok-experience-with-new-integrations) that includes linking to Wikipedia had on our traffic numbers since launch? The feature was launched around June 21st.

I know we have trouble disambiguating app traffic (we’ve requested that they add a provenance URL, but still TBD – I don't believe they've added any custom referral URLs yet), so curious if we saw any unusual spikes in:

  • the unknown/other referrals
  • known TT referrals

How will you use this data or analysis?

There may be a case for us investing Product resources into this feature if we’re seeing a lot of new reader traffic coming in from this vector.

Is this request urgent or time sensitive?

Time-sensitivity: ideally, it would be good to get 1) data for the June-August period compared to pre-launch on June 21 (so some time sensitivity given the temporary nature of traffic logs) but 2) (most critically) data for September onward compared to pre-September.

Urgency/priority: depending on the level of impact, this may be very relevant to Product thinking and roadmap for this year and beyond, so medium/high priority.

Event Timeline

mpopov renamed this task from [REQUEST] to [REQUEST] Impact of TikTok's Jumps feature on Wikipedia traffic. Aug 19 2021, 4:55 PM
mpopov updated the task description.
mpopov subscribed.

The team will review and triage this in our next workboard refinement meeting (Mon, 8/23)

ldelench_wmf triaged this task as Medium priority.

@nshahquinn-wmf Any updates on this task? We got some initial impressions data but are waiting on more recent impressions/CTR data from TikTok and are planning to present on this product at the monthly staff meeting this Thursday, so anything you can share on impact to site visits would be really helpful!

I've been working on this since yesterday.

So far I've been having a lot of trouble getting the job to complete successfully, since it requires doing a line-by-line text search through the webrequest dataset. A "yarn-large" Spark session didn't work, even when I tried applying all the suggested settings for extra large jobs. Neither did Hive, even when I quadrupled the heap size.

At this point, I will just try writing Python code to break this down into one query per day and then combine the results. It's worked before, but it's a pretty ugly solution since it means writing a fair amount of custom code to do something that the query engines should be able to do themselves. But sometimes ugly is the only way!
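A minimal sketch of that per-day approach, not the code actually used for this task. Here `run_query` is a hypothetical stand-in for whatever executes one query against Hive or Spark and returns rows, and the table name, columns, and partition layout in the SQL template are assumptions for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# Assumed query template: the table name, column names and partition
# layout (year/month/day) are illustrative, not confirmed by this task.
QUERY_TEMPLATE = """
SELECT access_method, COUNT(*) AS pageviews
FROM wmf.webrequest
WHERE year = {year} AND month = {month} AND day = {day}
  AND is_pageview
  AND referer LIKE '%tiktok.com%'
GROUP BY access_method
"""


def days_between(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def query_for_day(day):
    """Build the query text restricted to a single day's partition."""
    return QUERY_TEMPLATE.format(year=day.year, month=day.month, day=day.day)


def run_by_day(start, end, run_query):
    """Run one small query per day, then combine the results.

    run_query (hypothetical) takes a SQL string and returns an iterable
    of (access_method, pageviews) rows.
    """
    daily = {}
    totals = Counter()
    for day in days_between(start, end):
        rows = list(run_query(query_for_day(day)))
        daily[day] = dict(rows)
        for access_method, pageviews in rows:
            totals[access_method] += pageviews
    return daily, totals
```

Keeping each query to one day's partition bounds the amount of data any single job has to scan, at the cost of the extra glue code to loop and merge.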

As expected, my ugly method worked. Here's a summary of what I found.

I looked through the webrequest data stream, which at the time contained full data for 15 July to 12 Oct, inclusive.

During those 90 days, there were 4.1 M pageviews with a referrer including "tiktok.com". 1,725 of those were on desktop, and the rest were on mobile web. I ignored the handful of desktop pageviews in my analysis. There were no mobile app pageviews with that referrer, probably because the mobile apps either do not receive or report referrers like the web versions do.

There are usually around 40,000 TikTok-referred pageviews each day, although as you can see there have been two recent spikes to around 100,000 per day (one peaking on 20 Aug and the other on 7 Oct).

tiktok traffic.png (490×888 px, 46 KB)

In case some TikTok traffic gets recorded with a blank referrer, I also looked at the number of mobile web pageviews without a referrer. As shown in the graph below, there are about 70 M per day. We haven't seen any significant spikes or dips in the last 90 days.

blank referrer traffic.png (490×862 px, 35 KB)
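A quick arithmetic check on the figures above, as a sketch. The year 2021 is taken from the task timeline, and both 4.1 M and 70 M are rounded values, so the results are approximate.

```python
from datetime import date

# Date range of the available webrequest data (year assumed from the task timeline).
start, end = date(2021, 7, 15), date(2021, 10, 12)
days = (end - start).days + 1  # inclusive of both endpoints -> 90

tiktok_pageviews = 4_100_000       # rounded total with a "tiktok.com" referrer
blank_referrer_per_day = 70_000_000  # rounded daily mobile web pageviews with no referrer

# Mean TikTok-referred pageviews per day over the whole period (about 45,556).
mean_per_day = tiktok_pageviews / days

# TikTok-referred daily mean as a fraction of daily blank-referrer pageviews.
share_of_blank = mean_per_day / blank_referrer_per_day

print(days, round(mean_per_day), f"{share_of_blank:.4%}")
```

The mean of roughly 45,600 per day sits a little above the typical 40,000 because of the two spikes. It also amounts to well under 0.1% of daily blank-referrer mobile web pageviews, so a similar volume of unlabelled TikTok traffic would likely be hard to spot in that series.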

The full code is in my misc-wikimedia-analysis repo on GitHub.

@Maryana please let me know your reaction to this data. Based on the numbers you quoted in chat, maybe there is a mismatch between the TikTok-referred pageviews we see and the amount we expect based on their data.

It shouldn't matter if the pageviews happen using the browser embedded in the TikTok app; the main thing that matters is whether TikTok sets a referrer on their traffic. Some important apps like WhatsApp actually never do. Based on the data above, that isn't the case with TikTok, but it's still possible that TikTok doesn't set a referrer in all circumstances (for example, maybe they do when their internal browser is used but not when some other browser is used).

Thank you, @nshahquinn-wmf! We're going to set up a meeting with TT to talk more and get a bit more clarity on their CTR estimates. Will keep you posted if/when we learn more!

Sounds good! I'll close this now.

If in the future you have clarifying questions about this data, feel free to re-open this ticket. If you would like any new analysis or data, do open a new ticket.

It shouldn't matter if the pageviews happen using the browser embedded in the TikTok app; the main thing that matters is whether TikTok sets a referrer on their traffic. Some important apps like WhatsApp actually never do. Based on the data above, that isn't the case with TikTok, but it's still possible that TikTok doesn't set a referrer in all circumstances (for example, maybe they do when their internal browser is used but not when some other browser is used).

This is closed, but I came across it and wanted to provide some info to back up what @nshahquinn-wmf was saying. I dealt with a similar issue with YouTube, but we have wprov parameters there to act as an independent source of which pageviews come from YouTube. By cross-comparing referral information with wprov parameters, I discovered the following for YouTube (circa May 2020):

Thanks, @Isaac! That's very helpful context. It would be great if you put that information on a wiki page somewhere since in general I think we don't understand that much about our referrer data.

It would be great if you put that information on a wiki page somewhere since in general I think we don't understand that much about our referrer data.

@nshahquinn-wmf good suggestion -- I started by adding it to this informal list I keep of analysis headaches/gotchas: https://meta.wikimedia.org/wiki/User:Isaac_(WMF)/Analysis_gotchas#Referral_Data

As far as I know, however, there is no Wikitech page dedicated to referral information (maybe there should be?). The Provenance page is the closest thing but doesn't suffer from this particular issue. Thoughts on where to put this info?

@Isaac Hmm...I was thinking it belongs somewhere in the Research namespace on Meta, since it might be more interesting to researchers than to infrastructure developers. If you go with Meta, I think you'd need to create a new page (Research:Referrers?).

But I think Wikitech would be a reasonable place too; if you do it there, I would agree that a new page about referrers is the way to go.

Anyway, thanks for working on finding a forever home for this info! 😊