
Investigate recent increase in pageviews in September and October
Open, Medium, Public

Description

In September and October 2019, there was an increase in year-over-year pageviews (Sept 3.1% & Oct 4.9%). We'd like to look into possible reasons for this increase.

Event Timeline

cchen created this task. Dec 4 2019, 1:21 PM
cchen triaged this task as Medium priority. Dec 4 2019, 1:23 PM
cchen edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
cchen added a comment. Edited Dec 4 2019, 6:29 PM

Here is a current summary of findings:

  • Platform: there is a YoY increase in pageviews on mobile web (Sept 12.9% and Oct 12.3%), and the YoY decrease in pageviews on desktop is also shrinking compared to previous months (Sept -9.1% and Oct -4.4%).
  • Project: the increase in pageviews was distributed across multiple Wikipedias, mainly from en.wikipedia and es.wikipedia.
  • Country: the YoY increase is mainly from the US in both months.
  • Referrers: there was a slight YoY increase in external search engine traffic (Sept 6% and Oct 2.6%), and a larger increase in the NONE referrer class (Sept 6.5% and Oct 19.8%). Some further investigation regarding none referrers:
    1. TikTok added an integration with direct links to Wikipedia in late September, which is a direct referral source. But we didn't find any significant increase in pageviews that looks brand-related.
    2. ISP data from web requests shows a lot of direct traffic from Google proxies (likely Google Weblight?).
    3. As Nuria and Isaac mentioned in T195880#4429156, Chrome Mobile version 38, which is a Google Weblight proxy on an older version of Android (Android 4), is also a major culprit in the YoY increase in no-referrer traffic.
    4. Some pages are mostly viewed via direct traffic, e.g. in September, Solar_System with 1,121,740 pageviews and F5_Networks with 816,224 pageviews; in October, IPv4 with 2,317,561 pageviews and Petrodollar_recycling with 1,567,340 pageviews.
  • Comparing refined data against the logs, we did not find repeated counts.
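
The year-over-year figures above are percentage changes relative to the same month of the previous year. A minimal sketch of that arithmetic, assuming monthly pageview totals have already been aggregated per segment; the function name `yoy_change` and the counts below are hypothetical placeholders chosen for illustration, not the actual pageview data:

```python
def yoy_change(current: float, previous: float) -> float:
    """Return the year-over-year change as a percentage of the previous value."""
    if previous == 0:
        raise ValueError("previous-year count must be non-zero")
    return (current - previous) / previous * 100.0


# Hypothetical monthly totals per platform (placeholders, not real numbers).
totals = {
    "mobile web": {"2018-09": 1000, "2019-09": 1129},
    "desktop": {"2018-09": 1000, "2019-09": 909},
}

for segment, counts in totals.items():
    change = yoy_change(counts["2019-09"], counts["2018-09"])
    print(f"{segment}: {change:+.1f}%")
```

The same function applies to any breakdown in the list (project, country, referrer class), as long as both months are aggregated the same way.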
Isaac added a subscriber: Isaac. Dec 13 2019, 5:27 PM

TikTok added an integration with direct links to Wikipedia in late September, which is a direct referral source. But we didn't find any significant increase in pageviews that looks brand-related.

I also looked into this but couldn't find any videos yet that have linked to Wikipedia. I'm really curious to follow this though so if you find any examples, please let me know!

Some pages are mostly viewed via direct traffic, e.g. in September, Solar_System with 1,121,740 pageviews and F5_Networks with 816,224 pageviews; in October, IPv4 with 2,317,561 pageviews and Petrodollar_recycling with 1,567,340 pageviews.

I've been curious about this spike as well and based on this it definitely looks like it's undetected bots. Most of these are consistent with bot traffic -- e.g., more evidence of spikes here that do not look at all human as with the Joker movie: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&start=2019-09-01&end=2019-11-01&pages=IPv4|Simple_Mail_Transfer_Protocol|Solar_System|Petrodollar_recycling|Joker_(2019_film)|F5_Networks

At first the F5 Networks traffic looked like it could be legitimate, as they evidently opened a fancy new tower and their daily traffic follows the usual weekly ebb and flow. Taking a glance at user agents, though, they seem to fit this pattern here and here, and I don't know what to make of that.
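
One rough way to separate the two traffic shapes described above (a steady weekly rhythm versus abrupt spikes) is to compare each day against the median of the series. A minimal sketch, assuming daily pageview counts for a page are available as a list; the function name `spike_days`, the 5x threshold, and the sample counts are illustrative assumptions, and this is not the bot-tagging rule set asked about in this task:

```python
from statistics import median


def spike_days(daily_views, ratio=5.0):
    """Return indexes of days whose views exceed `ratio` times the series median."""
    baseline = median(daily_views)
    if baseline <= 0:
        return []  # no meaningful baseline to compare against
    return [i for i, views in enumerate(daily_views) if views > ratio * baseline]


# Hypothetical series (placeholders): a weekly ebb and flow, and a one-day spike.
weekly_pattern = [100, 110, 105, 98, 95, 60, 55] * 2
one_day_spike = [100, 100, 100, 5000, 100, 100, 100]

print(spike_days(weekly_pattern))
print(spike_days(one_day_spike))
```

A heuristic like this only flags candidate days for a closer look at user agents and referrers; it cannot tell bots from a genuine news-driven surge on its own.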

@Nuria do you have rules for the new bot tagging that Connie could apply to the October data?

@cchen can you provide an estimate for how much of the traffic is likely bot, based on your investigations?

cchen moved this task from Kanban to Backlog on the Product-Analytics board. May 4 2020, 6:12 PM
cchen edited projects, added Product-Analytics; removed Product-Analytics (Kanban).