
Investigate recent increase in pageviews in September and October
Closed, Resolved (Public)

Description

In September and October 2019, there was an increase in year-over-year pageviews (Sept 3.1% & Oct 4.9%). We'd like to look into possible reasons for this increase.

Event Timeline

cchen triaged this task as Medium priority. Dec 4 2019, 1:23 PM
cchen edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

Here is a current summary of findings:

  • Platform: There is a YoY increase in pageviews on mobile web (Sept 12.9% and Oct 12.3%), and the YoY decrease in pageviews on desktop is also diminishing compared to previous months (Sept -9.1% and Oct -4.4%).
  • Project: The increase in pageviews was distributed across multiple Wikipedias, mainly from en.wikipedia and es.wikipedia.
  • Country: The YoY increase is mainly from the US in both months.
    Screen Shot 2019-12-04 at 9.19.54 PM.png (1×1 px, 255 KB)
  • Referrers: There was a slight YoY increase in the external search engine class (Sept 6% and Oct 2.6%), and a larger increase in the NONE referer class (Sept 6.5% and Oct 19.8%). Some further investigation of the none referers:
    1. TikTok added an integration with direct links to Wikipedia in late September, which is a direct referral source. But we didn't find any significant increase in pageviews that looks brand related.
    2. Looking at ISP data from webrequest, a lot of the direct traffic comes from Google proxies (likely Google Weblight?).
    3. As Nuria and Isaac mentioned in T195880#4429156, Chrome Mobile version 38, which is a Google Weblight proxy on an older version of Android 4, is also a major culprit in the YoY increase in no-referrer traffic.
      Screen Shot 2019-12-05 at 1.48.16 AM.png (1×2 px, 256 KB)
    4. Some pages are viewed mostly through direct traffic. For example, in September: Solar_System with 1,121,740 pageviews and F5_Networks with 816,224 pageviews; in October: IPv4 with 2,317,561 pageviews and Petrodollar_recycling with 1,567,340 pageviews.
  • Comparing the refined data against the logs, we did not find repeated counts.
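
The year-over-year figures in the summary above are percentage changes of one month's pageviews against the same month a year earlier. A minimal sketch of that arithmetic, split by a dimension such as platform or referer class, might look like the following. The function names and the counts are illustrative placeholders, not the actual query or Wikimedia numbers.

```python
def yoy_change(current, previous):
    """Percent change of `current` relative to `previous` (same month, prior year)."""
    if previous == 0:
        raise ValueError("previous-year count must be non-zero")
    return (current - previous) * 100.0 / previous


def yoy_by_group(current_counts, previous_counts):
    """YoY percent change per group (e.g. per platform or referer class).

    Groups that are missing or zero in the previous year are skipped,
    because the change is undefined for them.
    """
    return {
        group: yoy_change(count, previous_counts[group])
        for group, count in current_counts.items()
        if previous_counts.get(group)
    }


# Placeholder counts, NOT real pageview data.
sept_2018 = {"mobile web": 1000, "desktop": 2000}
sept_2019 = {"mobile web": 1100, "desktop": 1900}
changes = yoy_by_group(sept_2019, sept_2018)
print(changes)
```

Breaking the change down per group this way is what makes it possible to see an increase in one class (for example, none referers) alongside a decrease in another.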

TikTok added an integration with direct links to Wikipedia in late September, which is a direct referral source. But we didn't find any significant increase in pageviews that looks brand related.

I also looked into this but couldn't find any videos yet that have linked to Wikipedia. I'm really curious to follow this though so if you find any examples, please let me know!

Some pages are viewed mostly through direct traffic. For example, in September: Solar_System with 1,121,740 pageviews and F5_Networks with 816,224 pageviews; in October: IPv4 with 2,317,561 pageviews and Petrodollar_recycling with 1,567,340 pageviews.

I've been curious about this spike as well, and based on this it definitely looks like undetected bots. Most of these are consistent with bot traffic. For example, there is more evidence here of spikes that do not look at all human, in contrast to the Joker movie: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&start=2019-09-01&end=2019-11-01&pages=IPv4|Simple_Mail_Transfer_Protocol|Solar_System|Petrodollar_recycling|Joker_(2019_film)|F5_Networks

At first, the F5 Networks traffic looked like it could be legitimate, as they evidently opened a fancy new tower and their daily traffic follows the usual weekly ebb and flow. Glancing at the user agents, though, they seem to fit this pattern here and here, and I don't know what to make of that.
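
One rough way to shortlist pages with spikes like the ones discussed above is to compare each day's views against the page's own typical level. The sketch below is only an illustrative heuristic: the function name, the threshold, and the series are hypothetical, and it is not the bot-tagging logic used in any Wikimedia pipeline.

```python
from statistics import median


def spike_days(daily_views, factor=5.0):
    """Return indices of days whose views exceed `factor` times the series median.

    `factor` is an arbitrary illustrative threshold, not a rule taken
    from any real bot-detection system.
    """
    baseline = median(daily_views)
    return [i for i, views in enumerate(daily_views) if views > factor * baseline]


# Placeholder series, NOT real pageview data: a steady page with two abrupt jumps.
series = [100, 110, 95, 105, 2000, 100, 98, 3000, 102, 99]
flagged = spike_days(series)
print(flagged)
```

A threshold like this would also flag genuine surges such as a film release, so it only serves as a first filter. User agents and referers would still need to be checked, as in the comments above.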

@Nuria do you have rules for the new bot tagging that Connie could apply to the October data?

@cchen can you provide an estimate for how much of the traffic is likely bot, based on your investigations?

An update on how long the unidentified bot traffic persisted:
The unusual spike in traffic from a Google Weblight proxy, Chrome Mobile version 38, peaked in November 2019 and finally returned to historic levels in June 2020. It's possible that some of the high numbers in March through May were related to the start of the pandemic.

Screen Shot 2021-12-22 at 10.42.07 AM.png (700×2 px, 194 KB)

https://superset.wikimedia.org/r/1031

Noting for future reference - as Connie noted above (https://phabricator.wikimedia.org/T239811#5713328), the impact is largely specific to US traffic:

Screen Shot 2022-03-24 at 5.13.32 PM.png (1×1 px, 202 KB)

https://superset.wikimedia.org/r/1439

from "none" referers:

Screen Shot 2022-03-24 at 5.14.20 PM.png (1×1 px, 207 KB)

https://superset.wikimedia.org/r/1438

and access method is mobile web (the browser identified is a mobile browser)

Screen Shot 2022-03-24 at 5.33.43 PM.png (1×1 px, 182 KB)

https://superset.wikimedia.org/r/1440