
Top Pageview stats for August 27th doesn't look right
Closed, Duplicate | Public

Description

Also, this is just based off of experience, but the traffic for August 27th doesn't look right: http://top.hatnote.com/en/wikipedia/2016/8/27.html

The spammers almost never all rise to the top like that, and the top (non-spam) page only having <150k views is pretty suspect, as well. Could this be related to T141506? The dates line up.

For my reference, here are the user report threads on Twitter:

https://twitter.com/mhashemi/status/770706296072921088
https://twitter.com/mhashemi/status/772198929773449216

Event Timeline

Here are the dates where the pageviews for the top three pages started to rise drastically, remaining on that high plateau since then:
AMGTV: August 14
Okto: July 14
Proyecto 40: around July 14
(On the other hand, xHamster as #4 on the list does not show such an unusual development during the past 3 months, although it may be abnormally high for other reasons.)

The dates do not actually coincide with the main page rises in T141506, but then again we already found there that those occurred on different dates (July 20 vs. July 22). I agree it could very well be related.

(CCing @JMinor, as we were talking just the other day about how this causes issues for the "Explore" feed of the iOS app.)

One idea is to take advantage of the webrequest -> pageview_hourly transformation, which already groups by hour. We could add a column to pageview_hourly called "distinct_user_agent_count" or something like that. This could then be used to filter results for the top endpoint, per-article endpoint, etc. Or it could be used as a good guess about bots that don't identify themselves.
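To make the idea above concrete, here is a rough Python sketch of the aggregation and filtering. It is only an illustration under stated assumptions: the row shape `(hour, page, user_agent)`, the function names `aggregate_hourly` and `top_pages`, the `min_distinct_agents` threshold, and the sample rows are all hypothetical. Only the column name `distinct_user_agent_count` comes from the suggestion itself; the real webrequest -> pageview_hourly job is not shown here.

```python
from collections import defaultdict


def aggregate_hourly(requests):
    """Group (hour, page, user_agent) rows into per-hour, per-page counts.

    Hypothetical stand-in for the webrequest -> pageview_hourly step.
    """
    views = defaultdict(int)
    agents = defaultdict(set)
    for hour, page, user_agent in requests:
        key = (hour, page)
        views[key] += 1
        agents[key].add(user_agent)
    return {
        key: {
            "view_count": views[key],
            "distinct_user_agent_count": len(agents[key]),
        }
        for key in views
    }


def top_pages(hourly, min_distinct_agents=2):
    """Rank rows by views, skipping rows with too few distinct user agents.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    kept = [
        (key, row["view_count"])
        for key, row in hourly.items()
        if row["distinct_user_agent_count"] >= min_distinct_agents
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up sample rows: one page hit five times by a single user agent,
# another hit three times by three different user agents.
sample = [("2016-08-27T00", "Suspect_Page", "agent-x")] * 5 + [
    ("2016-08-27T00", "Ordinary_Page", "agent-a"),
    ("2016-08-27T00", "Ordinary_Page", "agent-b"),
    ("2016-08-27T00", "Ordinary_Page", "agent-c"),
]
hourly = aggregate_hourly(sample)
print(top_pages(hourly))
```

With this sample, the single-agent page is dropped from the ranking even though it has more raw views. A real filter would probably need something softer than a fixed cutoff, such as a ratio of views to distinct user agents, since many legitimate readers share the same user agent string.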

Nuria subscribed.

There is nothing special about this day that I can see; rather, it is another instance of unfiltered bot traffic distorting the top endpoints.