
Big increase in traffic for projects except 'wikipedia' family since Feb 14th
Closed, Resolved (Public)

Description

On Feb 14th and 15th we received traffic anomaly alerts for a group of countries including Uzbekistan, Kazakhstan, Libya and Pakistan.
All of them showed an increase in traffic not recognized as bots. One particularity is that the traffic increase was attributed to en.wikipedia, commons.wikimedia, species.wikimedia or mediawiki.org. The last one was the clearest example: in Uzbekistan, on Feb 14th at 6am UTC, mediawiki.org was the most visited wiki in the country (with other wikis showing normal traffic levels). See chart: https://tinyurl.com/oxwtczba
This chart was then pointed out to us (thanks @MusikAnimal): https://pageviews.toolforge.org/siteviews/?platform=desktop&source=pageviews&agent=user&range=latest-20&sites=en.wikibooks.org|en.wikinews.org|en.wikiquote.org|en.wikisource.org|en.wikiversity.org|en.wikivoyage.org
It shows that the problem is actually broader than what we had already seen.

Event Timeline

I have done some checking:

  • The MaxMind database update was on Feb 9th and archived files got deleted on Feb 11th - This seems unrelated.
  • There clearly seems to be a small number of IPs making most requests for the projects that have seen a change (en.wikipedia, commons.wikimedia for instance).
  • The requests show a high variability of user agents, but the number of requests per agent is extremely regular - this looks like automated traffic trying to disguise itself by changing user agent.
  • The requests show a high variability in the visited pages, so the impact on per-page metrics is relatively small.
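
The two IP and user-agent checks above could be reproduced roughly as follows. This is only a minimal sketch, not the query that was actually run: the record layout (dicts with `ip` and `user_agent` keys) and both function names are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def top_ip_share(requests, n=10):
    """Fraction of all requests made by the n busiest IPs.

    `requests` is a list of dicts with an "ip" key (hypothetical layout).
    """
    per_ip = Counter(r["ip"] for r in requests)
    total = sum(per_ip.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in per_ip.most_common(n)) / total


def per_agent_count_cv(requests):
    """Coefficient of variation of the request count per user agent.

    A value close to 0 means every user agent made almost the same
    number of requests, which is the "extremely regular" pattern.
    """
    counts = list(Counter(r["user_agent"] for r in requests).values())
    if not counts:
        return 0.0
    return pstdev(counts) / mean(counts)
```

A high `top_ip_share` combined with a `per_agent_count_cv` near 0 would match the pattern described in the list: few IPs, many user agents, suspiciously even volume per agent.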

There clearly seems to be a small number of IPs making most requests for the projects that have seen a change (en.wikipedia, commons.wikimedia for instance).

Thanks for looking into this! That makes sense. It's curious how the automated traffic detection didn't catch those, if they share IPs. Maybe we can improve the heuristics for this particular case.

It's curious how the automated traffic detection didn't catch those, if they share IPs. Maybe we can improve the heuristics for this particular case.

The traffic has not been flagged because there is no (ip, user_agent) pair making more than 800 requests per moving 24h window. Some IPs are prevalent, but they belong to telecom companies and I assume they do NATing. Also, there is a wide variability in the pages visited. The only heuristic I can think of that could catch low-volume traffic is query regularity (repetitive querying at regular intervals), but this is a complicated heuristic :)
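
To make the point concrete, here is a minimal sketch of both ideas: a moving-24h count per (ip, user_agent) pair, and a simple regularity measure based on the gaps between requests. Only the 800-request threshold comes from the comment above; the event layout, the function names and the choice of the coefficient of variation as a regularity measure are assumptions, not the production heuristic.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW_SECONDS = 24 * 60 * 60
THRESHOLD = 800  # requests per moving 24h, as mentioned in the comment above


def flag_heavy_actors(events, threshold=THRESHOLD, window=WINDOW_SECONDS):
    """Return the (ip, user_agent) pairs exceeding `threshold` requests
    within any moving window of `window` seconds.

    `events` is an iterable of (timestamp_seconds, ip, user_agent) tuples
    sorted by timestamp (hypothetical layout).
    """
    recent = defaultdict(deque)  # actor -> timestamps still inside the window
    flagged = set()
    for ts, ip, user_agent in events:
        timestamps = recent[(ip, user_agent)]
        timestamps.append(ts)
        # Drop requests that fell out of the moving window.
        while timestamps[0] <= ts - window:
            timestamps.popleft()
        if len(timestamps) > threshold:
            flagged.add((ip, user_agent))
    return flagged


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive requests.

    Values near 0 indicate requests sent at very regular intervals.
    Returns None when there are fewer than two gaps to compare.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return 0.0
    return pstdev(gaps) / average
```

With this sketch, one IP sending 1000 requests spread over 10 rotating user agents is never flagged, since each pair only reaches 100 requests. A regularity score such as `interarrival_cv` would have to be computed per IP (or per some other grouping) to catch it, which is where the complexity comes from.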

Thanks for opening this task, Marcel.

Joal, thanks for investigating this: it is helpful context for some past, and possibly future, alerts that we may have (had) trouble understanding.

We could add a tag to pageviews generated by actors with high-traffic IPs.
It would not change the way we process, count or classify traffic today,
but we could use it to filter out this type of traffic when doing analyses like traffic anomaly detection.
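
As a rough illustration of this proposal, the tag could be an extra boolean column that leaves the existing fields untouched. The sketch below is hypothetical: the field name `high_traffic_ip`, the record layout and the per-IP threshold are assumptions, and the real pipeline would likely compute this per time window rather than over a whole list.

```python
from collections import Counter


def tag_high_traffic_ips(pageviews, ip_threshold=800):
    """Return copies of the pageview records with a `high_traffic_ip` flag.

    `pageviews` is a list of dicts with an "ip" key (hypothetical layout).
    Existing fields are kept unchanged, so current processing, counting
    and classification are not affected by the new tag.
    """
    per_ip = Counter(pv["ip"] for pv in pageviews)
    return [
        {**pv, "high_traffic_ip": per_ip[pv["ip"]] > ip_threshold}
        for pv in pageviews
    ]


def without_high_traffic_ips(tagged_pageviews):
    """Filter used only for analyses such as anomaly detection."""
    return [pv for pv in tagged_pageviews if not pv["high_traffic_ip"]]
```

Keeping the tag separate from the agent classification means the anomaly analysis can opt in to the filter while published pageview counts stay as they are.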

There seems to be a broader issue with related countries: https://pageviews.toolforge.org/siteviews/?platform=desktop&source=pageviews&agent=user&range=latest-20&sites=en.wikibooks.org|en.wikinews.org|en.wikiquote.org|en.wikisource.org|en.wikiversity.org|en.wikivoyage.org

I quickly checked countries for some projects, and for all of the ones I checked the rise in traffic always came from the same countries: India, Russia, Uzbekistan, Kazakhstan, Ukraine. Some of these countries (Russia, Kazakhstan, Uzbekistan) were in the list of countries raised by entropy alarms in the past days.

@kzimmerman : Could your team provide help on this?

JAllemandou renamed this task from The most visited wiki in Uzbekistan on Feb 14th at 6am UTC is mediawiki.org to Big increase in traffic for projects except 'wikipedia' family since Feb 14th. Feb 17 2021, 1:58 PM
JAllemandou added a project: Product-Analytics.
JAllemandou updated the task description.
JAllemandou added a subscriber: MusikAnimal.

@JAllemandou it looks like you checked the main dimensions to investigate; the other thing is that the jump only happens on desktop (mobile web looks normal). Connie's going to raise this in our team sharing meeting tomorrow; I'll add you as optional, though I think it's too late in your time zone.

Hi all, I also found that most local-language Wikipedias in Indonesia show the same big increase in traffic, except bug.wiki. Please check this for more info: bug.wikipedia.org|gor.wikipedia.org|tet.wikipedia.org|su.wikipedia.org|min.wikipedia.org|ace.wikipedia.org|jv.wikipedia.org|bjn.wikipedia.org|map-bms.wikipedia.org. I would love to know what actually happened and how to handle this in the future.

LGoto triaged this task as Medium priority. Mar 2 2021, 6:13 PM
LGoto edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
LGoto moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

@cchen can you summarize the findings from you & @JAllemandou here, for future reference? My understanding is that you didn't find solid trends that could identify the traffic as bots, but we still suspect bot traffic and will have to speak to this in the Key Product Metrics.