
Skewed pageviews for Azerbaijani and Bulgarian Wikipedias, September, October and November 2016
Closed, Duplicate · Public


I see odd data for Azerbaijani and Bulgarian Wikipedias in the pageviews tool.

The most striking examples are from Azerbaijani:

You can see a jump starting toward the end of October.

The most popular page, by far, is called "xss":

Something similar is visible in Bulgarian, although on a much more modest scale:

And there, too, "xss" is the most popular page by far, though not as extremely as in Azerbaijani:

I don't see anything similar in other languages.

In both languages, "xss" is a page that doesn't exist.
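The pattern described above (one title dominating a wiki's daily views, and that title not existing) could be checked with a quick heuristic like the one below. This is a hypothetical sketch, not a tool used in this task: `top_page_share`, `flag_suspicious`, the `threshold` value and the toy numbers in the usage line are all illustrative assumptions, not real azwiki or bgwiki data.

```python
from collections import Counter


def top_page_share(views_by_title: dict[str, int]) -> tuple[str, float]:
    """Return the most-viewed title and its share of the wiki's total views."""
    total = sum(views_by_title.values())
    if total == 0:
        raise ValueError("no pageviews to analyse")
    title, count = Counter(views_by_title).most_common(1)[0]
    return title, count / total


def flag_suspicious(
    views_by_title: dict[str, int],
    existing_titles: set[str],
    threshold: float = 0.2,  # hypothetical cut-off, chosen only for illustration
) -> tuple[bool, str, float]:
    """Flag the top title if it dominates the day AND is not an existing page."""
    title, share = top_page_share(views_by_title)
    return share >= threshold and title not in existing_titles, title, share


# Toy numbers, made up purely to show the call shape.
daily = {"Bakı": 1200, "Azərbaycan": 900, "xss": 50000}
flagged, title, share = flag_suspicious(daily, {"Bakı", "Azərbaycan"})
print(flagged, title, round(share, 2))
```

Requiring both conditions keeps genuinely popular existing articles (for example, breaking news) from being flagged just because they dominate a day.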

I noticed these two languages because they appeared to have strangely skewed data about interlanguage link clicks, which I track regularly.

It would be nice to understand why this happens.

This reminds me slightly of T141506, although the cause may be different.

Event Timeline

Restricted Application added subscribers: kerberizer, Aklapper.

@Amire80: from the report, this is likely a bot probing for an exploit (XSS is short for cross-site scripting).

MediaWiki returns HTTP 200 for requests that should be 404s (pages that do not exist), so from our end it is not possible to distinguish those requests from pageviews.
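The point above can be illustrated with a deliberately simplified, hypothetical pageview filter. It is not the actual Wikimedia pageview definition; `Request` and `is_pageview` are illustrative names. If the only signals are the URL path, the status code and the content type, a probe for a nonexistent title that receives a 200 looks exactly like a real article view. Only a 404 would let such a filter drop it.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical, minimal view of a webrequest log row."""
    uri_path: str
    http_status: int
    content_type: str


def is_pageview(req: Request) -> bool:
    # Heavily simplified rule (an assumption for illustration): an HTML
    # response with a success status on an article URL counts as a pageview.
    return (
        req.uri_path.startswith("/wiki/")
        and req.http_status in (200, 304)
        and req.content_type.startswith("text/html")
    )


real = Request("/wiki/Bakı", 200, "text/html; charset=utf-8")
probe = Request("/wiki/xss", 200, "text/html; charset=utf-8")        # what happens today
probe_404 = Request("/wiki/xss", 404, "text/html; charset=utf-8")    # if a 404 were sent
print(is_pageview(real), is_pageview(probe), is_pageview(probe_404))  # True True False
```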


More investigation, looking at 22 December 2016:

These indeed fall into the big bucket of bots we don't catch :(

Untagging kanban. Adding as a subtask of the bot task.

Tbayer added a subscriber: Tbayer.

Checking back on azwiki (see the screenshot below), the spike appears to have ended there, exactly on Christmas Day, BTW. But the culprit that @JAllemandou identified above is still generating an enormous number of pageviews that are mistakenly classified as agent_type = 'user'. I have opened a new task about it, with more detail: T157528

[Screenshot: siteviews-20160701-20170206.png]

@JAllemandou: sorry, I should have looked into this more closely. If this is a self-identified bot, it should not be a subtask of this ticket; the cases we are aggregating here are the ones involving bots that do not identify themselves as such.