
Skewed pageviews for Azerbaijani and Bulgarian Wikipedias, September, October and November 2016
Closed, Duplicate · Public

Description

I see odd data for Azerbaijani and Bulgarian Wikipedias in the pageviews tool.

The most striking examples are from Azerbaijani:
https://tools.wmflabs.org/siteviews/?platform=all-access&source=pageviews&agent=user&start=2016-08-01&end=2016-12-18&sites=az.wikipedia.org

You can see a jump starting towards the end of October.

The most popular page, by far, is called "xss":

https://tools.wmflabs.org/topviews/?project=az.wikipedia.org&platform=all-access&date=2016-11&excludes=

You can see something similar also in Bulgarian, although on a much more modest scale:

https://tools.wmflabs.org/siteviews/?platform=all-access&source=pageviews&agent=user&start=2016-08-01&end=2016-12-18&sites=bg.wikipedia.org

And there, too, "xss" is the most popular page by far, though not as extremely as in Azerbaijani:

https://tools.wmflabs.org/topviews/?project=bg.wikipedia.org&platform=all-access&date=2016-09&excludes=
https://tools.wmflabs.org/topviews/?project=bg.wikipedia.org&platform=all-access&date=2016-10&excludes=

I don't see anything similar in other languages.

In both languages, "xss" is a page that doesn't exist.
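One way to spot this pattern in monthly top-page data is to compute what share of the listed views the single top title accounts for. The sketch below is hypothetical. It assumes you already have per-title view counts for one project and one month (for example, copied from the topviews links above) in a dict. The function name `dominant_title` and the 0.5 threshold are illustrative choices and are not part of any Wikimedia tool.

```python
from typing import Mapping, Optional, Tuple


def dominant_title(
    views: Mapping[str, int], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (title, share) if one title accounts for at least `threshold`
    of all views in `views`, else None.

    `views` maps page titles to view counts for one project and one period.
    The default threshold is a hypothetical cut-off chosen for illustration.
    """
    total = sum(views.values())
    if total == 0:
        return None
    # Pick the title with the highest count (first one wins on ties).
    title, count = max(views.items(), key=lambda item: item[1])
    share = count / total
    return (title, share) if share >= threshold else None
```

A non-None result naming a title that does not exist on the wiki, such as "xss", would flag the kind of skew described above. None means that no single title dominates at the chosen threshold.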

I noticed these two languages because they appeared to have strangely skewed data about interlanguage link clicks, which I track regularly.

It would be nice to understand why this happens.

This reminds me slightly of T141506, although the reason is possibly different.

Event Timeline

Amire80 created this task. Dec 19 2016, 5:59 PM
Restricted Application added a project: Analytics. Dec 19 2016, 5:59 PM
Restricted Application added subscribers: kerberizer, Aklapper.
Amire80 updated the task description. Dec 19 2016, 6:07 PM
Nuria added a subscriber: Nuria. Edited Dec 20 2016, 8:46 PM

@Amire80: from the report, this is likely a bot looking for an exploit.

MediaWiki returns HTTP 200 for requests that should be 404s (pages that do not exist), so from our end it is not possible to distinguish those requests from real pageviews.

See: https://phabricator.wikimedia.org/T144100
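To illustrate why these requests end up counted: if a pageview filter keys on the HTTP status and content type of the response, a nonexistent title served with 200 looks exactly like a real article. The predicate below is a deliberately simplified, hypothetical stand-in. The real pageview definition checks many more fields. The `Request` type, its fields, and the `looks_like_pageview` name are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical, heavily simplified view of one web request log row.
    title: str
    http_status: int
    content_type: str


def looks_like_pageview(req: Request) -> bool:
    """Simplified, hypothetical pageview check: a successful HTML response counts.

    Nothing here can tell whether the title exists, because the log only
    records what MediaWiki returned (here: 200 even for missing pages).
    """
    return req.http_status == 200 and req.content_type.startswith("text/html")


# A real article and a nonexistent title both pass the same check.
real = Request("Bakı", 200, "text/html; charset=utf-8")
bogus = Request("xss", 200, "text/html; charset=utf-8")
```

With a correct 404 for the missing title, the same predicate would reject the bogus request. This is why the fix tracked in T144100 matters for these counts.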

Nuria edited projects, added Analytics-Kanban; removed Analytics. Jan 23 2017, 4:47 PM
JAllemandou moved this task from Next Up to In Progress on the Analytics-Kanban board.

More investigation:
Looking at December 22nd, 2016:

These indeed fall into the big bucket of bots we don't catch :(

Nuria edited projects, added Analytics; removed Analytics-Kanban. Jan 31 2017, 9:11 PM

Untagging Kanban. Adding as a subtask of the bot task.

Tbayer added a subscriber: Tbayer.

Checking back for azwiki (see screenshot below), it appears to have ended there (exactly on Christmas Day, BTW). But the culprit that @JAllemandou identified above is still generating an enormous number of pageviews that are mistakenly classified as agent_type = 'user'. I have opened a new task about it, with more detail: T157528

Nuria added a comment.Feb 8 2017, 5:45 PM

@JAllemandou: sorry, I should have looked into this more closely. If this is a self-identified bot, it should not be a subtask of this ticket; the cases we are aggregating here are the ones pertaining to bots that do not self-identify as such.

Nuria edited projects, added Analytics-Kanban; removed Analytics. Feb 8 2017, 5:47 PM