
Investigate June Unique devices increase of 170% for wikidata
Closed, ResolvedPublic

Event Timeline

Addshore created this task.Jul 13 2018, 9:15 AM
Restricted Application added a subscriber: Aklapper.Jul 13 2018, 9:15 AM
Addshore claimed this task.Jul 13 2018, 9:46 AM

So, thanks to @JAllemandou for reminding me that turnilo should be the thing I use to investigate this.

It looks like the main spike was between May 30th and June 3rd (inclusive), with further requests belonging to the spike tailing off until June 20th.
This can be seen in the graph below, which also shows that the spike came from a single country.


turnilo link

Looking at this further, it appears that all requests in the spike came from a single ISP and from only about 5 different IPs.
The UAs for the requests seem to be all, or mostly, unique, with various version numbers within the UAs differing.
This results in the requests being detected as different devices, hence the spike.
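
As a rough illustration of why rotating UAs inflate the count, here is a minimal Python sketch. It assumes, purely for illustration, that a device is distinguished by its (IP, user agent) pair; the real unique devices computation is more involved and is not reproduced here. The function name, IP, and UA strings are all made up.

```python
def count_apparent_devices(requests):
    """Count distinct (ip, user_agent) pairs in an iterable of request dicts.

    Assumption: a "device" is approximated by the (ip, user_agent) pair.
    """
    return len({(r["ip"], r["user_agent"]) for r in requests})


# Hypothetical data: one IP sending 6 requests, each with a different UA version.
rotating = [
    {"ip": "192.0.2.1", "user_agent": f"Mozilla/5.0 ExampleBrowser/{v}.0"}
    for v in range(60, 66)
]

# Same IP sending 6 requests, all with the same UA.
stable = [
    {"ip": "192.0.2.1", "user_agent": "Mozilla/5.0 ExampleBrowser/60.0"}
    for _ in range(6)
]

rotating_count = count_apparent_devices(rotating)  # 6 apparent devices
stable_count = count_apparent_devices(stable)      # 1 apparent device
```

Under that assumption, the same handful of IPs can look like many devices simply because the UA string changes between requests.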

Perhaps some further investigation is needed.

Addshore removed Addshore as the assignee of this task.Jul 13 2018, 10:06 AM
Addshore closed this task as Resolved.Jul 13 2018, 10:42 AM
Addshore claimed this task.

It looks like this might be some bot or script scraping stuff that isn't identified as a script in any way, and that is rotating UAs...
The requests just seem to be to random entities, some existing, some not existing, but nothing more fancy than that.

Nuria added a subscriber: Nuria.EditedJul 13 2018, 7:45 PM

It coincides with a spike of pageviews from Thailand that looks like a bot accessing the desktop site. I will investigate a bit as to whether this bot was accepting cookies.

Nuria added a comment.EditedJul 14 2018, 5:57 AM

The bot did not accept cookies, and its user agent was changing slightly: in 1000 records from when this event was happening, 995 are part of the event, and among those there are about 200 unique user agents. Still, the IP is the same and the volume of requests is so high that I am wondering how these requests did not get throttled. Will look at throttling limits.
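
A pattern like this (one IP, very high volume, many distinct UAs) could be flagged with a check along the following lines. This is only a sketch: the function name and thresholds are hypothetical, and the sample data is synthetic, built to mirror the proportions above rather than taken from the real logs.

```python
from collections import defaultdict


def suspicious_ips(records, min_requests=100, min_distinct_uas=20):
    """Return {ip: (request_count, distinct_ua_count)} for IPs over both thresholds.

    `records` is an iterable of (ip, user_agent) tuples.
    The thresholds are arbitrary illustrative values, not production limits.
    """
    request_counts = defaultdict(int)
    uas_by_ip = defaultdict(set)
    for ip, ua in records:
        request_counts[ip] += 1
        uas_by_ip[ip].add(ua)
    return {
        ip: (request_counts[ip], len(uas_by_ip[ip]))
        for ip in request_counts
        if request_counts[ip] >= min_requests
        and len(uas_by_ip[ip]) >= min_distinct_uas
    }


# Synthetic sample: 995 of 1000 records from one IP cycling through 200 UAs,
# plus 5 ordinary requests from 5 other IPs.
sample = [("192.0.2.1", f"Mozilla/5.0 ExampleBrowser/{i % 200}.0") for i in range(995)]
sample += [(f"198.51.100.{i}", "Mozilla/5.0 OrdinaryBrowser/1.0") for i in range(5)]

flagged = suspicious_ips(sample)
```

With this synthetic sample, only the single high-volume IP crosses both thresholds.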

@Nuria should I file a follow up ticket about adding an annotation to the graph explaining this spike?

Nuria added a comment.Jul 16 2018, 4:01 PM

Yes, please. I listed the issue on the dataset page: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices#Changes_and_Known_Problems_with_Dataset
We do not yet have annotations in wikistats (we will at the end of the quarter), but when we do, this is a good one to list. Moving ticket to bot work.

Addshore removed Addshore as the assignee of this task.Jul 16 2018, 4:23 PM
Milimetric moved this task from Incoming to Radar on the Analytics board.Jul 19 2018, 3:32 PM
Tbayer added a subscriber: Tbayer.
Addshore moved this task from incoming to monitoring on the Wikidata board.Sep 19 2018, 7:29 AM
Nuria added a comment.Oct 1 2018, 10:59 PM

Added annotation for this event to wikidata unique devices data on wikistats: http://localhost:5000/dist/#/wikidata.org/reading/unique-devices/normal|line|All|~total

Addshore closed this task as Resolved.Oct 2 2018, 6:25 AM
Addshore claimed this task.

Looks great! :)