|Open||None||T138207 [Open question] Improve bot identification at scale|
|Resolved||Addshore||T199517 Investigate June Unique devices increase of 170% for wikidata|
|Open||None||T200020 Annotations in wikistats that are only visible on "all" time range get bundled up (probably an issue we cannot resolve until we have a more granular time range)|
So, thanks to @JAllemandou for reminding me that turnilo should be the thing I use to investigate this.
It looks like the main spike was between May 30th and June 3rd (inclusive), with more requests that are part of the spike tailing off until June 20th.
This can be seen in the graph below, which also shows that the spike came from a single country.
Looking at this further, it appears that all requests in the spike came from a single ISP and from about 5 different IPs.
The UAs for the requests seem to be all, or mostly, unique, with various version numbers within the UAs differing.
This results in the requests being detected as different devices, hence the spike.
Perhaps some further investigation is needed.
It looks like this might be some bot or script scraping stuff that isn't identified as a script in any way, and that is rotating UAs...
The requests just seem to be to random entities, some existing, some not existing, but nothing more fancy than that.
The bot did not accept cookies and its user agent was changing slightly. In a sample of 1000 records from the period when this event is happening, 995 are part of the event, and of those about 200 have unique user agents. Still, the IP is the same and the volume of requests is so high that I am wondering how these requests did not get throttled. Will look at throttling limits.
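For anyone wanting to check a request sample for this pattern, here is a minimal sketch of the idea: group requests by IP and flag any IP that combines a high request count with many distinct user agents. The function name, the `(ip, user_agent)` input shape, the thresholds, and the sample data are all assumptions made for illustration; this is not how the unique devices pipeline or any existing bot detection works.

```python
from collections import defaultdict


def flag_rotating_ua_ips(requests, min_requests=100, min_unique_uas=50):
    """Return {ip: (request_count, unique_ua_count)} for suspicious IPs.

    `requests` is an iterable of (ip, user_agent) pairs (hypothetical shape).
    The thresholds are arbitrary illustrative values, not tuned limits.
    """
    request_counts = defaultdict(int)
    user_agents = defaultdict(set)
    for ip, ua in requests:
        request_counts[ip] += 1
        user_agents[ip].add(ua)
    return {
        ip: (request_counts[ip], len(user_agents[ip]))
        for ip in request_counts
        if request_counts[ip] >= min_requests
        and len(user_agents[ip]) >= min_unique_uas
    }


# Synthetic sample shaped like the numbers above: 1000 records, 995 from
# one IP cycling through 200 slightly different user agents.
sample = [("203.0.113.7", f"Mozilla/5.0 Fake/{i % 200}.0") for i in range(995)]
sample += [(f"198.51.100.{i}", "Mozilla/5.0 Regular/1.0") for i in range(5)]

flagged = flag_rotating_ua_ips(sample)
print(flagged)
```

On the synthetic sample only the single high-volume IP is flagged; the five ordinary clients fall below both thresholds.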
Yes, please. I listed the issue on the dataset page: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices#Changes_and_Known_Problems_with_Dataset
We do not yet have annotations in wikistats (we will at the end of the quarter), but when we do, this is a good one to list. Moving ticket to bot work.