Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T138207 [Open question] Improve bot identification at scale
Resolved | | Addshore | T199517 Investigate June Unique devices increase of 170% for wikidata
Open | | None | T200020 Annotations in wikistats that are only visible on "all" time range get bundled up (probably an issue we cannot resolve until we have a more granular time range)
Event Timeline
So, thanks to @JAllemandou for reminding me that turnilo should be the thing I use to investigate this.
It looks like the main spike was between May 30th and June 3rd (inclusive), with more requests that are part of the spike tailing off until June 20th.
This can be seen in the graph below, which also shows that the spike came from a single country.
turnilo link
Looking at this further, it would appear that all requests in the spike came from a single ISP and from a small number (~5) of different IPs.
The UAs for the requests seem to be all, or mostly, unique, with various version numbers within the UAs being different.
This results in the requests being detected as different devices hence the spike.
Perhaps some further investigation is needed.
It looks like this might be some bot or script scraping stuff that isn't identified as a script in any way, and that is rotating UAs...
The requests just seem to be to random entities, some existing, some not existing, but nothing more fancy than that.
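The pattern described above (a handful of IPs, each sending many distinct UAs) could be checked with something like the following. This is only a minimal sketch: the `(ip, user_agent)` tuple layout, the function names, the threshold of 50, and the sample addresses are all assumptions made for illustration, not the actual webrequest schema or any existing bot-detection logic.

```python
from collections import defaultdict


def distinct_uas_per_ip(requests):
    """Map each IP to the set of user agents seen from it.

    `requests` is an iterable of (ip, user_agent) tuples (hypothetical layout).
    """
    uas = defaultdict(set)
    for ip, ua in requests:
        uas[ip].add(ua)
    return uas


def suspicious_ips(requests, min_distinct_uas=50):
    """Return, sorted, the IPs that sent at least `min_distinct_uas` different UAs.

    The default threshold is an arbitrary illustrative value.
    """
    uas = distinct_uas_per_ip(requests)
    return sorted(ip for ip, seen in uas.items() if len(seen) >= min_distinct_uas)


# Made-up sample: one IP rotating the version number in its UA,
# and one IP sending many requests with a single stable UA.
sample = [("192.0.2.1", f"Mozilla/5.0 Chrome/{v}.0") for v in range(60)]
sample += [("198.51.100.7", "Mozilla/5.0 Firefox/60.0")] * 100

print(suspicious_ips(sample))
```

A high request volume alone does not trip this check; only the number of distinct UAs per IP does, which is the signal that made these requests look like separate devices.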
It coincides with a spike of pageviews from Thailand that looks like a bot accessing the desktop site. I will investigate a bit as to whether this bot was accepting cookies.
The bot did not accept cookies, and its user agent was changing slightly: in 1000 records from when this event was happening, 995 are part of the event, and of those about 200 have unique user agents. Still, the IP is the same and the volume of requests is so high that I am wondering how these requests did not get throttled. Will look at throttling limits.
@Nuria should I file a follow up ticket about adding an annotation to the graph explaining this spike?
Yes, please. I listed the issue on the dataset page: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices#Changes_and_Known_Problems_with_Dataset
We do not yet have annotations in wikistats (we will at the end of the quarter), but when we do, this is a good one to list. Moving the ticket to bot work.
Added annotation for this event to wikidata unique devices data on wikistats: http://localhost:5000/dist/#/wikidata.org/reading/unique-devices/normal|line|All|~total