Bot Identification: Inconsistent data in #all-sites-by-os-and-browser for IE7
Closed, Resolved · Public · 3 Story Points

Description

On https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os-and-browser the share of IE7

  • is increasing regularly which is surprising,
  • includes a majority on OS Windows 7 and Windows 8, both OS incompatible with IE7.

These user agents should be either considered as other or invalid.

By the way, is it a percentage of page views, daily unique devices, monthly unique devices, HTTP requests...?

Event Timeline

Restricted Application added a subscriber: Aklapper.Oct 17 2016, 7:20 PM
Ltrlg added a subscriber: Ltrlg.Oct 17 2016, 7:31 PM
Nuria added a subscriber: Nuria.Oct 17 2016, 7:37 PM

@Zebulon84: Percentage of pageviews.

mforns added a subscriber: mforns.Oct 17 2016, 7:40 PM

This may happen because newer versions of IE can still run in compatibility mode when they detect old* HTML syntax. In this mode, the user agent sent in the headers may be that of IE7.

https://msdn.microsoft.com/en-us/library/ms537503(v=vs.85).aspx

For example, if you're using Internet Explorer 9 to view a webpage in Compatibility View, the version token is, by default, MSIE 7.0.

(*) It seems that even with modern html code, IE can enter compatibility mode:
http://stackoverflow.com/questions/13284083/ie10-renders-in-ie7-mode-how-to-force-standards-mode
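One way to test the compatibility-mode theory on individual user agents: per the MSDN page linked above, a newer IE in Compatibility View reports `MSIE 7.0` but still carries a `Trident/x.0` token for its real engine. The sketch below compares the two tokens. The function names and the sample user agent strings are illustrative assumptions, not taken from the Analytics pipeline.

```python
import re
from typing import Optional

# Trident engine token -> IE release that ships that engine.
TRIDENT_TO_IE = {"4.0": 8, "5.0": 9, "6.0": 10, "7.0": 11}

MSIE_RE = re.compile(r"MSIE (\d+)\.\d+")
TRIDENT_RE = re.compile(r"Trident/(\d+\.\d+)")


def claimed_ie_version(ua: str) -> Optional[int]:
    """Major version from the MSIE token, or None if the token is absent."""
    m = MSIE_RE.search(ua)
    return int(m.group(1)) if m else None


def underlying_ie_version(ua: str) -> Optional[int]:
    """Version implied by the Trident token; falls back to the MSIE token."""
    m = TRIDENT_RE.search(ua)
    if m:
        return TRIDENT_TO_IE.get(m.group(1))
    return claimed_ie_version(ua)


def is_compat_view(ua: str) -> bool:
    """True when the MSIE token claims an older release than the engine."""
    claimed = claimed_ie_version(ua)
    real = underlying_ie_version(ua)
    return claimed is not None and real is not None and claimed < real


# Illustrative strings: IE9 in Compatibility View on Windows 7, and a plain IE7.
UA_IE9_COMPAT = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/5.0)"
UA_IE7_NATIVE = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
```

If most of the "IE7 on Windows 7/8" requests carried a Trident token, the compatibility-mode explanation would hold; if they did not, the user agents are more likely spoofed or automated.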

Then is it possible for Wikimedia to turn off this compatibility mode with the « header('X-UA-Compatible: IE=edge'); » solution suggested in the Stack Overflow link?

Nuria added a comment.Oct 18 2016, 4:44 PM

@Zebulon84: a couple of things come to mind.

  1. We have to prove the theory that compatibility mode is what drives the number of IE7 requests.
  2. Adding a header like that one might also cause JS execution issues; I would open a ticket for that and follow up on specifics with MediaWiki developers.
Nuria moved this task from Incoming to Backlog (Later) on the Analytics board.Oct 24 2016, 3:41 PM
Nuria edited projects, added Analytics-Kanban; removed Analytics.Oct 24 2016, 7:41 PM
Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board.
Nuria added a comment.Oct 24 2016, 7:48 PM

See IE usage; indeed, IE7 seems to be increasing when computing daily measures.

Nuria added a comment.EditedOct 24 2016, 8:10 PM

If compatibility mode is triggered, we would expect Wikipedia to be in the IE compatibility list: https://msdn.microsoft.com/en-us/library/gg622935(v=vs.85).aspx

and it is there for IE10, but it says "EmulateIE10": http://cvlist.ie.microsoft.com/ie10/iecompatviewlist.xml
and same for IE11: https://iecvlist.microsoft.com/wpie11/1403264460/iecompatviewlist.xml

Entry is like: "<domain docMode="EmulateIE10" uaString="10">wikipedia.org</domain>"

So even in compatibility mode the UA does not look like IE7; there might be other reasons why compatibility mode is triggered, though.

Looks like Edge does not have a compatibility list.
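The lookup described in this comment can be sketched with the standard library. Only the `<domain>` entry is quoted from the list; the `<list>` wrapper element and the function name `compat_entry` are assumptions for illustration, and the real file may use a different root element or XML namespaces.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical wrapper; only the <domain> line comes from the quoted entry.
SAMPLE = """<list>
  <domain docMode="EmulateIE10" uaString="10">wikipedia.org</domain>
</list>"""


def compat_entry(xml_text: str, domain: str) -> Optional[Dict[str, str]]:
    """Return the attributes of the <domain> entry for `domain`, if present."""
    root = ET.fromstring(xml_text)
    for node in root.iter("domain"):
        if (node.text or "").strip() == domain:
            return dict(node.attrib)
    return None


entry = compat_entry(SAMPLE, "wikipedia.org")
# entry["uaString"] is "10" here, so this entry alone would not produce an IE7 UA.
```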

Nuria added a comment.Oct 24 2016, 8:21 PM

Our stats indicate that IE7 usage is increasing on Windows 7, while IE8 and IE9 usage is decreasing on all platforms. Given that our overall Windows 7 usage is stable, it seems that traffic from IE8 and IE9 on Windows 7 is shifting towards being labeled as IE7.

Nuria added a comment.Oct 25 2016, 8:12 PM

These requests are real and are happening (mostly) for Main_Page. It could be a bot, or it could be an issue negotiating SSL again.

Milimetric assigned this task to Nuria.Oct 27 2016, 3:46 PM
Milimetric set the point value for this task to 3.
Nuria added a comment.Oct 27 2016, 8:44 PM

After analyzing one hour of traffic: the requests come mostly from India/Iran/Pakistan/Afghanistan and they are all requests for Main_Page. This again looks like some kind of keep-alive ping, but why the IE7 UA?

Adding to our task about identifying bot traffic
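A breakdown like the one described above can be sketched as a simple count of (country, page) pairs for requests labeled IE7. The field names (`country`, `page_title`, `browser_family`, `browser_major`) and the placeholder rows are hypothetical; they are not the actual schema or data used for the one-hour analysis.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_sources(
    requests: Iterable[Dict[str, str]],
    browser: str = "IE",
    major: str = "7",
    n: int = 5,
) -> List[Tuple[Tuple[str, str], int]]:
    """Count (country, page_title) pairs for one browser family and version."""
    counts = Counter(
        (r["country"], r["page_title"])
        for r in requests
        if r["browser_family"] == browser and r["browser_major"] == major
    )
    return counts.most_common(n)


# Placeholder rows with made-up country codes, for illustration only.
ROWS = [
    {"country": "AA", "page_title": "Main_Page", "browser_family": "IE", "browser_major": "7"},
    {"country": "AA", "page_title": "Main_Page", "browser_family": "IE", "browser_major": "7"},
    {"country": "BB", "page_title": "Main_Page", "browser_family": "IE", "browser_major": "7"},
    {"country": "AA", "page_title": "Some_Article", "browser_family": "IE", "browser_major": "11"},
]

result = top_sources(ROWS)
```

A result dominated by a single page title across a few countries, as reported here, is the kind of pattern that points to automated traffic rather than ordinary browsing.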

Nuria edited projects, added Analytics; removed Analytics-Kanban.Nov 3 2016, 3:05 PM

See also {T157404}, excerpt from there:

Updating and extending Nuria's chart from above (global IE pageviews by version over time since mid 2015), it looks like this is still on the rise.
Assuming that there is no reason for IE7 traffic to rise naturally, we may be looking at at least 60 million extraneous pageviews per week currently, or about 1.7% of our total non-bot traffic - enough to prioritize this among the bot identification work, I would say.

(Source: Pivot)
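As a rough sanity check on the two figures quoted in the excerpt, dividing the weekly extraneous pageviews by their stated share gives the implied weekly non-bot total. Both inputs are approximate ("at least 60 million", "about 1.7%"), so the result is only an order-of-magnitude estimate.

```python
# Figures quoted in the excerpt; both are approximate.
extraneous_per_week = 60_000_000  # "at least 60 million" pageviews per week
share_of_non_bot = 0.017          # "about 1.7%" of total non-bot traffic

# Implied total non-bot pageviews per week, roughly 3.5 billion.
implied_total = extraneous_per_week / share_of_non_bot
print(f"{implied_total:,.0f}")
```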

Nuria moved this task from Wikistats Production to Dashiki on the Analytics board.Mar 13 2017, 5:13 PM
Milimetric renamed this task from Inconsistant data in #all-sites-by-os-and-browser fot IE7 to Bot Identification: Inconsistent data in #all-sites-by-os-and-browser for IE7.May 8 2017, 2:33 PM
Milimetric triaged this task as Normal priority.
Nuria moved this task from Dashiki to Backlog (Later) on the Analytics board.May 16 2017, 12:50 PM
Akeron added a subscriber: Akeron.Jul 22 2017, 8:02 AM
Nuria moved this task from Wikistats Production to Bots on the Analytics board.Jan 11 2018, 5:40 PM
Nuria added a comment.Jul 10 2018, 3:04 PM

Following up on this: our prior version of ua-parser was misclassifying this traffic as IE7. The traffic looks automated in nature, but the true classification of the user agent has shifted from IE7 to (mostly) IE11.

For the record, the details are at T193578#4238244 ff.

Zebulon84 closed this task as Resolved.Jul 15 2018, 6:56 AM

I guess this task can be closed now, it seems fixed.