
Bot Identification: Inconsistent data in #all-sites-by-os-and-browser for IE7
Closed, Resolved | Public | 3 Estimated Story Points

Description

On https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os-and-browser the share of IE7

  • is increasing steadily, which is surprising,
  • is mostly attributed to Windows 7 and Windows 8, both of which are incompatible with IE7.

These user agents should be classified as either "other" or invalid.

By the way, what is the percentage based on: page views, daily unique devices, monthly unique devices, HTTP requests?

Event Timeline

This may happen because newer versions of IE still switch to compatibility mode when they detect old* HTML syntax. In this mode, the user agent sent in the request headers may be that of IE7.

https://msdn.microsoft.com/en-us/library/ms537503(v=vs.85).aspx

For example, if you're using Internet Explorer 9 to view a webpage in Compatibility View, the version token is, by default, MSIE 7.0.
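One way to test this theory against raw user agent strings: IE in Compatibility View still sends its real Trident engine token next to the MSIE 7.0 version token. Below is a minimal sketch, not our actual parsing code. The function name `real_ie_version` is made up for illustration, and the Trident-to-IE mapping is the one documented by Microsoft for IE8 through IE11.

```python
import re

# Trident engine version -> IE major version that ships that engine.
TRIDENT_TO_IE = {"4.0": 8, "5.0": 9, "6.0": 10, "7.0": 11}


def real_ie_version(user_agent):
    """Return (claimed, actual) IE major versions, or None if there is no MSIE token.

    'claimed' comes from the MSIE version token; 'actual' comes from the
    Trident token when one is present, otherwise it equals 'claimed'.
    """
    msie = re.search(r"MSIE (\d+)\.", user_agent)
    if msie is None:
        return None
    claimed = int(msie.group(1))
    trident = re.search(r"Trident/(\d+\.\d+)", user_agent)
    if trident is None:
        # No Trident token: a genuine IE7 (or older), or a spoofed string.
        return claimed, claimed
    return claimed, TRIDENT_TO_IE.get(trident.group(1), claimed)


# IE9 on Windows 7 in Compatibility View: claims IE7, engine says IE9.
compat_ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/5.0)"
# Genuine IE7 on Windows Vista: no Trident token at all.
genuine_ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

print(real_ie_version(compat_ua))
print(real_ie_version(genuine_ua))
```

If most of the "IE7 on Windows 7/8" requests carry a Trident token, that would support the compatibility mode explanation; if they carry none, the user agent is more likely spoofed or automated.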

(*) It seems that even with modern html code, IE can enter compatibility mode:
http://stackoverflow.com/questions/13284083/ie10-renders-in-ie7-mode-how-to-force-standards-mode

Then would it be possible for Wikimedia to turn off this compatibility mode with the "header('X-UA-Compatible: IE=edge');" solution suggested in the Stack Overflow link?

@Zebulon84: a couple of things come to mind.

  1. We have to prove the theory that compatibility mode is what is driving the number of IE7 requests.
  2. Adding a header like that one might also cause JS execution issues. I would open a ticket for that and follow up on specifics with MediaWiki developers.

See IE usage: indeed, IE7 seems to be increasing when computing daily measures.

If compatibility mode is triggered, we would expect Wikipedia to be in the IE compatibility list: https://msdn.microsoft.com/en-us/library/gg622935(v=vs.85).aspx

It is there for IE10, but the entry says "EmulateIE10": http://cvlist.ie.microsoft.com/ie10/iecompatviewlist.xml
The same goes for IE11: https://iecvlist.microsoft.com/wpie11/1403264460/iecompatviewlist.xml

Entry is like: "<domain docMode="EmulateIE10" uaString="10">wikipedia.org</domain>"

So even in compatibility mode the UA should not look like IE7. There might be other reasons why compatibility mode is triggered, though.
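For anyone who wants to check other domains in those lists, here is a small sketch that reads an entry of the shape quoted above. It only assumes the `docMode` and `uaString` attributes seen in that one entry; the function name `parse_compat_entry` is made up, and the full list files may have other attributes or nesting that this does not handle.

```python
import xml.etree.ElementTree as ET

# The single entry quoted from the IE10 compatibility view list.
ENTRY = '<domain docMode="EmulateIE10" uaString="10">wikipedia.org</domain>'


def parse_compat_entry(xml_text):
    """Parse one <domain> element into a plain dict."""
    element = ET.fromstring(xml_text)
    return {
        "domain": element.text,
        "doc_mode": element.get("docMode"),
        "ua_string": element.get("uaString"),
    }


print(parse_compat_entry(ENTRY))
```

For wikipedia.org, `ua_string` comes out as "10", which is consistent with the point above that this list alone does not explain an IE7 user agent.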

It looks like Edge does not have a compatibility list.

Our stats indicate that IE7 usage is increasing on Windows 7, while IE8 and IE9 usage is decreasing on all platforms. Given that our overall Windows 7 usage is stable, it seems that traffic from IE8 and IE9 on Windows 7 is shifting towards being classified as IE7.

These requests are real and are happening (mostly) for Main_Page. It could be a bot, or it could again be an issue negotiating SSL.

Milimetric set the point value for this task to 3.

After analyzing one hour of traffic: the requests come mostly from India/Iran/Pakistan/Afghanistan and they are all requests for Main_Page. This again looks like some kind of keep-alive ping, but why the IE7 UA?
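The kind of breakdown described above can be sketched roughly as follows. This is not the query that was actually run; the record fields (`browser`, `browser_major`, `country`, `page_title`) and the sample rows are made up for illustration.

```python
from collections import Counter


def summarize_ie7(requests):
    """Count IE7-classified requests by country and by page title."""
    ie7 = [
        r for r in requests
        if r["browser"] == "IE" and r["browser_major"] == "7"
    ]
    by_country = Counter(r["country"] for r in ie7)
    by_page = Counter(r["page_title"] for r in ie7)
    return by_country.most_common(), by_page.most_common()


# Hypothetical sample rows, only to show the shape of the input.
SAMPLE = [
    {"browser": "IE", "browser_major": "7", "country": "IN", "page_title": "Main_Page"},
    {"browser": "IE", "browser_major": "7", "country": "IN", "page_title": "Main_Page"},
    {"browser": "IE", "browser_major": "7", "country": "PK", "page_title": "Main_Page"},
    {"browser": "Chrome", "browser_major": "55", "country": "US", "page_title": "Cat"},
]

countries, pages = summarize_ie7(SAMPLE)
print(countries)
print(pages)
```

A distribution where nearly every request hits a single page from a handful of countries is what makes this look automated rather than organic.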

Adding to our task about identifying bot traffic

See also {T157404}, excerpt from there:

Pakistan Wikimedia pageviews by browser family, July 2015-January 2017.png (832×1 px, 107 KB)

Updating and extending Nuria's chart from above (global IE pageviews by version over time since mid 2015), it looks like this is still on the rise.
Assuming that there is no reason for IE7 traffic to rise naturally, we may be looking at at least 60 million extraneous pageviews per week currently, or about 1.7% of our total non-bot traffic - enough to prioritize this among the bot identification work, I would say.
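As a sanity check on those two figures, the weekly non-bot total they imply can be backed out with a couple of lines. This only uses the two numbers quoted above (60 million and 1.7%); the variable names are made up.

```python
# Figures quoted in the comment above.
extraneous_weekly_pageviews = 60_000_000
share_of_non_bot_traffic = 0.017  # 1.7%

# Implied total weekly non-bot pageviews.
implied_weekly_total = extraneous_weekly_pageviews / share_of_non_bot_traffic

print(f"{implied_weekly_total / 1e9:.2f} billion pageviews per week")
```

That works out to roughly 3.5 billion non-bot pageviews per week.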

IE Wikimedia pageviews by version, July 2015-January 2017.png (790×1 px, 107 KB)

(Source: Pivot)

Milimetric renamed this task from Inconsistant data in #all-sites-by-os-and-browser fot IE7 to Bot Identification: Inconsistent data in #all-sites-by-os-and-browser for IE7. May 8 2017, 2:33 PM
Milimetric triaged this task as Medium priority.

Following up on this: our prior version of ua-parser was misclassifying this traffic as IE7. The traffic looks automated in nature, but the classification of the user agent has now shifted from IE7 to (mostly) IE11.

For the record, the details are at T193578#4238244 ff.

I guess this task can be closed now, it seems fixed.