
Traffic stats: analyze category 'Linux other'
Closed, Resolved (Public)

Description

Hi Erik,

I had a look at the Wikimedia Stats on the different Linux distributions, and I notice that "Linux Other" has a significant percentage, http://stats.wikimedia.org/wikimedia/squids/SquidReportOperatingSystems.htm

Is there an explanation for the high percentage for Linux Other?
Could it be other Squid proxies at universities that run on Linux, and they interfere with the User Agent?

The Wikimedia stats are very important to understand the popularity of Linux distributions, and it would be great to figure out the nature of Linux Other.

Thanks,
Simos


Erik Zachte: Linux Other is 0.82%, which IMO makes this bug medium priority at best, also because there is no direct relevance for our own processes. That being said, we had issues with Linux Other before: it catches all Linux events that are not caught elsewhere, so it probably grows over time and is a trigger for maintenance.

Maybe an error, maybe not


Version: unspecified
Severity: normal

Details

Reference
bz46190

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:21 AM
bzimport set Reference to bz46190.
bzimport added a subscriber: Unknown Object (MLST).

To be addressed/reviewed once HADOOP cluster is fully functional.

Comment from Jef Spaleta: (sent to me directly because he had sign-in issues)

The primary reason why unknown linux is so large in the wikimedia stats is that both Mozilla Firefox and Google Chrome stopped putting Linux vendor strings into the default UA as part of upstream policy. I very much doubt squid proxies are the problem. The UA strings are the problem, in that they do not reliably encode linux vendor information in the most popular web browsers in use right now.

For example...
On an up-to-date install of CentOS 5.x right now firefox gives

Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0

On an up-to-date install of CentOS 6.x right now firefox gives

Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0

Fedora also had several releases where Mozilla firefox did not have a linux vendor string. Mozilla has since reverted that upstream policy, and for Fedora 20 at least firefox currently does list Fedora again in the vendor string. I bet if you went back and trended the last couple of years' worth of Fedora metrics, you would see a sharp uptick which would indicate exactly when firefox started listing the vendor string again in fedora.

Chrome however.. on all linux distributions in my testing including Ubuntu... is devoid of linux vendor information. So as Chrome becomes more popular, "unknown linux" counts will continue to rise accordingly. I've checked this on multiple leading distributions now using both Google's rpm and deb packages.

I have not checked chromium binaries as provided directly by linux vendors yet.
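The effect described above can be illustrated with a minimal sketch of a substring-based classifier. This is not the code behind the Wikimedia squid report; the function name `classify_linux_ua` and the `VENDOR_TOKENS` list are assumptions made up for illustration, and the Ubuntu-style UA in the usage note is an illustrative string rather than one quoted in this thread.

```python
# Hypothetical vendor tokens; a real report would need a maintained list.
VENDOR_TOKENS = ("Ubuntu", "Fedora", "Debian", "CentOS", "Red Hat", "SUSE", "Mint", "Gentoo")


def classify_linux_ua(ua: str) -> str:
    """Return a vendor token, 'Android', 'Linux Other', or 'Not Linux'."""
    # Android UAs also contain "Linux", so check for Android first.
    if "Android" in ua:
        return "Android"
    if "Linux" not in ua:
        return "Not Linux"
    lowered = ua.lower()
    # First matching token wins; a UA naming no vendor falls through.
    for token in VENDOR_TOKENS:
        if token.lower() in lowered:
            return token
    return "Linux Other"


# The CentOS Firefox UA quoted above carries no vendor token at all.
centos_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
print(classify_linux_ua(centos_ua))
```

With a classifier of this shape, the CentOS string lands in the catch-all bucket, while a UA that does carry a vendor token, such as `Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0`, is attributed to that vendor. No parsing improvement can recover a vendor the browser never sent.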

Moreover it's very difficult to reliably make credible statements concerning the relative use of Ubuntu and Ubuntu-derived distributions such as Mint. For example Mint 15, 16 and 17 all use firefox packages built by Ubuntu, and as a result all Mint users who use firefox will show up in the wikimedia stats as Ubuntu. Honestly I have no explanation as to why you have so many Mint users in your stats. Those 6 million counts might be from a non-default browser that I haven't tested. There are so many web browsers. But the default in Mint..which is firefox.. reports as Ubuntu for all currently supported Mint releases based on my own testing. What percentage of Ubuntu counts are actually Mint users? There's no way to know at all.

So speaking to the person who filed the bug more than you: people should be strongly discouraged from attempting to make detailed cross linux vendor comparisons using the wikimedia stats. "unknown linux" is a systemic and unignorable reality in UA string stats at present.

The only thing I've been able to justify based on my understanding of the default UA strings I've seen in linux distribution testing for firefox and chrome is the following:

  1. You can divide the linux vendor space into Ubuntu+derivatives together and compare against everything else+unknown..and try to mitigate for chrome by assuming chrome is being used equally everywhere. But chrome usage patterns across distributions are a huge unknown. I would not advise making any public statements if you do this. It's just not credible analysis.
  2. Take all the linux distributors together including unknown and compare against linux android....which so far seems to be valid based on my UA testing across linux distros and android for popular browsers.
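The two coarse groupings above could be sketched roughly as follows. The function name `coarse_groups`, the `UBUNTU_FAMILY` set, and the bucket labels are assumptions for illustration only, and the sketch does not attempt the Chrome mitigation from the first option, since that rests on an unverified assumption about usage.

```python
# Hypothetical labels; which buckets count as Ubuntu-derived is an assumption.
UBUNTU_FAMILY = {"Ubuntu", "Mint"}


def coarse_groups(counts: dict) -> dict:
    """Collapse per-bucket request counts into the two coarse comparisons."""
    android = counts.get("Android", 0)
    # Everything that is not Android, including the "Linux Other" bucket.
    desktop_total = sum(n for name, n in counts.items() if name != "Android")
    ubuntu_family = sum(n for name, n in counts.items() if name in UBUNTU_FAMILY)
    return {
        # Option 1: Ubuntu+derivatives vs. everything else+unknown.
        "ubuntu_and_derivatives": ubuntu_family,
        "everything_else_and_unknown": desktop_total - ubuntu_family,
        # Option 2: all desktop linux (incl. unknown) vs. linux android.
        "all_desktop_linux": desktop_total,
        "android": android,
    }


# Made-up numbers, purely to show the shape of the result.
sample = {"Ubuntu": 50, "Mint": 10, "Fedora": 5, "Linux Other": 35, "Android": 400}
print(coarse_groups(sample))
```

Only the second comparison avoids depending on vendor tokens at all, which is why it is the one that holds up.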

-jef

Closing. Stats are available at https://analytics.wikimedia.org/dashboards/browsers/#desktop-site-by-os but given the shift of our traffic to mobile there is little data for linux distros.