Provide a ua-parser service using the One True Wikimedia UA-Parser™
Closed, Declined · Public

Description

Rather than using jquery.client, it would be nice to have access to the same definitions that are used in Analytics for client-side work. Maybe a Varnish configuration for analytics.wikimedia.org/ua-parser with the https://github.com/ua-parser code running behind it for the rare request that isn't yet cached? Purges might be a bit tricky to work out… maybe I shouldn't design these things in a request ticket. :-)
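
To make the idea concrete, a client-side lookup against such a service might look like the minimal sketch below; the endpoint and its `ua` query parameter are assumptions taken from this description, not an existing API.

```typescript
// Hypothetical client-side lookup against the proposed service. The
// analytics.wikimedia.org/ua-parser endpoint and the `ua` parameter are
// assumptions from this ticket's description, not a deployed API.
async function lookupUserAgent(ua: string): Promise<unknown> {
  const url = 'https://analytics.wikimedia.org/ua-parser?ua=' + encodeURIComponent(ua);
  const res = await fetch(url); // Varnish would serve most of these from cache
  if (!res.ok) throw new Error(`ua-parser service returned ${res.status}`);
  return res.json(); // parsed UA as a JSON object
}

lookupUserAgent(navigator.userAgent).then(console.log);
```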

Nice to have, according to the conversation with James below.

Event Timeline

Jdforrester-WMF raised the priority of this task to Needs Triage.
Jdforrester-WMF updated the task description.
Jdforrester-WMF added a project: Research.
Jdforrester-WMF changed Security from none to None.
Jdforrester-WMF subscribed.

That'd be awesome. Prerequisites, to my mind, which are largely but not entirely outside our (official) control:

  1. A caching layer in ua-parser (unless the presence of a Varnish setup would provide that functionality, I guess?)
  2. A JS-or-similar version of ua-parser that supports this kind of iterative request (we may already have this; if it's a prioritised task I can talk to tobie.)
  3. Adding something to ua-parser to output in a human-readable format (JSON; again, the JS version may already do this; the sketch just below shows the rough shape)
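
For reference, a sketch of the JSON shape the uap-core implementations emit; the TypeScript typing is an approximation, and the nullability of the version fields varies by implementation.

```typescript
// Approximate shape of a ua-parser (uap-core) parse result. Field names
// follow the upstream project; the exact typing here is an assumption.
interface UAParseResult {
  user_agent: { family: string; major: string | null; minor: string | null; patch: string | null };
  os: { family: string; major: string | null; minor: string | null; patch: string | null };
  device: { family: string; brand: string | null; model: string | null };
}
```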

@Jdforrester-WMF can you say if you guys still need this, and if so, how important it is?

Krinkle renamed this task from "Provide a ua-parser service (ua-parser.wikimedia.org) using the One True Wikimedia UA-Parser™" to "Provide a ua-parser service using the One True Wikimedia UA-Parser™". (May 25 2016, 4:18 PM)
Krinkle updated the task description.

ping @Jdforrester-WMF: just trying to understand how to prioritize this, where/how/why would you need to call this service?

> ping @Jdforrester-WMF: just trying to understand how to prioritize this, where/how/why would you need to call this service?

We're currently using jquery.client on every page load involving any of VE, TMH, and MF (and probably others?) to make browser-specific choices.

We're no longer using the library in MW, where we've recently turned to feature tests. Switching these uses from a client-side script to a service call was my idea (so that we fix UA-detection bugs once), but the demand has now fallen by a couple of billion page views a day, so… ;-)

Hm, you're not worried such a service would be slow? The fastest option would be to serve it through hyperswitch on the local domain, but that would be a good amount of work for a degraded user experience. Want me to just close the task? :)

> Hm, you're not worried such a service would be slow?

Slower than what? The current experience is (trivially) a bit slow, too. An extra HTTP request with H/2 isn't that much overhead compared to executing a number of string regexes.

> Slower than what? The current experience is (trivially) a bit slow, too. An extra HTTP request with H/2 isn't that much overhead compared to executing a number of string regexes.

I mean, yeah, it's regexes vs. HTTP request + regexes. So it's one request slower than the status quo. It's fine; if you need that, we can build the service. So this ticket should be: build a new hyperswitch service that uses the same UA parser we use in our data refinery. It's not a terrible amount of work, but it's not trivial either. How urgently would you like us to prioritize this?
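
Without speaking to hyperswitch's actual scaffolding, a minimal sketch of the shape such a service could take, using Express and a stand-in parser (both assumptions for illustration):

```typescript
import express from 'express';

// Stand-in parser: a real deployment would delegate to the shared uap-core
// regex definitions (the ones the data refinery uses), not this one pattern.
function parseUserAgent(ua: string): object {
  const m = /Firefox\/(\d+)/.exec(ua);
  return { user_agent: { family: m ? 'Firefox' : 'Other', major: m ? m[1] : null } };
}

const app = express();
app.get('/ua-parser', (req, res) => {
  const ua = String(req.query.ua ?? req.get('user-agent') ?? '');
  // Long-lived cache headers let Varnish absorb repeat lookups for common UAs.
  res.set('Cache-Control', 'public, max-age=86400');
  res.json(parseUserAgent(ua));
});
app.listen(8080);
```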

I was thinking HTTP request + string matches/number evaluations from a JSON object.
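
A minimal sketch of that kind of evaluation, assuming a uap-core-shaped response; the family name and version threshold here are illustrative, not from this ticket.

```typescript
// "String matches / number evaluations from a JSON object": once the service
// has done the parsing, the client only compares plain fields.
interface ParsedUA { user_agent: { family: string; major: string | null } }

function supportsFeature(parsed: ParsedUA): boolean {
  const { family, major } = parsed.user_agent;
  return family === 'Firefox' && Number(major ?? 0) >= 40;
}
```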

In terms of priority, this is just a nice-to-have tech-debt request; it shouldn't be seen as urgent.

Milimetric triaged this task as Medium priority. (Aug 25 2016, 3:08 PM)
Milimetric updated the task description.

Building such a service would be pretty easy, but I think the main premise of this ticket is false.

> An extra HTTP request with H/2 isn't that much overhead compared to executing a number of string regexes.

I think the data show this is not correct: for most of the planet, a network request is going to be slower than executing regexes. Take a look at the perf metrics: response start is at least 100 ms (median 300 ms): https://grafana.wikimedia.org/dashboard/db/performance-metrics. While those numbers are for Wikipedia pages rather than service calls, most of our requests are cached, so I think response start is a good measure of the minimum time such a request would take overall at the 50th percentile.
That is at least one order of magnitude greater than running client-side regexes.
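
For a rough feel for the client-side half of that comparison, one can time a small batch of UA regexes in a browser console; this sample is far smaller than uap-core's full set, and numbers will vary by device and browser.

```typescript
// Time repeated runs of a few UA regexes to compare against the
// ~100-300 ms response-start figures cited above.
const ua = navigator.userAgent;
const patterns: RegExp[] = [
  /Firefox\/(\d+)/, /Chrome\/(\d+)/, /Version\/(\d+)[\d.]* Safari/,
  /MSIE (\d+)/, /Trident\/.*rv:(\d+)/, /Edge\/(\d+)/,
];
const t0 = performance.now();
for (let i = 0; i < 1000; i++) {
  for (const re of patterns) re.exec(ua);
}
const perRun = (performance.now() - t0) / 1000; // ms per run of all six regexes
console.log(`${perRun.toFixed(4)} ms per run`);
```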

Also, see:
https://v8project.blogspot.se/2017/01/speeding-up-v8-regular-expressions.html

Closing but please let us know if you think we have overlooked something.

> Building such a service would be pretty easy, but I think the main premise of this ticket is false.

> An extra HTTP request with H/2 isn't that much overhead compared to executing a number of string regexes.

> Slower than what? The current experience is (trivially) a bit slow, too. An extra HTTP request with H/2 isn't that much overhead compared to executing a number of string regexes.

> I mean, yeah, it's regexes vs. HTTP request + regexes. So it's one request slower than the status quo.

I believe James's idea that a service would be "better" in some regards may have come from me in part. I don't recall exactly, but I do remember discussing this at some point. We were looking into current issues and found jquery.client cumbersome to maintain: it is extremely basic and limited, and has many inaccuracies (https://github.com/wikimedia/jquery-client).

When looking into alternatives, we found the much healthier ua-parser project, which Analytics uses as well. However, the main concerns were:

  1. It doesn't (yet) have a client-side implementation, only a Node.js implementation.
  2. It is fundamentally unsuitable for client-side use, given that it requires shipping a 156 KB regex file.

So the cost we're comparing is not "local execution" vs. "roundtrip + remote execution". Rather, the "local" case is much more elaborate: first a roundtrip to download 156 KB of regexes, plus the necessary JavaScript code to parse it, plus UI-thread-blocking execution of a large number of regexes. Even if the request takes longer than local execution, the maximum tolerable single-threaded uninterrupted execution time on the web is 50 ms. So on slower devices the work would need to be broken up into multiple chunks, or moved to a WebWorker (plus the added logic to implement and maintain all that).
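
A minimal sketch of that WebWorker hand-off; the file name and the stand-in parser are hypothetical.

```typescript
// ua-parser.worker.ts (hypothetical file): move the regex-heavy parsing off
// the UI thread so it cannot block rendering past the 50 ms budget.
// The cast is needed because the default DOM typings treat `self` as Window.
const ctx = self as unknown as Worker;

function parseUserAgent(ua: string): object {
  return { ua }; // stand-in for evaluating the real 156 KB of regexes
}

ctx.onmessage = (e: MessageEvent) => {
  ctx.postMessage(parseUserAgent(String(e.data)));
};

// On the main thread, the hand-off would look like:
//   const worker = new Worker('ua-parser.worker.js');
//   worker.onmessage = (e) => console.log('parsed:', e.data);
//   worker.postMessage(navigator.userAgent);
```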

Given the high cost of all that, it seems very unsuitable to do client-side. If we were on a path to do all that anyway, then asynchronously contacting a hosted service would be much better: one that is stateless, highly available, low-latency, and HTTP-cached. While network latency can be significant, it would involve ~0 bytes of downloaded data, ~0 ms of blocking execution, and little client-side complexity. In fact, it wouldn't even need to be a roundtrip: if we determine it's very important, it could be embedded in responses in some way (like we do with Geo data already), or optimised using preload, HTTP/2 Push, Varnish ESI, or a Service Worker.
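
For the Service Worker variant, a sketch; the /ua-parser path is this task's hypothetical endpoint, and the typings assume the webworker lib.

```typescript
// sw.ts (sketch): cache the ua-parser response so repeat page views answer
// the lookup locally, with no network roundtrip.
// The cast is needed because the default DOM typings treat `self` as Window.
const sw = self as unknown as ServiceWorkerGlobalScope;

sw.addEventListener('fetch', (event: FetchEvent) => {
  if (new URL(event.request.url).pathname !== '/ua-parser') return;
  event.respondWith(
    caches.open('ua-parser').then(async (cache) => {
      const hit = await cache.match(event.request);
      if (hit) return hit;
      const res = await fetch(event.request);
      await cache.put(event.request, res.clone());
      return res;
    })
  );
});
```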

However, I do agree with closing this task, because the reality is that we don't need 100% accuracy for the involved use cases. The main use case (deciding whether or not to load a feature) does not need to know exactly which browser the user has. We generally only operate on a whitelist or blacklist, so detection only needs to be good at identifying that handful of browsers ("is a whitelisted one" / "is not a blacklisted one"). And that can be maintained in jquery.client as a small set of regexes, like it is now.
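
Sketched concretely, that status-quo approach amounts to something like the following; the allowlist is illustrative, not jquery.client's actual rule set.

```typescript
// A handful of targeted regexes answering "is this one of the browsers we
// care about?", rather than full UA parsing. Versions chosen for illustration.
function isSupportedBrowser(ua: string): boolean {
  const allowlist: RegExp[] = [
    /Firefox\/(?:[4-9]\d|\d{3,})/,        // Firefox 40+
    /Chrome\/(?:[5-9]\d|\d{3,})/,         // Chrome 50+ (also matches Chromium-based UAs)
    /Version\/(?:9|\d{2,})[\d.]* Safari/, // Safari 9+
  ];
  return allowlist.some((re) => re.test(ua));
}

isSupportedBrowser(navigator.userAgent);
```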