
Gather user feedback from druid prototype for pageview data
Closed, Resolved | Public | 3 Estimated Story Points

Description

Gather user feedback from druid prototype

Access prototype:

ssh -N stat1002.eqiad.wmnet -L 9090:stat1002.eqiad.wmnet:9090

then go to http://localhost:9090

That's it

Event Timeline

First off, this is amazing. I can't wait to use this more. I used it last night to get a quick ballpark figure because using this new tool was already easier than existing methods.

  • If you don't select user as a filter, then bots are automatically included. This might be a filter that you apply by default, since wanting to include bots is likely to be the exception.
  • In almost all cases I would use this for, I would want to download the data to run calculations in a spreadsheet or make my own charts. Without that, it loses value drastically. For example, I cannot say something like "pageviews dropped 10%" or "the ratio of internal to external pageviews is 0.8." Not only is a download of the table important, it should ideally be a first-order action (easily found and accessed).
  • It appears that time series aren't working for the pageview daily set, but they are working on pageview hourly.
  • It seems that when I want to get daily numbers in a table, the limit auto-selects to 5 (only shows 5 days). When I switch to unlimited, the chart switches immediately to a time series. When I switch it back, the limit is in place again. Again, this is most important for downloading data.
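
To illustrate the kind of calculation the download is needed for, here is a minimal Python sketch that computes a percent change and an internal-to-external ratio from an exported table. The column names (day, internal, external) and all numbers are made up for illustration; a real export from the prototype may be laid out differently.

```python
import csv
import io

# Hypothetical sample standing in for a CSV exported from the prototype.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """day,internal,external
2016-06-01,800,1000
2016-06-02,720,900
"""


def load_rows(text):
    """Parse the CSV text into dicts with integer pageview counts."""
    return [
        {
            "day": row["day"],
            "internal": int(row["internal"]),
            "external": int(row["external"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def internal_external_ratio(row):
    """Ratio of internal to external pageviews for one day."""
    return row["internal"] / row["external"]


rows = load_rows(SAMPLE_CSV)
totals = [r["internal"] + r["external"] for r in rows]
change = percent_change(totals[0], totals[1])
ratio = internal_external_ratio(rows[0])

print(f"day-over-day change: {change:.1f}%")
print(f"internal/external ratio: {ratio:.2f}")
```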

Thanks for the feedback, @JKatzWMF. Most of the things you mention I agree with, but they're a bit out of scope for us because Pivot is a UI developed by Imply: https://github.com/implydata/pivot. We could fork and add easy features, like download as CSV, but that does get away from what we're trying to do this quarter and next.

Right now we're more focused on building the data pipeline and working out scalability and performance issues. So I guess what we'd like to know is how useful this way of looking at data is in general compared to direct Hive access. We have a few other out-of-the-box UI choices for you, including Airbnb's Caravel or Saiku (which we would have to pay a small fee to integrate with Druid). You can also submit queries directly to Druid, but that comes with the cost of learning their weird query specification, and the only benefit is somewhat better speed than Hive. We'll ping this ticket when we have Caravel set up so you can take a look.
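
For anyone curious what querying Druid directly looks like, here is a rough Python sketch of a native timeseries query, which is a JSON document POSTed to the broker. The datasource name (pageviews-daily), the dimension and metric names (agent_type, view_count), and the broker URL are assumptions for illustration, not the actual names used in the prototype.

```python
import json
import urllib.request


def build_timeseries_query(datasource, start, end, agent_type="user"):
    """Build a Druid native timeseries query as a plain dict.

    The dimension "agent_type" and metric "view_count" are hypothetical
    names; substitute whatever the real datasource schema uses.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [f"{start}/{end}"],  # ISO 8601 interval
        "filter": {
            "type": "selector",
            "dimension": "agent_type",
            "value": agent_type,
        },
        "aggregations": [
            {"type": "longSum", "name": "views", "fieldName": "view_count"}
        ],
    }


def post_query(broker_url, query):
    """POST the query as JSON to a Druid broker and return the parsed reply.

    broker_url is an assumption, e.g. "http://<broker-host>:8082/druid/v2/".
    """
    request = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only build and print the query here; sending it requires a reachable broker.
query = build_timeseries_query("pageviews-daily", "2016-06-01", "2016-06-08")
print(json.dumps(query, indent=2))
```

The selector filter on agent_type mirrors the "user" filter in Pivot that excludes bots.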

Actually Pivot lets you download a CSV by clicking the share icon in the top right corner. I just used that to create this spreadsheet.
(And yes, I agree it's an important feature, for example because the visualization options in Pivot are unsurprisingly a bit limited, e.g. it doesn't seem to be possible to create a stacked graph like in that example.)

Agree this looks great! And so fast!

It would be nice to be able to export a SQL (or HQL) query corresponding to the current settings (filters and splits). That would let one refine it using Hive after hitting the limits of the Pivot interface.
I seem to recall this was possible in some form with our Pentaho/Saiku installation last year. But I guess that may not be possible, as Druid uses a different query format internally. Does anyone know?

An option for exporting the current graph (+legend) as SVG or PNG would be great too.

@Tbayer: the query format is indeed very different from the one you would use in Hive. Also, query times and the optimizations needed differ between the two data stores.

Milimetric set the point value for this task to 3. Jun 28 2016, 4:04 PM