Turnilo split thresholds too low
Closed, Invalid, Public

Description

Looking at the event_navigationtiming dataset, if I split a query by "Event Origin Country", only the 50 countries with the highest count are displayed. I can't find a way to override that limit in the UI. The generated Druid query shows that Turnilo is the one adding that limit:

      "metric": "count",
      "threshold": 50
    }
  ]
]

Having a split limit of 50 makes it hard to look at data exhaustively for all countries. It requires either going country per country or adding all the countries that come up in the first query results as a filter for the next one, etc.
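
The filter-based workaround described above can be sketched as a loop. This is only an illustration, not something Turnilo does for you: the dimension name `country`, the interval, the `longSum` aggregation and the in-memory `run_fake_query` stand-in are all assumptions made so the example runs without a Druid cluster. It reads the workaround as excluding already-seen values from each following query.

```python
def build_topn_query(dimension, threshold, exclude):
    """Build a Druid native topN query dict that skips already-seen values."""
    query = {
        "queryType": "topN",
        "dataSource": "event_navigationtiming",
        "dimension": dimension,  # hypothetical dimension name
        "metric": "count",
        "threshold": threshold,
        "granularity": "all",
        "intervals": ["2019-01-01/2019-01-02"],  # hypothetical interval
        # Assumes the datasource has a rolled-up "count" column.
        "aggregations": [
            {"type": "longSum", "name": "count", "fieldName": "count"}
        ],
    }
    if exclude:
        # Exclude every value returned by the previous pages.
        query["filter"] = {
            "type": "not",
            "field": {
                "type": "in",
                "dimension": dimension,
                "values": sorted(exclude),
            },
        }
    return query


def run_fake_query(query, counts):
    """Stand-in for a real Druid call: apply the filter and threshold locally."""
    excluded = set()
    if "filter" in query:
        excluded = set(query["filter"]["field"]["values"])
    remaining = [(v, c) for v, c in counts.items() if v not in excluded]
    remaining.sort(key=lambda pair: (-pair[1], pair[0]))
    return remaining[: query["threshold"]]


def fetch_all_values(dimension, threshold, counts):
    """Page through a dimension, `threshold` values at a time."""
    seen = {}
    while True:
        query = build_topn_query(dimension, threshold, set(seen))
        page = run_fake_query(query, counts)
        seen.update(page)
        if len(page) < threshold:
            # A short (or empty) page means nothing is left to fetch.
            break
    return seen
```

With a threshold of 50 and around 200 distinct country codes, this takes several round trips and an ever-growing filter, which is why a higher split limit would be more convenient.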

Short of being able to control the limit in the UI, it would be nice if the split limit, at least for this dataset, could be increased to 200, in order to allow splitting by country and seeing all the possible ISO country codes in the results.

Likewise, when doing a split on any of the bucketed time values, like "Event Load Event End Buckets", Turnilo sets the limit to 5:

      "metric": "count",
      "threshold": 5
    }
  ]
]

For that particular set of buckets, there are actually 11 possible values (as can be seen when you add that field as a filter). Only ever displaying the top 5 in a split for something that can have 11 different values is limiting too.

I searched around in our code repositories and couldn't find where those limits are defined. Is this configurable for the various fields in that schema?

Event Timeline

I have found this issue: https://github.com/allegro/turnilo/issues/472

From my tests I can get up to 100 values, but it depends on the dimensions by which the cube is split, and the chart format.
I have managed to get 100 results in table mode, with an additional split by time, but couldn't get that using charts.
Also, the values for split are hard-coded, as mentioned in the issue.
It seems that the issue is getting prioritized, but it involves some complexity, because Turnilo queries Druid in multiple steps (first it fetches the top values, then it queries the time series for each of those values one by one).
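
To make the multi-step pattern concrete, here is a rough sketch of the kind of Druid native queries it implies. This is not Turnilo's actual code: the function names, the dimension name `country`, the interval and the aggregation are assumptions for illustration only.

```python
def top_values_query(data_source, dimension, threshold, interval):
    """Step 1: a topN query that returns the highest-count dimension values."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "dimension": dimension,
        "metric": "count",
        "threshold": threshold,
        "granularity": "all",
        "intervals": [interval],
        # Assumes the datasource has a rolled-up "count" column.
        "aggregations": [
            {"type": "longSum", "name": "count", "fieldName": "count"}
        ],
    }


def timeseries_queries(data_source, dimension, values, interval,
                       granularity="hour"):
    """Step 2: one timeseries query per value returned by step 1."""
    return [
        {
            "queryType": "timeseries",
            "dataSource": data_source,
            "granularity": granularity,
            "intervals": [interval],
            # Restrict this time series to a single dimension value.
            "filter": {
                "type": "selector",
                "dimension": dimension,
                "value": value,
            },
            "aggregations": [
                {"type": "longSum", "name": "count", "fieldName": "count"}
            ],
        }
        for value in values
    ]
```

Under this pattern the number of follow-up queries grows linearly with the split limit, which would explain why raising the limit is not a trivial change for chart views.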

Ah, yes, 100 is hardcoded, so I guess we'll see 100 countries at least. Thanks for that link, it let me find the drop-down menu that I didn't know existed to override the default split limit picked. 100 countries is probably good enough for now, and it's going to let me see all the buckets for Load Event End.