Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Nuria | T234471 superset not showing data after 09/16 for some datasources
Invalid | | elukey | T234494 Eventlogging to druid daily timers not executing?
Event Timeline
Example dashboard with issue using eventlogging navigation timing data: https://bit.ly/2pt2qhf
Note that this data is visible on Turnilo (search satisfaction has the same issue).
The size of the segments looks OK, which is what you would expect if the data appears fine on Turnilo.
According to SAL: 2019-09-17 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates... Given that the last visible navigation timing data is from 09/16, this seems related. Let's see what is different about the queries that are not returning data.
I see errors like:
io.druid.java.util.common.RE: Failure getting results for query[7e7c183d-4d16-430e-8e53-ed88fd072125] url[http://druid1003.eqiad.wmnet:8200/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Faulty channel in resource pool]
in the Druid log. That might be totally unrelated; it seems the issue should be on the Superset end, because Turnilo is able to see the data.
Is this segment metadata not available to Druid somehow? Handy script to get segment metadata:
curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d '{
"queryType":"segmentMetadata", "dataSource":"event_navigationtiming", "intervals":["2019-09-01/2019-10-01"] }'
The query that does not return data on Superset does return data when run directly against Druid:
nuria@druid1001:~$ more test-query-navigationtiming.sh
curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d '{
"queryType": "timeseries", "dataSource": "event_navigationtiming", "aggregations": [ { "type": "count", "name": "count" } ], "granularity": { "type": "period", "timeZone": "UTC", "period": "P1D" }, "postAggregations": [], "intervals": "2019-09-20T00:00:00+00:00/2019-10-02T00:00:00+00:00"
}'
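To check the segment metadata question without reading the whole response by eye, a minimal Python sketch could post the same segmentMetadata query and report the latest interval end it sees. The broker URL and query fields are copied from the curl commands above; the helper names are hypothetical, and the sketch assumes the response is a list of segment objects, each carrying an `intervals` list of `start/end` strings in a uniform UTC ISO-8601 format (so plain string comparison orders them).

```python
import json
import urllib.request

# Broker URL as used in the curl commands above.
BROKER_URL = "http://localhost:8082/druid/v2/"


def build_segment_metadata_query(datasource, interval):
    """Build the same segmentMetadata query as the curl script."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
    }


def post_query(url, query):
    """POST a JSON query to a Druid broker and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def latest_segment_end(results):
    """Return the latest interval end across all segments, or None if empty.

    Assumes uniform UTC ISO-8601 timestamps, which sort lexicographically.
    """
    ends = [
        interval.split("/")[1]
        for segment in results
        for interval in (segment.get("intervals") or [])
    ]
    return max(ends) if ends else None
```

On a Druid host this might be used as `latest_segment_end(post_query(BROKER_URL, build_segment_metadata_query("event_navigationtiming", "2019-09-01/2019-10-01")))`. If the value is well past 09/16, the broker being queried knows about the recent segments.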
This *seems* (no proof) to be a permissions issue accessing some segments in Druid.
I also think the daily rollup of eventlogging data might not have been happening since 09/29? (Indexing might be happening hourly, but it needs to happen daily too.)
Restarted the druid broker on druid1003 (that is used by Superset) and I can now see data up to the 30th in https://bit.ly/2pt2qhf
Found some old occurrences (~1 month ago) of the same error on druid1001's broker, but not on druid1002. Interestingly, 1001 and 1003 are the only ones that the UIs contact (Turnilo uses 1001, Superset uses 1003). I can see some similar issues reported by Druid users, so I think that we may have hit a rare Druid bug. We didn't see the issue with Turnilo since the broker that it contacts (on 1001) didn't show the problem, but the one that Superset uses (1003) did.
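Since the symptom turned out to be broker-specific, one way to spot it sooner would be to send the same daily count query to each broker and compare the last day that has data. The following is only a sketch under assumptions: the full hostnames are guessed from the `druid1003.eqiad.wmnet` name in the error message, port 8082 is taken from the curl commands, and all function names are hypothetical. It also assumes that timeseries results are a list of `{"timestamp": ..., "result": {"count": ...}}` rows and that empty buckets come back with a zero count.

```python
import json
import urllib.request

# Assumption: hostnames follow the pattern in the error message; port from the curl commands.
BROKERS = {
    "druid1001": "http://druid1001.eqiad.wmnet:8082/druid/v2/",
    "druid1003": "http://druid1003.eqiad.wmnet:8082/druid/v2/",
}


def daily_count_query(datasource, interval):
    """Build the same daily count timeseries query used in the test script."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "aggregations": [{"type": "count", "name": "count"}],
        "granularity": {"type": "period", "timeZone": "UTC", "period": "P1D"},
        "postAggregations": [],
        "intervals": interval,
    }


def fetch(url, query):
    """POST a JSON query to a broker and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def last_timestamp(rows):
    """Return the latest bucket timestamp with a non-zero count, or None."""
    stamps = [
        row["timestamp"]
        for row in rows
        if row.get("result", {}).get("count", 0) > 0
    ]
    return max(stamps) if stamps else None


def stale_brokers(last_seen):
    """Given {broker: last timestamp or None}, list brokers behind the freshest one.

    Returns an empty list when no broker has data, since there is nothing to compare.
    """
    known = [ts for ts in last_seen.values() if ts is not None]
    if not known:
        return []
    newest = max(known)
    return sorted(
        name for name, ts in last_seen.items() if ts is None or ts < newest
    )


def check(datasource, interval):
    """Query every broker and return the names of those lagging behind."""
    query = daily_count_query(datasource, interval)
    last_seen = {name: last_timestamp(fetch(url, query)) for name, url in BROKERS.items()}
    return stale_brokers(last_seen)
```

Calling `check("event_navigationtiming", "2019-09-20T00:00:00+00:00/2019-10-02T00:00:00+00:00")` from a host that can reach the brokers would return the names of any brokers whose newest data is older than the freshest broker's. Comparing timestamps as strings relies on all brokers returning the same UTC ISO-8601 format.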
@elukey I was soooo happyy when i saw this comment this morning cause i had totally run out of ideas.