
superset not showing data after 09/16 for some datasources
Closed, Resolved | Public | 5 Estimated Story Points

Event Timeline

Example dashboard with issue using eventlogging navigation timing data: https://bit.ly/2pt2qhf

Note that this data is visible on Turnilo (search satisfaction is having the same issue):

[Screenshot: Screen Shot 2019-10-02 at 12.20.52 PM.png]

[Screenshot: Screen Shot 2019-10-02 at 12.27.05 PM.png]

The size of the segments looks OK, which is what you would expect given that the data appears fine on Turnilo.

According to SAL: 2019-09-17 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates... Given that the last visible navigation timing data is from 09/16, this seems related. Let's see what is different about the queries that are not returning data.

EBernhardson renamed this task from "superset now showing data after 09/16 for some datasources" to "superset not showing data after 09/16 for some datasources". Oct 2 2019, 8:03 PM

I see errors like:

io.druid.java.util.common.RE: Failure getting results for query[7e7c183d-4d16-430e-8e53-ed88fd072125] url[http://druid1003.eqiad.wmnet:8200/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Faulty channel in resource pool]

in the Druid log, but that might be totally unrelated. It seems that the issue should be on the Superset end, because Turnilo is able to see the data.

Is the segment metadata somehow not available to Druid? Handy script to get segment metadata:

curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d '{
  "queryType": "segmentMetadata",
  "dataSource": "event_navigationtiming",
  "intervals": ["2019-09-01/2019-10-01"]
}'

The query that does not return data on superset returns data on druid:

nuria@druid1001:~$ more test-query-navigationtiming.sh
curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d '{
  "queryType": "timeseries",
  "dataSource": "event_navigationtiming",
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    }
  ],
  "granularity": {
    "type": "period",
    "timeZone": "UTC",
    "period": "P1D"
  },
  "postAggregations": [],
  "intervals": "2019-09-20T00:00:00+00:00/2019-10-02T00:00:00+00:00"
}'

This *seems* (no proof) to be a permissions issue accessing some segments in Druid.

I also think the daily rollup of eventlogging data might not have been happening since 09/29 (indexing might be happening hourly, but it needs to happen daily too).

I see errors like:

io.druid.java.util.common.RE: Failure getting results for query[7e7c183d-4d16-430e-8e53-ed88fd072125] url[http://druid1003.eqiad.wmnet:8200/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Faulty channel in resource pool]

Restarted the Druid broker on druid1003 (the one used by Superset) and I can now see data up to the 30th in https://bit.ly/2pt2qhf

Found some old occurrences (~1 month ago) of the same error on druid1001's broker, but not on druid1002's. Interestingly, 1001 and 1003 are the only brokers that the UIs contact (Turnilo uses 1001, Superset uses 1003). I can see some similar issues reported by Druid users, so I think that we may have hit a rare Druid bug. We didn't see the issue with Turnilo because the broker that it contacts (on 1001) didn't show the problem, while the one that Superset uses (1003) did.
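Since the problem turned out to be specific to one broker, a quick way to spot it in the future is to send the same count query to each broker and compare the replies. The following is only a rough sketch, not something that was run as part of this task: the function names are made up for illustration, and it assumes that every broker accepts queries on port 8082 (the port used in the localhost queries above) and that the host names passed in are reachable from where the script runs.

```shell
#!/usr/bin/env bash
# Sketch: send one daily count query to several Druid brokers and print each reply.
# Assumption (not confirmed in this task): every broker listens on port 8082.

# Print a timeseries count query for a datasource and an ISO-8601 interval.
build_count_query() {
  local datasource="$1" interval="$2"
  cat <<EOF
{
  "queryType": "timeseries",
  "dataSource": "${datasource}",
  "aggregations": [{"type": "count", "name": "count"}],
  "granularity": {"type": "period", "timeZone": "UTC", "period": "P1D"},
  "postAggregations": [],
  "intervals": "${interval}"
}
EOF
}

# POST the query to one broker. -sS hides the progress meter but keeps errors;
# -d @- makes curl read the request body from stdin.
query_broker() {
  local host="$1" datasource="$2" interval="$3"
  build_count_query "$datasource" "$interval" |
    curl -sS -X POST "http://${host}:8082/druid/v2/?pretty" \
      -H 'Content-Type:application/json' \
      -H 'Accept:application/json' \
      -d @-
}

# Query every broker in turn so the replies can be compared side by side.
compare_brokers() {
  local datasource="$1" interval="$2"
  shift 2
  local host
  for host in "$@"; do
    echo "== ${host} =="
    query_broker "$host" "$datasource" "$interval" || echo "request to ${host} failed"
  done
}
```

For example, `compare_brokers event_navigationtiming "2019-09-20T00:00:00+00:00/2019-10-02T00:00:00+00:00" druid1001.eqiad.wmnet druid1003.eqiad.wmnet` would print both replies one after the other (the druid1001 host name is inferred from the druid1003 one seen in the error message). A broker whose reply stops earlier than the others, or that returns an error, is the one worth restarting or investigating.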

Nuria set the point value for this task to 5.

@elukey I was soooo happy when I saw this comment this morning because I had totally run out of ideas.