event_pageissues Turnilo view contains no valid data from before January 5
Closed, ResolvedPublic

Description

The PageIssues EventLogging data imported into Druid in T202751 seems almost entirely missing when viewed in Turnilo:

[Screenshot: event_pageissues Turnilo view.png]

In Superset I keep getting a "No data was returned" error when trying to access the same data, so it's probably an issue affecting the underlying Druid datasource.

Event Timeline

mforns triaged this task as Unbreak Now! priority.

@Tbayer Thanks for the heads up. I think I know what happened.

On January 5th, I deployed a new feature of HiveToDruid, the job that loads data from EventLogging into Druid and Turnilo. Together with it, I deployed another fix that a couple of people had asked for: unifying the 2 confusing metrics that the job would generate ("Count" and "Event Count") into one single metric.

To do that, I had to rename "Event Count" to "Count", to override the latter, which is added by default by Turnilo. When I finished the tests and the deployment, both metrics were still visible for the time period prior to Jan 5th, and thus all data was queryable. However, I believe that after some days passed, Turnilo stopped looking at the old data when applying introspection and dropped the old "Event Count" metric. Without it, the old data, even though it's still there, cannot be queried from Turnilo.

I will launch backfill jobs to load the last 3 months of data with the new code. This will fix the problem, but might take a couple days.

The 2 previous metrics were confusing, and only the latter was used. The only way I found to remove the non-useful "Count" metric, which Turnilo adds by default, was to override it with the EventCount value.
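To illustrate the hypothesis above, here is a minimal Python sketch. It is a toy model, not the actual HiveToDruid or Turnilo code. Segments are plain dictionaries with day offsets relative to the January 5 deployment; the column names `event_count` and `count`, the sample values, and the `lookback` parameter are all assumptions made for illustration.

```python
from typing import List, Optional, Set

# Hypothetical segments: negative days were indexed before the rename and
# only carry the old metric column; day 0 onwards carries the new one.
SEGMENTS: List[dict] = [
    {"day": -3, "metrics": {"event_count": 120}},
    {"day": -2, "metrics": {"event_count": 95}},
    {"day": -1, "metrics": {"event_count": 110}},
    {"day": 0, "metrics": {"count": 100}},
    {"day": 1, "metrics": {"count": 90}},
    {"day": 2, "metrics": {"count": 105}},
]


def introspect_measures(segments: List[dict], lookback: int) -> Set[str]:
    """Collect the metric names found in the `lookback` most recent segments."""
    names: Set[str] = set()
    for seg in segments[-lookback:]:
        names.update(seg["metrics"])
    return names


def query_sum(segments: List[dict], measure: str,
              start_day: int, end_day: int) -> Optional[int]:
    """Sum `measure` over segments with start_day <= day < end_day.

    Returns None when no segment in the range has that column, which is
    how this toy model represents an empty result.
    """
    values = [
        seg["metrics"][measure]
        for seg in segments
        if start_day <= seg["day"] < end_day and measure in seg["metrics"]
    ]
    return sum(values) if values else None


# Right after deployment, introspection still sees both metrics.
print(introspect_measures(SEGMENTS, lookback=6))
# Once only recent segments are inspected, the old metric disappears.
print(introspect_measures(SEGMENTS, lookback=3))
# The old data is still there, but the only remaining measure cannot read it.
print(query_sum(SEGMENTS, "count", -3, 0))
```

In this model, re-indexing the old days with the new column name (the backfill) is what makes `query_sum(SEGMENTS, "count", -3, 0)` return a value again.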

> @Tbayer Thanks for the heads up. I think I know what happened.

> On January 5th, I deployed a new feature of HiveToDruid, the job that loads data from EventLogging into Druid and Turnilo. Together with it, I deployed another fix that a couple of people had asked for: unifying the 2 confusing metrics that the job would generate ("Count" and "Event Count") into one single metric.

Yes, I was among those people and greatly appreciate that fix ;)

> To do that, I had to rename "Event Count" to "Count", to override the latter, which is added by default by Turnilo. When I finished the tests and the deployment, both metrics were still visible for the time period prior to Jan 5th, and thus all data was queryable. However, I believe that after some days passed, Turnilo stopped looking at the old data when applying introspection and dropped the old "Event Count" metric. Without it, the old data, even though it's still there, cannot be queried from Turnilo.

I see - does that also explain the "No data was returned" error in Superset?

> I will launch backfill jobs to load the last 3 months of data with the new code. This will fix the problem, but might take a couple days.

Thanks! Could we include a slightly longer timespan? This is basically data from a one-time experiment that ran from September to November. (The data that is currently still coming in is doing so only at a very low rate, and could actually be discarded if necessary.)

> The 2 previous metrics were confusing, and only the latter was used. The only way I found to remove the non-useful "Count" metric, which Turnilo adds by default, was to override it with the EventCount value.

> I see - does that also explain the "No data was returned" error in Superset?

I'm not sure. It fails even with the latest data, which is not a good sign. I will troubleshoot this as part of this task.

> Thanks! Could we include a slightly longer timespan? This is basically data from a one-time experiment that ran from September to November. (The data that is currently still coming in is doing so only at a very low rate, and could actually be discarded if necessary.)

Yes, I configured automatic deletion in Turnilo after 3 months as a default for EventLogging schemas loaded into Druid, for privacy reasons. However, I see that both event_pageissues and event_readingdepth are not privacy-sensitive, so we can keep them for longer.

@Tbayer So, you think we can stop the Druid/Turnilo ingestion of event_pageissues then? And leave the data of the September-November experiment? If so, I will backfill since September and remove the ingestion job. How about event_readingdepth?

[...]

>> Thanks! Could we include a slightly longer timespan? This is basically data from a one-time experiment that ran from September to November. (The data that is currently still coming in is doing so only at a very low rate, and could actually be discarded if necessary.)

> Yes, I configured automatic deletion in Turnilo after 3 months as a default for EventLogging schemas loaded into Druid, for privacy reasons. However, I see that both event_pageissues and event_readingdepth are not privacy-sensitive, so we can keep them for longer.

> @Tbayer So, you think we can stop the Druid/Turnilo ingestion of event_pageissues then? And leave the data of the September-November experiment? If so, I will backfill since September and remove the ingestion job.

Yes, that's correct. Thank you!

> How about event_readingdepth?

That is a different matter - the ReadingDepth schema is generating data on an ongoing basis rather than for a one-time experiment (also, recall that we were still figuring out how to make the data from its quantitative fields useful in Turnilo/Superset; @jlinehan is working on solutions for this).

Change 486094 had a related patch set uploaded (by Mforns; owner: Mforns):
[operations/puppet@production] Deactivate EL Druid loading job for PageIssues schema

https://gerrit.wikimedia.org/r/486094

Change 486094 merged by Elukey:
[operations/puppet@production] Deactivate EL Druid loading job for PageIssues schema

https://gerrit.wikimedia.org/r/486094

Change 486128 had a related patch set uploaded (by Mforns; owner: Mforns):
[operations/puppet@production] Normalize field names in eventlogging_to_druid_job calls

https://gerrit.wikimedia.org/r/486128

Change 486128 merged by Elukey:
[operations/puppet@production] Normalize field names in eventlogging_to_druid_job calls

https://gerrit.wikimedia.org/r/486128

@Tbayer
This is finished! Please check that pageissues and readingdepth contain the data that you expect.
Thanks for spotting this issue!

> @Tbayer
> This is finished! Please check that pageissues and readingdepth contain the data that you expect.
> Thanks for spotting this issue!

Pageissues looks great on Turnilo now, thank you!

In Superset though, I still can't seem to get rid of that "No data was returned" error - any idea what's wrong?

> In Superset though, I still can't seem to get rid of that "No data was returned" error - any idea what's wrong?

Yes, the time range.
The data only spans from October 1st to November 1st, I think.
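As a rough illustration of the time range point, the Python sketch below checks whether a requested window overlaps the data and builds a Druid native timeseries query restricted to the October window. The year 2018, the metric field name `count`, and the "recent week" window are assumptions for the example, not details confirmed in this task.

```python
import json
from datetime import date


def overlaps(query_start: date, query_end: date,
             data_start: date, data_end: date) -> bool:
    """True if the half-open ranges [query_start, query_end) and
    [data_start, data_end) share at least one day."""
    return query_start < data_end and data_start < query_end


def build_timeseries_query(datasource: str, start: date, end: date,
                           metric: str) -> dict:
    """Build a Druid native timeseries query summing `metric` per day."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [f"{start.isoformat()}/{end.isoformat()}"],
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
    }


# Assumed data range (year is a guess; the task only says Oct 1 to Nov 1).
DATA_START, DATA_END = date(2018, 10, 1), date(2018, 11, 1)

# A hypothetical recent window misses the data entirely.
print(overlaps(date(2019, 1, 20), date(2019, 1, 27), DATA_START, DATA_END))

# A window matching the experiment does reach it.
query = build_timeseries_query("event_pageissues", DATA_START, DATA_END, "count")
print(json.dumps(query, indent=2))
```

If the chart's time range is set to a window that does not intersect the data, an empty result is expected even when the datasource itself is healthy.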