Hi all,
I am working with the event_sanitized.centralnoticebannerhistory data in Superset. I've done some data transformation and cleaning within Superset, creating columns that reflect how Fundraising teams digest campaign data. The end results can be seen in this dashboard. Data for the FY2021 India campaign and the FY1920 en6C banner campaign works well with these created fields!
For this year's FY2021 en6C banner campaign, I am getting an error that seems to be tied to the created field 'campaign'. However, the error only appears when the data is aggregated (not when rows are selected ungrouped), and only for the first two days of the en6C campaign, Nov. 30, 2020 and Dec. 1, 2020.
I have a [[ https://superset.wikimedia.org/r/400 | separate tab ]] of the dashboard displaying the issue. The 'Ungrouped' view shows data flowing through without issue on Nov. 30 and Dec. 1; the 'Grouped' view shows the Presto error when grouping this same data. The 'Grouped, Starting Dec. 2' view is identical to the 'Grouped' view except that the date range starts on Dec. 2.
I have looked through the raw data in Hive to see if I could identify any data quality errors on Nov. 30 and Dec. 1 in the event.l array, which is the original source of the 'campaign' attribute, but have not been able to find anything (though I would welcome ideas, of course).
Thank you for your help looking into this issue - please let me know if there is anything else I can provide!