While querying the data lake, I noticed that the event_sanitized schemas appear to contain no data for four days. This matters for reports that the Growth team runs, and it probably affects reporting by other teams, too. There is no data for these dates:
- 2020-04-30
- 2020-05-01
- 2020-05-02
- 2020-05-03
Data also appears to be incomplete on the adjacent dates, 2020-04-29 and 2020-05-04. Could this data please be fixed, perhaps by re-inserting it from the event schemas?
I checked the following tables. In all cases, the data is missing for the event_sanitized version but not for the event version:
- event_sanitized.serversideaccountcreation
- event.serversideaccountcreation
- event_sanitized.homepagemodule
- event.homepagemodule
- event_sanitized.helppanel
- event.helppanel
Below is one of the queries I ran, so you can see how I am looking at this:
SELECT
    SUBSTRING(dt, 1, 10) the_date,
    COUNT(*)
FROM event_sanitized.serversideaccountcreation
WHERE year = 2020
    AND month >= 4
    AND wiki IN ('cswiki', 'kowiki', 'viwiki', 'arwiki', 'ukwiki', 'huwiki', 'hywiki', 'srwiki', 'euwiki', 'frwiki')
    AND event.isSelfMade = true
    AND event.isApi = false
GROUP BY SUBSTRING(dt, 1, 10)
ORDER BY the_date DESC
LIMIT 1000;
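A side-by-side comparison along these lines may make the gap easier to see. This is only a sketch. It assumes that event.serversideaccountcreation exposes the same dt, year and month columns as the sanitized table, and that the query engine supports FULL OUTER JOIN. The aliases raw_counts, sanitized_counts, d and n are made up for illustration.

```sql
-- Sketch: daily row counts in the raw and sanitized tables, side by side.
-- Assumes both tables share the dt / year / month columns used above.
SELECT
    COALESCE(raw_counts.d, sanitized_counts.d) AS the_date,
    COALESCE(raw_counts.n, 0) AS event_rows,
    COALESCE(sanitized_counts.n, 0) AS event_sanitized_rows
FROM (
    -- Rows per day in the unsanitized table
    SELECT SUBSTRING(dt, 1, 10) AS d, COUNT(*) AS n
    FROM event.serversideaccountcreation
    WHERE year = 2020 AND month IN (4, 5)
    GROUP BY SUBSTRING(dt, 1, 10)
) raw_counts
FULL OUTER JOIN (
    -- Rows per day in the sanitized table
    SELECT SUBSTRING(dt, 1, 10) AS d, COUNT(*) AS n
    FROM event_sanitized.serversideaccountcreation
    WHERE year = 2020 AND month IN (4, 5)
    GROUP BY SUBSTRING(dt, 1, 10)
) sanitized_counts
    ON raw_counts.d = sanitized_counts.d
ORDER BY the_date DESC
LIMIT 1000;
```

Days that are present in event but missing from event_sanitized would show up with event_sanitized_rows = 0. The same query could be pointed at homepagemodule or helppanel by swapping the table name in both subqueries.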