The Edit log in the Data Lake seems to contain no data at all:
select year, month, count(*) as events
from edit
where year >= 0
group by year, month
order by year, month asc
limit 1000

Done. 0 results.
The only files in HDFS are a set named _REFINE_FAILED, one per hourly partition for a few days in December 2017, each containing a single timestamp.
neilpquinn-wmf@stat1005:~$ hdfs dfs -ls -h -R /wmf/data/event/Edit
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 21:01 /wmf/data/event/Edit/year=2017/month=12
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=20
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=20/_REFINE_FAILED
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=21
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=21/_REFINE_FAILED
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=22
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=22/_REFINE_FAILED
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=23
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=23/_REFINE_FAILED
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:59 /wmf/data/event/Edit/year=2017/month=12/day=17
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=17/hour=0
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=17/hour=0/_REFINE_FAILED
[....]
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=6
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=6/_REFINE_FAILED
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=7
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=7/_REFINE_FAILED
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=8
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=8/_REFINE_FAILED
drwxr-xr-x   - hdfs hadoop          0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=9
-rw-r--r--   3 hdfs hadoop         26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=9/_REFINE_FAILED
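In case it helps with triage, a recursive listing like this could be summarised per day with a short script. This is only a rough sketch: `failed_partitions` and `failures_per_day` are hypothetical helper names, not part of any existing tooling, and the sample input is just a few lines copied from the listing.

```python
import re
from collections import Counter

# Matches the hourly partition path of a _REFINE_FAILED flag file
# as it appears at the end of a line of `hdfs dfs -ls -R` output.
FLAG_RE = re.compile(
    r"/year=(\d+)/month=(\d+)/day=(\d+)/hour=(\d+)/_REFINE_FAILED\s*$"
)


def failed_partitions(listing):
    """Return sorted (year, month, day, hour) tuples for every flagged partition."""
    found = []
    for line in listing.splitlines():
        match = FLAG_RE.search(line)
        if match:
            found.append(tuple(int(part) for part in match.groups()))
    return sorted(found)


def failures_per_day(listing):
    """Count flagged hourly partitions per (year, month, day)."""
    return Counter(partition[:3] for partition in failed_partitions(listing))


# Sample input: three lines taken from the listing (one directory, two flag files).
SAMPLE_LISTING = """\
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=20
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=20/_REFINE_FAILED
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=21/_REFINE_FAILED
"""

for (year, month, day), hours in sorted(failures_per_day(SAMPLE_LISTING).items()):
    print(f"{year}-{month:02d}-{day:02d}: {hours} flagged hour(s)")
```

Directory lines are ignored because the pattern only matches paths ending in `_REFINE_FAILED`, so the full output of `hdfs dfs -ls -R` can be passed in unchanged.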
However, the data does exist in MariaDB:
select left(timestamp, 8) as date, count(*) as events
from `Edit_17541122`
where timestamp >= "201808"
group by left(timestamp, 8)

date      events
20180801  169194
20180802  167854
20180803  157676
20180804  130754
20180805  139192
20180806  173113
20180807  174028
20180808  172547
20180809  171772
20180810  157373
20180811  132995
20180812  142766
20180813  173578
20180814  172509
20180815  122250
20180816  109673
20180817  100810
20180818  87112
20180819  95706
20180820  117001
Any idea what's going on here?