The Edit log in the Data Lake seems to contain no data at all:
select
year,
month,
count(*) as events
from edit
where year >= 0
group by year, month
order by year, month asc
limit 1000
Done. 0 results.

The only files in HDFS are a set of _REFINE_FAILED flags, each containing a single timestamp, in hourly partitions from a few days in December 2017:
neilpquinn-wmf@stat1005:~$ hdfs dfs -ls -h -R /wmf/data/event/Edit
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017
drwxr-xr-x - hdfs hadoop 0 2017-12-20 21:01 /wmf/data/event/Edit/year=2017/month=12
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=20
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=20/_REFINE_FAILED
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=21
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=21/_REFINE_FAILED
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=22
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=22/_REFINE_FAILED
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=23
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=16/hour=23/_REFINE_FAILED
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:59 /wmf/data/event/Edit/year=2017/month=12/day=17
drwxr-xr-x - hdfs hadoop 0 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=17/hour=0
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 20:57 /wmf/data/event/Edit/year=2017/month=12/day=17/hour=0/_REFINE_FAILED
[....]
drwxr-xr-x - hdfs hadoop 0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=6
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=6/_REFINE_FAILED
drwxr-xr-x - hdfs hadoop 0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=7
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=7/_REFINE_FAILED
drwxr-xr-x - hdfs hadoop 0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=8
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=8/_REFINE_FAILED
drwxr-xr-x - hdfs hadoop 0 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=9
-rw-r--r-- 3 hdfs hadoop 26 2017-12-20 21:02 /wmf/data/event/Edit/year=2017/month=12/day=20/hour=9/_REFINE_FAILED
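To see at a glance which hourly partitions carry a failure flag, the recursive listing can be filtered down to just the _REFINE_FAILED paths. This is a minimal sketch, assuming only that the path is the last whitespace-separated field of each `hdfs dfs -ls -R` line and that partitions follow the year=/month=/day=/hour= layout in the listing above; the function name `list_refine_failed` is hypothetical.

```shell
# Hypothetical helper: read `hdfs dfs -ls -R` output on stdin and print one
# "YYYY-M-D hour H" line for every partition containing a _REFINE_FAILED flag.
list_refine_failed() {
  # Keep only lines whose last field (the path) ends in /_REFINE_FAILED.
  awk '$NF ~ /\/_REFINE_FAILED$/ { print $NF }' |
    # Pull the partition values out of the path (values are not zero-padded).
    sed -E 's#.*/year=([0-9]+)/month=([0-9]+)/day=([0-9]+)/hour=([0-9]+)/_REFINE_FAILED$#\1-\2-\3 hour \4#'
}

# Usage (on a stat host):
#   hdfs dfs -ls -R /wmf/data/event/Edit 2>/dev/null | list_refine_failed
```

The timestamp stored in any single flag can then be read with `hdfs dfs -cat` on its full path.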
However, the data does exist in MariaDB:
select
left(timestamp, 8) as date,
count(*) as events
from `Edit_17541122`
where timestamp >= "201808"
group by left(timestamp, 8)
date events
20180801 169194
20180802 167854
20180803 157676
20180804 130754
20180805 139192
20180806 173113
20180807 174028
20180808 172547
20180809 171772
20180810 157373
20180811 132995
20180812 142766
20180813 173578
20180814 172509
20180815 122250
20180816 109673
20180817 100810
20180818 87112
20180819 95706
20180820 117001

Any idea what's going on here?