Page MenuHomePhabricator

virtualpageview_hourly lacks data from December 17 on
Closed, ResolvedPublic

Description

See Turnilo or [1].

In contrast, the underlying auxiliary EventLogging schema continued to receive data, see Grafana or [2].

[1]
SELECT year, month, day, SUM(view_count) AS previews_seen
FROM wmf.virtualpageview_hourly WHERE ((year = 2018 AND month = 12) OR year = 2019)
GROUP BY year, month, day ORDER BY year, month, day LIMIT 100000;

year	month	day	previews_seen
2018	12	1	48439092
2018	12	2	53600166
2018	12	3	66480673
2018	12	4	65893810
2018	12	5	65204420
2018	12	6	62827187
2018	12	7	57357297
2018	12	8	47257036
2018	12	9	53448204
2018	12	10	66320025
2018	12	11	64520138
2018	12	12	62732642
2018	12	13	61147264
2018	12	14	55684608
2018	12	15	45616734
2018	12	16	50454028
2018	12	17	25671763
[2]

SELECT year, month, day, COUNT(*) AS events
FROM event.virtualpageview 
WHERE year = 2018 AND month = 12 
GROUP BY year, month, day ORDER BY year, month, day LIMIT 100000;


year	month	day	events
2018	12	1	48440945
2018	12	2	53601704
2018	12	3	66482347
2018	12	4	65895927
2018	12	5	65206322
2018	12	6	62828719
2018	12	7	57358805
2018	12	8	47258652
2018	12	9	53450012
2018	12	10	66321855
2018	12	11	64522047
2018	12	12	62734459
2018	12	13	61148818
2018	12	14	55686445
2018	12	15	45618390
2018	12	16	50455599
2018	12	17	45998595
2018	12	18	61251561
2018	12	19	58916157
2018	12	20	55093965
2018	12	21	49471128
2018	12	22	41418236
2018	12	23	44079596
2018	12	24	43429941
2018	12	25	41378813
2018	12	26	50981486
2018	12	27	54947802
2018	12	28	53041338
2018	12	29	46317580
2018	12	30	47211318
2018	12	31	43994478
Time taken: 129.095 seconds, Fetched: 31 row(s)

Event Timeline

This is affecting the monthly core readership metrics we are currently preparing for the board and the mw.org Audiences page, so a quick initial assessment (to see whether/how this could be fixed right away) would be valuable.

fdans triaged this task as Unbreak Now! priority.

@Tbayer we will be working on seeing why this job stopped (it is missing a few hours arround september 17th) in the meantime to approximate data if this is urgent you can use eventlogging requests which are there as you mentioned.

Mentioned in SAL (#wikimedia-analytics) [2019-01-14T16:40:21Z] <ottomata> running refine eventlogging analytics for dec 17 2018 12:00 - 16:00 - T213602

Thanks for filing Tilman, I'm refining this data now, and Oozie is scheduling the jobs now:
https://hue.wikimedia.org/oozie/list_oozie_coordinator/0001715-180905070129339-oozie-oozi-C/

This was my fault. In T211833, I ran a large re-refine job that caused the normally scheduled refine jobs to be delayed. Our scheduling currently only searches for un-refined data in the last 26 hours. The large job I ran caused this search to be delayed beyond that 26 hour periods. When it finally ran, it 26 hours ago was Dec 17 17:00. So there were 5 hours missing for all refined EventLogging data on that day. The VirtualPageView hour job was stuck waiting for that data to be available before moving on.

Looking back now, we did get an email alert about this, but none of us took action (we'll talk more internally about why...). This is 100% an oversight on the part of myself and the analytics team! Very sorry Tilman.

Data is present now up to the 22nd.

I think is taking about 3 minutes per hour, which means it should finish by midnite today, January 14th.

Great, thank you @Ottomata and everyone else for solving this so quickly!

Data is present now up to the 22nd.

I think is taking about 3 minutes per hour, which means it should finish by midnite today, January 14th.

I only see data until - part of - January 6 right now (5am PT on January 15). And indeed 3 minutes per hour would seem to come out at a running time of more than a day (3 min/h * 24 h/day * (8 + 13) days = 25.2 hours ). That said, it seems we can trust that it has been running sequentially, i.e that the December data we need for the report is complete already.

It finished by midnite the December data, which is the one you needed for the report.