
Stat1005 BH data issue?
Closed, Resolved · Public

Description

Hi Andrew and all,

I just want to flag that we haven’t been able to pull BH (event logging) data from stat1005 this evening (we tried between roughly 3:30 and 5:00 AM UTC on 12/1). When we generated the 6 hour mobile reports earlier today (around 22:00 UTC on 11/30), this was not a problem and everything on BH reports looked fine then. I took a quick look on pivot, where banner activity data for the latest hour exists. I thus wonder whether there might be a data transfer/storage issue on stat1005.

Based on the monitoring report, everything with banner delivery seems to be OK as impressions and donations have been keeping up at a similar rate to before.

I would love to get your thoughts on this!

Thank you and best,
Max

Event Timeline

DStrine created this task. Dec 1 2017, 3:51 PM
Restricted Application added a subscriber: Aklapper. Dec 1 2017, 3:51 PM

I expect this was due to maintenance and minor issues on the Analytics cluster around this time, though I don't have a confirmation of that for the exact time of the reported issue. @Mpany has this happened again recently (i.e. within the past month or so)? Thanks!! :D

@Mpany any updates here?

Mpany added a comment. Mar 6 2018, 3:27 AM

@AndyRussG @DStrine
This hadn't happened again until last week (2/25/2018). For the past week, we have not been able to pull BH data from stat1005 for the current SE campaign. We tried on different days and at different times. Based on impression and donation data from frdev1001, the SE campaign has been delivering banners and generating donations at the rate we would expect, so this may be a data transfer/storage issue again, possibly related to the above.

Of note, we have had no problems pulling BH data from stat1005 for the current IT campaign. However, it looks like for itIT donations are strangely high on 2/28/2018 at 11:00 UTC at around 240,000 while they are around 1,500 before and after. This seems like a mistake and I wonder whether this is a transcription error or whether somehow donations from other hours got attributed to this day and time.
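
A minimal sketch of how an hourly spike like the one described above could be flagged before deciding whether it is a data error. This is an illustration only, not code from the reporting pipeline: the function name `flag_hourly_spikes`, the `ratio` threshold, and the sample rows are all hypothetical, and the sample values simply mirror the approximate figures quoted above (around 1,500 versus around 240,000).

```python
from statistics import median


def flag_hourly_spikes(hourly_totals, ratio=10.0):
    """Return (hour, value) pairs whose value exceeds `ratio` times
    the median of all the other hours in the series.

    `hourly_totals` is a list of (hour_label, value) tuples.
    `ratio` is an arbitrary, hypothetical threshold.
    """
    flagged = []
    if len(hourly_totals) < 2:
        # Nothing to compare against, so nothing can be flagged.
        return flagged
    for i, (hour, value) in enumerate(hourly_totals):
        # Baseline is the median of every hour except the one under test.
        others = [v for j, (_, v) in enumerate(hourly_totals) if j != i]
        baseline = median(others)
        if baseline > 0 and value > ratio * baseline:
            flagged.append((hour, value))
    return flagged


# Illustrative values only, mirroring the approximate figures in the comment.
sample = [
    ("2018-02-28 09:00", 1500),
    ("2018-02-28 10:00", 1500),
    ("2018-02-28 11:00", 240000),
    ("2018-02-28 12:00", 1500),
    ("2018-02-28 13:00", 1500),
]

print(flag_hourly_spikes(sample))
```

Comparing each hour against the median of the other hours keeps the single outlier from inflating its own baseline; a flagged hour still needs to be checked against the source data to tell a transcription error from misattributed donations.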

Nuria added a subscriber: Nuria. Edited Mar 8 2018, 8:31 PM

Sorry, it is hard to understand what is happening here. Can someone describe the issue in detail?

EL was migrated to a new box yesterday, which caused a small data gap of < 10 mins, but it seems like you have had problems since 2/25?

Are users subscribed to the analytics@ e-mail list? See notices about outages: https://lists.wikimedia.org/pipermail/analytics/2018-March/006218.html

Hi! @Mpany, thanks for these reports. It would be helpful to have more details...

Can you please put here what queries you are running, on which machines, to try to get the SE campaign banner history data? Or perhaps link to the code, and indicate how it's being run and what parameters it's being run with?

Also, could you please indicate what error messages you're getting, or specifically how the queries are failing?

Finally, is the problem still happening, and if not, can you please post at least a few specific times when it did occur?

Just to understand about the IT campaign: you're running the same code and almost the same queries for that campaign, but for that campaign you are receiving data? Is that the case?

However, it looks like for itIT donations are strangely high on 2/28/2018 at 11:00 UTC at around 240,000 while they are around 1,500 before and after. This seems like a mistake and I wonder whether this is a transcription error or whether somehow donations from other hours got attributed to this day and time.

This sounds like a separate issue, not related to Banner History EventLogging data? Or at least, not related to the outage described in this task? In that case, I'd suggest we create a separate task...

Thanks much!!!!! :D

@Nuria Thanks a lot for the link to the EL MySQL maintenance info... Banner History data is not pulled from MySQL (I think it's not actually stored there)... it's only ever queried via Hive. :)

Of note, we have had no problems pulling BH data from stat1005 for the current IT campaign. However, it looks like for itIT donations are strangely high on 2/28/2018 at 11:00 UTC at around 240,000 while they are around 1,500 before and after. This seems like a mistake and I wonder whether this is a transcription error or whether somehow donations from other hours got attributed to this day and time.

splitting this into a new task: T189697

@Mpany do you have any more information for us?

Aklapper removed AndyRussG as the assignee of this task. Jun 19 2020, 4:12 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Nuria closed this task as Resolved. Jun 19 2020, 4:47 PM