Hi! Here's my request for the new creds for stat100* and notebook100*, please. Username: andyrussg. Thanks so much for working on this!!!!! :)
Wed, Dec 4
Mon, Dec 2
Just to note, we have the same problem for the new CentralNotice data pipeline, which uses EventLogging, as compared to the old pipeline, which uses a custom call to beacon/impression not blocked by AdBlock. In case it's useful: see T236834#5696044 (and the two comments after that).
Windows for log samples:
Wed, Nov 27
Differences found in orphaned old pipeline events:
- 28% orphaned GET requests vs. 10% overall GET requests
- 63% orphaned Windows requests vs. 39% overall Windows requests
Just did one-to-one merges using web request logs in Hive, in both directions, using fairly large samples in both cases.
Mon, Nov 25
Here are some results:
Thu, Nov 21
Could you give some examples of issues you expect to see and troubleshoot (maybe some tickets from the past)?
Wed, Nov 20
Hi all! Congrats to all for your work on this...
Tue, Nov 19
Mon, Nov 18
Windows for log samples:
Thu, Nov 14
Tue, Nov 12
Here's a summary of the situation regarding discrepancies between the new and old pipelines for Landing Pages (also includes measurements obtained in T235284).
Mon, Nov 11
FRUEC currently accepts LandingPage events with no language property in the JSON input. However, if the property exists and its value is an empty string, the event is marked invalid and not counted. The legacy system (DjangoBannerStats), by contrast, defaults the language to 'en' in that case.
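To make the difference concrete, here's a minimal sketch of the two behaviours as described above. This is not the actual FRUEC or DjangoBannerStats code; the function names are hypothetical, and treating a missing property like an empty one on the legacy side is an assumption.

```python
def is_valid_strict(event: dict) -> bool:
    """Behaviour described for FRUEC: a missing 'language' property is
    accepted, but an empty-string value makes the event invalid."""
    return "language" not in event or event["language"] != ""


def language_with_legacy_default(event: dict, default: str = "en") -> str:
    """Behaviour described for the legacy system: an empty language
    falls back to 'en'. Handling a missing key the same way is an
    assumption made for this sketch."""
    lang = event.get("language")
    return lang if lang else default


# Hypothetical event with an empty language property.
event = {"landingpage": "Example", "language": ""}
print(is_valid_strict(event))               # rejected by the strict check
print(language_with_legacy_default(event))  # counted as 'en' by the lenient one
```

An event like this one would be dropped by the strict check but counted under 'en' by the lenient one, which is one way the two pipelines can end up with different totals.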
Nov 8 2019
Nov 6 2019
Nov 4 2019
Matching on landing page, country, language, and other event fields, with old log timestamps always earlier than the new log timestamps, by at most 30 seconds, we get:
Trying different options for better matching... Just a small improvement from allowing a match on whichever new log timestamp is closest to the old log one, that is, removing the requirement that the new log event come after the old log one. With this method, we get 136 unmatched events in the new log, and 510 in the old one.
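For reference, the matching idea can be sketched roughly like this. It's a simplified, greedy illustration rather than the queries actually used: the event layout (`key`, `ts`), the function name, and the assumption that the 30-second window still applies when the ordering requirement is dropped are all mine.

```python
from collections import defaultdict


def match_events(old_events, new_events, max_gap=30.0, require_new_after_old=False):
    """Greedy one-to-one matching of old-log and new-log events.

    Each event is a dict with 'key' (tuple of the fields to match on, e.g.
    landing page, country, language) and 'ts' (timestamp in seconds).
    Returns (pairs, unmatched_old_indexes, unmatched_new_indexes).
    """
    # Index new events by their matching key.
    new_by_key = defaultdict(list)
    for i, ev in enumerate(new_events):
        new_by_key[ev["key"]].append(i)

    used = set()
    pairs = []
    for oi, old in enumerate(old_events):
        best, best_gap = None, None
        for ni in new_by_key.get(old["key"], []):
            if ni in used:
                continue
            delta = new_events[ni]["ts"] - old["ts"]
            if require_new_after_old and delta < 0:
                continue  # stricter variant: new event must not precede old one
            gap = abs(delta)
            if gap <= max_gap and (best_gap is None or gap < best_gap):
                best, best_gap = ni, gap
        if best is not None:
            used.add(best)
            pairs.append((oi, best))

    matched_old = {o for o, _ in pairs}
    unmatched_old = [i for i in range(len(old_events)) if i not in matched_old]
    unmatched_new = [i for i in range(len(new_events)) if i not in used]
    return pairs, unmatched_old, unmatched_new


# Made-up events, only to show the difference between the two variants.
key = ("ExampleLandingPage", "US", "en")
old = [{"key": key, "ts": 100.0}, {"key": key, "ts": 200.0}]
new = [{"key": key, "ts": 98.0}, {"key": key, "ts": 300.0}]
print(match_events(old, new))
print(match_events(old, new, require_new_after_old=True))
```

With `require_new_after_old=True` the sketch corresponds to the stricter variant, where the old log timestamp must not be later than the new one.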
I've dug into this some more, improving filtering compared to what was done for T235284. I've got some more clarity about what the differences are, but it's still pretty ugly.
Nov 1 2019
Oct 31 2019
Scheduled to deploy in a few minutes...
Great, thank you everyone! @GoranSMilovanovic Can you now get the impression data or do you need anything else?
Thanks so much for flagging this, and many apologies for the noise!! There's now a patch in review for T236627: CentralNotice: Adapt impression event schema for campaign fallback.
Here's a patch!!! Apologies for the trouble...!
Oct 30 2019
Oct 29 2019
Note: the second bullet point from the task description has been spun out as T236845.
This task was about the large-scale discrepancy, which I think we can consider to be solved. There are still smaller unexplained differences between data we're getting from the old and new pipelines. I've created new tasks to investigate that: T236835 and T236834.
Oct 28 2019
I've dug deeper into the Landing Page discrepancy, comparing sequences of log entries from the same IPs in both old and new pipelines.
Oct 27 2019
Oct 25 2019
So actually it seems the problem is not duplicate entries in the old logs, but rather a few IP addresses hammering on the site using some sort of script, which doesn't run client-side code, so it doesn't generate any client-side events.
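A quick way to spot that kind of pattern is to count old-log requests per IP and look at the top of the list. The sketch below assumes the IPs have already been extracted from the log lines; the function name and threshold are hypothetical.

```python
from collections import Counter


def heavy_hitters(ips, min_requests=100):
    """Return (ip, count) pairs for IPs with at least min_requests
    requests, most frequent first."""
    counts = Counter(ips)
    return [(ip, n) for ip, n in counts.most_common() if n >= min_requests]


# Made-up sample using documentation-range addresses.
sample = ["192.0.2.1"] * 5 + ["192.0.2.2"] * 2 + ["192.0.2.3"]
print(heavy_hitters(sample, min_requests=3))
```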
Made some progress on figuring out what the difference is between old and new logs. It looks like there are a lot of duplicate entries in the old log files.