Queries and maybe scripts to verify equivalence of data in new-Kafka-pipeline-testing and pgehres production databases
Open, Medium | Public | 2 Story Points

Authored By: AndyRussG, Jul 3 2018

Description

See T196564 for testing database schema, and T196563 for info on what we need to check.

Details

Related Gerrit Patches:
wikimedia/fundraising/FRUEC (master): FRUEC & Legacy impression data comparison script

Event Timeline

Hey Andy, could we chat about this when convenient?

For sure! Let's chat on IRC to find a time, sound good? :) Thanks much!

jgleeson claimed this task. Aug 13 2018, 1:25 PM

Location of the log files to be consumed into the database in the new system: alnitak:/srv/banner_logs/2018/. (There's a lot in that directory. To find some, try: ls /srv/banner_logs/2018/ | grep -v beacon | tail -n 10)
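For anyone who wants the same listing from inside a script, here is a rough Python equivalent of the one-liner above. It is only a sketch: the function name latest_non_beacon_logs is made up for illustration, and it sorts by plain string order, which may differ slightly from the locale-aware order ls uses.

```python
import os


def latest_non_beacon_logs(directory, count=10):
    """Return the last `count` file names (sorted by name) without 'beacon' in them.

    Hypothetical helper mirroring: ls DIR | grep -v beacon | tail -n 10
    """
    names = sorted(
        name
        for name in os.listdir(directory)
        # ls skips dotfiles by default, so skip them here too
        if not name.startswith(".") and "beacon" not in name
    )
    return names[-count:]
```

Calling latest_non_beacon_logs("/srv/banner_logs/2018/") on alnitak should give roughly the same ten names as the shell command.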

Change 456016 had a related patch set uploaded (by Jgleeson; owner: Jgleeson):
[wikimedia/fundraising/FRUEC@master] WIP: Backend Stats Comparison Script (bannerimpressions part)

https://gerrit.wikimedia.org/r/456016

After discussing this with Andy, it was suggested that a summary output showing the counts of matching and non-matching lines between the two backends would be useful.
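To make the idea of a summary concrete, here is a minimal sketch of how matching and non-matching counts could be computed. It is not the code in the Gerrit patch: the function name, the row layout (timestamp, banner, country, count) and the sample values are all assumptions for illustration, and rows are assumed to have already been fetched from each database as tuples.

```python
from collections import Counter


def summarize_comparison(legacy_rows, fruec_rows):
    """Count rows present in both backends and rows unique to each.

    Rows are compared as whole tuples; duplicates are counted, so two
    identical rows on one side need two identical rows on the other.
    """
    legacy = Counter(legacy_rows)
    fruec = Counter(fruec_rows)
    return {
        "matching": sum((legacy & fruec).values()),
        "only_in_legacy": sum((legacy - fruec).values()),
        "only_in_fruec": sum((fruec - legacy).values()),
    }


# Hypothetical sample rows: (timestamp, banner, country, impression count)
legacy_sample = [
    ("2018-08-01 00:00", "B1", "US", 10),
    ("2018-08-01 00:00", "B2", "FR", 5),
]
fruec_sample = [
    ("2018-08-01 00:00", "B1", "US", 10),
    ("2018-08-01 00:00", "B2", "FR", 6),
]
summary = summarize_comparison(legacy_sample, fruec_sample)
```

With these samples, one row matches and one row on each side is unmatched, because the B2/FR counts differ. Comparing whole tuples keeps the sketch simple; a real comparison might instead key on the dimensions and report count differences separately.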

FRUEC & Legacy impression data comparison script patch pushed with usage notes: https://gerrit.wikimedia.org/r/#/c/wikimedia/fundraising/FRUEC/+/456016/

For testing the script itself (to confirm it works as expected before running it on live data), I have added some sample SQL for the legacy and FRUEC databases to the tech Google Drive folder under the path fr-tech-things/tmp/