New scripts to ingress data from Kafkatee into MySQL
Open, Needs Triage, Public, 4 Story Points

Authored By
AndyRussG, May 25 2018

Description

As per T192839, for sampled impression and landing page data we won't ingest data directly from the Kafka topic into the database. Instead, we'll write files from the stream and read those, as in the legacy system.

However, the format of the new files is quite different from the old ones, and the legacy Python scripts that processed data in the old format are pretty crufty. So, instead of writing new code to re-create the legacy format and feed it to the crufty legacy scripts, we'll rewrite the legacy scripts to read the new format.

This should make the system more maintainable and stable, so it's definitely within scope for this switchover.

We may wish to make some minor changes in the database schema, but we should ensure that queries currently used will continue to work.
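To make the approach above a bit more concrete, here is a minimal sketch of reading a file written from the stream and loading it into a table. It is not the FRUEC code. It assumes the files hold one JSON object per line, and the field names (`dt`, `landingpage`), the table name, and the column limit are all hypothetical. It also uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server.

```python
import json
import sqlite3

# Hypothetical column limit; the real schema's limits are not given in this task.
MAX_LANDINGPAGE_LEN = 16


def parse_line(line):
    """Return one event as a dict, or None if the line is blank or malformed."""
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except ValueError:
        return None
    return event if isinstance(event, dict) else None


def truncate(value, limit=MAX_LANDINGPAGE_LEN):
    """Cut strings down to the column limit; leave other values untouched."""
    return value[:limit] if isinstance(value, str) else value


def make_demo_db():
    """Create an in-memory database with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE landing_page_events (dt TEXT NOT NULL, landingpage TEXT NOT NULL)"
    )
    return conn


def ingest(lines, conn):
    """Insert usable events from an iterable of lines; return (inserted, skipped)."""
    inserted = skipped = 0
    for line in lines:
        event = parse_line(line)
        # Skip anything that is not a JSON object with the assumed fields.
        if event is None or "dt" not in event or "landingpage" not in event:
            skipped += 1
            continue
        conn.execute(
            "INSERT INTO landing_page_events (dt, landingpage) VALUES (?, ?)",
            (event["dt"], truncate(event["landingpage"])),
        )
        inserted += 1
    conn.commit()
    return inserted, skipped
```

With a real file, an open file object can be passed straight in as `lines`, since iterating over it yields one line at a time. Counting skipped lines rather than raising on them is just one option; it keeps a single bad line from stopping a whole file.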

Thanks!!

Event Timeline

Change 455189 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Add LandingPage test data

https://gerrit.wikimedia.org/r/455189

Change 455869 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Add landingpage event processing

https://gerrit.wikimedia.org/r/455869

Change 456434 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Truncate strings to DB column limit

https://gerrit.wikimedia.org/r/456434

Change 456664 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Refactor stats output

https://gerrit.wikimedia.org/r/456664

Change 456672 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Print a friendly message confirming config

https://gerrit.wikimedia.org/r/456672

Change 457090 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Implement purge-incomplete for landingpage and output stats

https://gerrit.wikimedia.org/r/457090

Change 457091 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Rename package fr_user_event_consumer -> fruec

https://gerrit.wikimedia.org/r/457091

Change 457272 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Add event_type arg to log_file_mapper.get_lastest_time()

https://gerrit.wikimedia.org/r/457272

Change 457275 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] [WIP] Inline doc and comments

https://gerrit.wikimedia.org/r/457275

Change 457691 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Flag SQL template constants as private

https://gerrit.wikimedia.org/r/457691

Change 455189 merged by Ejegg:
[wikimedia/fundraising/FRUEC@master] Add LandingPage test data

https://gerrit.wikimedia.org/r/455189

@Ejegg I'm seeing a patch merged here. Is this task Pending Deployment?

Hi! That's just the first patch in the series. That patch itself was pretty isolated, but I'd suggest reviewing the last patch in the series. Thanks!!!!

Change 463110 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Remove testing field (for banner previews) from CNEvents

https://gerrit.wikimedia.org/r/463110

Change 516062 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Move object cache to separate submodule

https://gerrit.wikimedia.org/r/516062

Review should look at the current "HEAD" of the series of Gerrit patches: https://gerrit.wikimedia.org/r/#/c/wikimedia/fundraising/FRUEC/+/516062/

Thanks!!!

Also noting here locations of review comments:

Change 524101 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Fix typos in inline docs and comments

https://gerrit.wikimedia.org/r/524101

It looks like some of the patches that have been +2'ed for code review also need to be +2 verified. I guess this is necessary since we don't have any CI running at all.

Thanks!!!

Change 455869 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Add landingpage event processing

https://gerrit.wikimedia.org/r/455869

Change 456434 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Truncate strings to DB column limit

https://gerrit.wikimedia.org/r/456434

Change 456664 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Refactor stats output

https://gerrit.wikimedia.org/r/456664

Change 456672 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Print a friendly message confirming config

https://gerrit.wikimedia.org/r/456672

Change 457090 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Implement purge-incomplete for landingpage and output stats

https://gerrit.wikimedia.org/r/457090

Change 457091 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Rename package fr_user_event_consumer -> fruec

https://gerrit.wikimedia.org/r/457091

Change 457272 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Add event_type arg to log_file_mapper.get_lastest_time()

https://gerrit.wikimedia.org/r/457272

Change 457691 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Flag SQL template constants as private

https://gerrit.wikimedia.org/r/457691

Change 457275 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Inline doc and comments

https://gerrit.wikimedia.org/r/457275

Change 463110 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Remove testing field (for banner previews) from CNEvents

https://gerrit.wikimedia.org/r/463110

Change 516062 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Move object cache to separate submodule

https://gerrit.wikimedia.org/r/516062

Change 524101 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Fix typos in inline docs and comments

https://gerrit.wikimedia.org/r/524101

All patches in the chain are now merged and ready for deployment!