T195594 New scripts to ingress data from Kafkatee into MySQL

Subject	Repo	Branch	Lines +/-
Fix typos in inline docs and comments	wikimedia/fundraising/FRUEC	master	+3 -3
Move object cache to separate submodule	wikimedia/fundraising/FRUEC	master	+26 -22
Remove testing field (for banner previews) from CNEvents	wikimedia/fundraising/FRUEC	master	+1 -9
Inline doc and comments	wikimedia/fundraising/FRUEC	master	+428 -44
Flag SQL template constants as private	wikimedia/fundraising/FRUEC	master	+71 -46
Add event_type arg to log_file_mapper.get_lastest_time()	wikimedia/fundraising/FRUEC	master	+13 -4
Rename package fr_user_event_consumer -> fruec	wikimedia/fundraising/FRUEC	master	+36 -37
Implement purge-incomplete for landingpage and output stats	wikimedia/fundraising/FRUEC	master	+70 -16
Print a friendly message confirming config	wikimedia/fundraising/FRUEC	master	+35 -12
Refactor stats output	wikimedia/fundraising/FRUEC	master	+56 -33
Truncate strings to DB column limit	wikimedia/fundraising/FRUEC	master	+54 -2
Add landingpage event processing	wikimedia/fundraising/FRUEC	master	+652 -260
Add LandingPage test data	wikimedia/fundraising/FRUEC	master	+120 -0

Status	Subtype	Assigned	Task
Open		None	T183978 [Epic] Fundraising kafkatee changes
Resolved		AndyRussG	T195594 New scripts to ingress data from Kafkatee into MySQL

AndyRussG mentioned this in rWMFR34d4182c2d84: Print a friendly message confirming config. Aug 31 2018, 11:54 PM

AndyRussG mentioned this in rWMFR03d24d09e2b8: Print a friendly message confirming config. Sep 1 2018, 5:48 PM

Change 457090 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Implement purge-incomplete for landingpage and output stats

https://gerrit.wikimedia.org/r/457090

AndyRussG mentioned this in rWMFRec426f4db0c3: Implement purge-incomplete for landingpage and output stats. Sep 2 2018, 2:19 PM

Change 457091 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Rename package fr_user_event_consumer -> fruec

https://gerrit.wikimedia.org/r/457091

AndyRussG mentioned this in rWMFR2379579205c0: Rename package fr_user_event_consumer -> fruec. Sep 2 2018, 2:44 PM

Change 457272 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Add event_type arg to log_file_mapper.get_lastest_time()

https://gerrit.wikimedia.org/r/457272

AndyRussG mentioned this in rWMFR31123b2bdc26: Add event_type arg to log_file_mapper.get_lastest_time(). Sep 3 2018, 4:31 AM

Change 457275 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] [WIP] Inline doc and comments

https://gerrit.wikimedia.org/r/457275

AndyRussG mentioned this in rWMFR7c26a52a8a01: [WIP] Inline doc and comments. Sep 3 2018, 4:39 AM

Change 457691 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Flag SQL template constants as private

https://gerrit.wikimedia.org/r/457691

AndyRussG mentioned this in rWMFRbf97bd17c9b4: Flag SQL template constants as private. Sep 3 2018, 10:02 PM

AndyRussG mentioned this in rWMFR58ccf4de7602: [WIP] Inline doc and comments.

AndyRussG moved this task from Doing to Review on the Fundraising Sprint Queue is pronounced GJif board. Sep 4 2018, 7:31 PM

AndyRussG mentioned this in rWMFR86811bb55da7: Inline doc and comments. Sep 4 2018, 7:35 PM

DStrine added a project: Fundraising Sprint Raw data can give you salmonella. Sep 4 2018, 8:37 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Raw data can give you salmonella board. Sep 5 2018, 1:54 AM

AndyRussG mentioned this in rWMFRfe75b131890f: Inline doc and comments. Sep 5 2018, 4:39 PM

AndyRussG mentioned this in rWMFR73c31d9fdacb: Inline doc and comments. Sep 6 2018, 7:12 PM

Change 455189 merged by Ejegg:
[wikimedia/fundraising/FRUEC@master] Add LandingPage test data

https://gerrit.wikimedia.org/r/455189

@Ejegg I'm seeing a patch merged here. Is this task Pending Deployment?

In T195594#4582085, @mepps wrote:

> @Ejegg I'm seeing a patch merged here. Is this task Pending Deployment?

Hi! That's just the first patch in the series. That patch itself was pretty isolated, but I'd suggest reviewing the last patch in the series. Thanks!!!!

DStrine added a project: Fundraising Sprint Sasquatches can't find us either. Sep 18 2018, 8:37 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Sasquatches can't find us either board. Sep 18 2018, 9:20 PM

Change 463110 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Remove testing field (for banner previews) from CNEvents

https://gerrit.wikimedia.org/r/463110

AndyRussG mentioned this in rWMFR66c87389c6c4: Remove testing field (for banner previews) from CNEvents. Sep 26 2018, 5:27 PM

DStrine added a project: Fundraising Sprint They Live. Oct 2 2018, 8:12 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint They Live board. Oct 3 2018, 2:40 PM

DStrine added a project: Fundraising Sprint USB stands for underhanded socket bureaucracy. Oct 16 2018, 8:29 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint USB stands for underhanded socket bureaucracy board. Oct 17 2018, 2:53 PM

AndyRussG mentioned this in rWMFR1ce2a93e79cd: Remove testing field (for banner previews) from CNEvents. Oct 23 2018, 7:25 PM

DStrine added a project: Fundraising Sprint Vestigial tails shoot from the hip. Oct 30 2018, 8:42 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Vestigial tails shoot from the hip board. Oct 31 2018, 4:18 PM

DStrine added a project: Fundraising Sprint Window dressing is mostly olive oil. Nov 13 2018, 9:15 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Window dressing is mostly olive oil board. Nov 20 2018, 8:35 PM

DStrine added a project: Fundraising Sprint XML ate my homework. Dec 11 2018, 9:14 PM

Ejegg moved this task from Backlog to Review on the Fundraising Sprint XML ate my homework board. Dec 11 2018, 10:40 PM

DStrine added a project: Fundraising Sprint A series of unfortunate event handlers. Jan 8 2019, 9:21 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint A series of unfortunate event handlers board. Jan 8 2019, 10:44 PM

DStrine added a project: Fundraising Sprint Bert and Ernie's Excellent Adventure. Jan 22 2019, 9:13 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Bert and Ernie's Excellent Adventure board. Jan 22 2019, 9:48 PM

DStrine moved this task from Current Sprint to Sprint +1 on the Fundraising-Backlog board. Feb 5 2019, 9:05 PM

DStrine moved this task from Sprint +1 to Q3 2021-2022 on the Fundraising-Backlog board. May 8 2019, 9:43 PM

AndyRussG mentioned this in T183978: [Epic] Fundraising kafkatee changes. May 21 2019, 7:34 PM

DStrine moved this task from Q3 2021-2022 to Current Sprint on the Fundraising-Backlog board. May 28 2019, 8:34 PM

DStrine added a project: Fundraising Sprint King Kong vs. Mozilla. May 28 2019, 8:36 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint King Kong vs. Mozilla board. May 29 2019, 4:28 PM

AndyRussG moved this task from Review to Backlog on the Fundraising Sprint King Kong vs. Mozilla board. May 30 2019, 5:16 PM

AndyRussG moved this task from Backlog to Doing on the Fundraising Sprint King Kong vs. Mozilla board. Jun 8 2019, 1:13 AM

AndyRussG moved this task from Doing to Review on the Fundraising Sprint King Kong vs. Mozilla board. Jun 10 2019, 12:47 AM

Change 516062 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Move object cache to separate submodule

https://gerrit.wikimedia.org/r/516062

AndyRussG mentioned this in rWMFRbb1cdfebb5bd: Move object cache to separate submodule. Jun 10 2019, 1:58 AM

Reviewers should look at the current HEAD of the Gerrit patch series: https://gerrit.wikimedia.org/r/#/c/wikimedia/fundraising/FRUEC/+/516062/

Thanks!!!

Also noting here the locations of review comments:

- This Gerrit change.
- The notes etherpad, under the heading "Jack's Feedback".

DStrine added a project: Fundraising Sprint Land before Timeouts. Jun 11 2019, 8:00 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Land before Timeouts board. Jun 12 2019, 4:38 PM

DStrine added a project: Fundraising Sprint Men In Slack. Jun 25 2019, 8:43 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Men In Slack board. Jun 26 2019, 3:58 PM

DStrine added a project: Fundraising Sprint Never Ending Query. Jul 9 2019, 8:36 PM

AndyRussG moved this task from Backlog to Pending Deployment on the Fundraising Sprint Never Ending Query board. Jul 10 2019, 5:28 AM

AndyRussG moved this task from Pending Deployment to Review on the Fundraising Sprint Never Ending Query board.

jgleeson moved this task from Review to Pending Deployment on the Fundraising Sprint Never Ending Query board. Jul 16 2019, 4:26 PM

jgleeson moved this task from Pending Deployment to Review on the Fundraising Sprint Never Ending Query board.

Change 524101 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[wikimedia/fundraising/FRUEC@master] Fix typos in inline docs and comments

https://gerrit.wikimedia.org/r/524101

DStrine added a project: Fundraising Sprint Office. Jul 23 2019, 8:30 PM

AndyRussG moved this task from Backlog to Review on the Fundraising Sprint Office board. Jul 24 2019, 4:51 AM

It looks like some of the patches that have been +2'd for Code-Review also need to be +2'd for Verified. I guess this is necessary since we don't have any CI running at all.

Thanks!!!

Change 455869 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Add landingpage event processing

https://gerrit.wikimedia.org/r/455869

Change 456434 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Truncate strings to DB column limit

https://gerrit.wikimedia.org/r/456434

Change 456664 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Refactor stats output

https://gerrit.wikimedia.org/r/456664

Change 456672 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Print a friendly message confirming config

https://gerrit.wikimedia.org/r/456672

Change 457090 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Implement purge-incomplete for landingpage and output stats

https://gerrit.wikimedia.org/r/457090

Change 457091 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Rename package fr_user_event_consumer -> fruec

https://gerrit.wikimedia.org/r/457091

Change 457272 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Add event_type arg to log_file_mapper.get_lastest_time()

https://gerrit.wikimedia.org/r/457272

Change 457691 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Flag SQL template constants as private

https://gerrit.wikimedia.org/r/457691

Change 457275 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Inline doc and comments

https://gerrit.wikimedia.org/r/457275

Change 463110 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Remove testing field (for banner previews) from CNEvents

https://gerrit.wikimedia.org/r/463110

Change 516062 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Move object cache to separate submodule

https://gerrit.wikimedia.org/r/516062

Change 524101 merged by Jgleeson:
[wikimedia/fundraising/FRUEC@master] Fix typos in inline docs and comments

https://gerrit.wikimedia.org/r/524101

All patches in the chain are now merged and ready for deployment!

jgleeson moved this task from Review to Pending Deployment on the Fundraising Sprint Office board. Aug 5 2019, 8:14 PM

DStrine added a project: Fundraising Sprint Princess Mongodb. Aug 6 2019, 9:08 PM

Maintenance_bot removed a project: Patch-For-Review. Aug 6 2019, 9:10 PM

AndyRussG moved this task from Backlog to Deployed on the Fundraising Sprint Princess Mongodb board. Aug 11 2019, 11:45 PM

AndyRussG moved this task from Deployed to Pending Deployment on the Fundraising Sprint Princess Mongodb board.

DStrine added a project: Fundraising Sprint Quick and the Deadlocked. Aug 20 2019, 8:44 PM

AndyRussG moved this task from Backlog to Pending Deployment on the Fundraising Sprint Quick and the Deadlocked board. Aug 21 2019, 4:13 AM

I'm wondering, is this really pulling from the files produced by kafkatee, or do we get events directly from Kafka?

In T195594#5428851, @awight wrote:

> I'm wondering, is this really pulling from the files produced by kafkatee, or do we get events directly from Kafka?

Ah, I see from the code! It looks great, seems to be parsing the kafkatee logfiles and writing to a nearly identical schema. That would answer the next question I had, which is whether random scripts like the WMDE banner impression export will continue to work under the new system. Thanks for doing this!

In T195594#5429685, @awight wrote:

> In T195594#5428851, @awight wrote:
>
>> I'm wondering, is this really pulling from the files produced by kafkatee, or do we get events directly from Kafka?
>
> Ah, I see from the code! It looks great, seems to be parsing the kafkatee logfiles and writing to a nearly identical schema.

Thanks! Yeah, for now we decided to keep ingressing from files written by kafkatee. I think that an eventual switch to near-realtime direct consumption of Kafka streams won't be too much extra work... And in that case, we'll still have the log files as backup for backfill (for periods beyond the time that events are retained in Kafka).

> That would answer the next question I had, which is whether random scripts like the WMDE banner impression export will continue to work under the new system. Thanks for doing this!

Yeah, that's the plan, in any case! BTW regarding the schema changes and backward compatibility, please see this draft specification. Any comments are most welcome, on the talk page or the related task, T196563. See also the proposed SQL for the schema change, in comments here: T196564.

Thanks again for digging in!!! :)

DStrine added a project: Fundraising Sprint Rocky Horror Presentation Layer. Sep 3 2019, 8:24 PM

AndyRussG moved this task from Backlog to Pending Deployment on the Fundraising Sprint Rocky Horror Presentation Layer board. Sep 4 2019, 2:27 PM

DStrine added a project: Fundraising Sprint Sysadmin Kane. Sep 17 2019, 8:13 PM

AndyRussG moved this task from Backlog to Pending Deployment on the Fundraising Sprint Sysadmin Kane board. Sep 17 2019, 8:47 PM

AndyRussG moved this task from Pending Deployment to Deployed on the Fundraising Sprint Sysadmin Kane board. Sep 27 2019, 4:25 PM

Done! Bwahahahahahah :)