
Automation / optimization of data cubes
Closed, ResolvedPublic

Description

Logging my request for FR Tech support to learn how to optimize data cube refresh and automation processes. Once the table has been created (as in the case of France and en6c cubes), there are two types of updates I would like help implementing:

  1. Keeping the table up to date as donations come in (even if not in real time; e.g., updating daily).
  2. Fully refreshing lifetime statistics for the table as a whole, less frequently.
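For reference, the two update modes above can be sketched against a hypothetical schema. This is a minimal illustration using sqlite3; the `donations`, `cube`, and `watermark` table names and columns are assumptions for the sketch, not the actual schema:

```python
import sqlite3

# Hypothetical schema: a source `donations` table, a `cube` rollup,
# and a high-water mark recording the last donation ID folded in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donations (id INTEGER PRIMARY KEY, contact_id INTEGER, amount REAL);
CREATE TABLE cube (contact_id INTEGER PRIMARY KEY, lifetime_total REAL);
CREATE TABLE watermark (last_id INTEGER);
""")
conn.execute("INSERT INTO watermark VALUES (0)")

def incremental_update(conn):
    """Mode 1: fold in only the donations that arrived since the last run."""
    (last_id,) = conn.execute("SELECT last_id FROM watermark").fetchone()
    new = conn.execute(
        "SELECT id, contact_id, amount FROM donations WHERE id > ? ORDER BY id",
        (last_id,)).fetchall()
    for did, cid, amt in new:
        conn.execute(
            "INSERT INTO cube (contact_id, lifetime_total) VALUES (?, ?) "
            "ON CONFLICT(contact_id) DO UPDATE SET "
            "lifetime_total = lifetime_total + excluded.lifetime_total",
            (cid, amt))
        conn.execute("UPDATE watermark SET last_id = ?", (did,))
    conn.commit()

def full_refresh(conn):
    """Mode 2: rebuild lifetime statistics from scratch, less frequently."""
    conn.execute("DELETE FROM cube")
    conn.execute(
        "INSERT INTO cube SELECT contact_id, SUM(amount) "
        "FROM donations GROUP BY contact_id")
    conn.commit()

# Example: three donations arrive, then the incremental pass runs.
conn.executemany("INSERT INTO donations VALUES (?, ?, ?)",
                 [(1, 100, 5.0), (2, 100, 10.0), (3, 200, 2.0)])
incremental_update(conn)
```

The incremental pass is cheap enough to run on a frequent schedule, while the full refresh re-aggregates everything and would run much less often.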

I've spoken to many of you about this, so I wanted to put it down in the queue. Thanks for the ongoing help!

Event Timeline

I have a small test cube on dev_analytics pulling down new donation IDs, contact IDs, and utm_medium on a 10-minute interval via a python3 script in my home directory. I will change the schedule to a 1-hour interval to run overnight (through Saturday, possibly the whole weekend) for additional timing tests, and pick this up again Monday for a fuller-scale trial on larger data sets.

Removed the test cube from cron and set up a v1 production cube on a 10-minute schedule for inserts only. Next, I'll work on updates across the whole cube, which will be more resource-intensive.
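One way to keep the whole-cube update pass from holding a single long transaction is to recompute lifetime stats a slice of contacts at a time. A minimal sketch, again using sqlite3 and the same hypothetical `donations`/`cube` tables (names and batch size are assumptions, not the production setup):

```python
import sqlite3

# Hypothetical schema and some sample data, including a stale cube row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donations (id INTEGER PRIMARY KEY, contact_id INTEGER, amount REAL);
CREATE TABLE cube (contact_id INTEGER PRIMARY KEY, lifetime_total REAL);
INSERT INTO donations VALUES (1, 1, 1.0), (2, 1, 2.0), (3, 2, 5.0), (4, 3, 1.5);
INSERT INTO cube VALUES (1, 99.0);  -- stale value to be corrected
""")

def refresh_in_batches(conn, batch_size=1000):
    """Recompute lifetime stats one batch of contacts at a time,
    committing after each batch so no single transaction spans the cube."""
    ids = [r[0] for r in conn.execute("SELECT DISTINCT contact_id FROM donations")]
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        conn.execute(
            f"INSERT INTO cube (contact_id, lifetime_total) "
            f"SELECT contact_id, SUM(amount) FROM donations "
            f"WHERE contact_id IN ({placeholders}) GROUP BY contact_id "
            f"ON CONFLICT(contact_id) DO UPDATE SET "
            f"lifetime_total = excluded.lifetime_total",
            batch)
        conn.commit()  # one short transaction per batch

refresh_in_batches(conn, batch_size=2)
```

Each batch overwrites any stale totals via the upsert, so the pass is safe to re-run and can be scheduled alongside the more frequent insert-only job.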

@Jgreen can we merge this task with the ongoing umbrella task for infrastructure setup? https://phabricator.wikimedia.org/T238395

I can keep updates on cube schedules here or wherever else will be helpful.

@Jgreen we can close this as resolved. At least one data cube is running both inserts and updates on a schedule in the new ecosystem now.

Jgreen claimed this task.
Jgreen triaged this task as Medium priority.
Jgreen moved this task from Watching to Done on the fundraising-tech-ops board.

Great, done!