Page MenuHomePhabricator

investigate moving non-essential databases (faulkner, pgehres, fredge) off of fundraising database cluster
Closed, ResolvedPublic

Description

We should look at moving the analytics databases off of this cluster. At present this would free up nearly 700GB of the 1.6GB total.

354G faulkner
293G pgehres
39G fredge
686G total

We have frdb1003 already for fundraising analytics. Can we move these databases there?

Another question is how far are we from being able to phase out faulkner, pgehres, and fredge databases entirely, and move their functionality to grafana and Superset?

Investigation notes:

faulkner - This should be an easy one, it is a static legacy pre-2013 analytics database. It's probably no longer used all, but we need to confirm. [Decision: remove from fundraisingdb cluster, leave on frdb-analytics for now. T256670]

pgehres - This is an active analytics database that is populated from the kafkatee banner/landing page pipelines. Data is imported on 15 minute intervals by process-control jobs on civi1001. We could simply move the database, but we should also consider moving the jobs that populate it to fran1001 to take load off of civi1001. DjangoBannerStats uses separate database handles to make connections to pgehres, civicrm, and drupal databases, configured in /etc/django_settings.py. [Decision: migrate to fr-analytics cluster. T258526]

fredge - As noted in comments below, this is queried against the civicrm database and may not be something we can move. [Decision: can't move at this time]

Event Timeline

When a plan forms around this we need to inform fr-online. Also we probably don't want to break anyone's workflows between September and January

Moving fredge would make it pretty hard to do some of the antifraud reporting. There is at least one chart in the dash and a report in Civi that join fredge tables to tables from civi and drupal databases.

faulkner database is removed everywhere except frdb1003 (fr analytics db) and the archived backup

From today's analytics coordination meeting: The pgehres database is queried directly from frdev during campaigns to monitor banner activity. It's also accessed for various reporting functions using Peter Coombe's statler software, https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/analytics/+/585235/
It sounds feasible to move the database and database generation script to frdb1003/fran1001. We will need to add users frdev1001 mariadb users to frdb1003 first.

@Jgreen I'd like to set up more alerting on frdb1003 for before we make this change so that my data cube testing and runs don't cause the server to lag behind master. Any ideas you have on this - or ideas on how I can leverage existing alerting - would be great!

This comment was removed by Jgreen.

@Jgreen I'd like to set up more alerting on frdb1003 for before we make this change so that my data cube testing and runs don't cause the server to lag behind master. Any ideas you have on this - or ideas on how I can leverage existing alerting - would be great!

We have nagios/icinga monitoring of mysql replication lag that polls every 5 minutes, but the warning/critical thresholds are very lax during prototyping and testing. We can set those back to more normal ranges anytime.

Jgreen claimed this task.
Jgreen triaged this task as Medium priority.
Jgreen updated the task description. (Show Details)
Jgreen moved this task from In Progress to Done on the fundraising-tech-ops board.

Investigation is complete.