Ok, another run:
- dev_civicrm fresh restore from prod 2020-08-07 (after table truncating and drops on this task)
- across-the-board keep <= 550 days
- run time was 2:20
- footprint reduction of 190GB (from 723GB to 533GB)
@Jgreen that is about 200GB less than it was at the start I think (+ 50GB off the drupal DB - definitely something)
@Jgreen I think we should also drop the triggers & tables from the top of this phab today - it's pretty cautious not dropping them without an outage IMHO, since the tables are completely unused without the triggers & both should be super quick to drop - so it would be good to use the outage to do it
@Jgreen was I wrong in thinking the civi upgrade is today?
@Jgreen were those tables that we plan to drop altogether included in the 200GB? Will we drop those today on live during the outage?
Testing on frdev1001 dev_civicrm database, purging to 750 days for all log tables reduced the database footprint from 862GB to 652GB. Found four tables missing the log_id auto_increment column; the script passed over those by design.
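For illustration only, the "pass over tables without log_id" rule could look something like the sketch below. This is not the actual purge script: the function name, the input shape (a mapping of table name to column names) and the example columns are all assumptions.

```python
def split_purgeable(table_columns, required_column="log_id"):
    """Split log tables into those that can be purged and those to pass over.

    table_columns maps a table name to the set of its column names
    (hypothetical input; the real script presumably reads the schema).
    """
    purgeable, skipped = [], []
    for table, columns in sorted(table_columns.items()):
        if required_column in columns:
            purgeable.append(table)
        else:
            # By design, tables without the required column are left untouched.
            skipped.append(table)
    return purgeable, skipped


# Hypothetical column sets, for demonstration only.
example = {
    "log_civicrm_contact": {"id", "log_id", "log_date"},
    "log_civicrm_email": {"id", "log_date"},
}
purgeable, skipped = split_purgeable(example)
```

Reporting the skipped list separately makes it easy to follow up on tables that still need the column added.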
The more I think about this the more I'd like to identify the tables that tell us about the contact & leave them out. I'm comfortable being more aggressive on all the other tables but I think the most likely thing for us to want to know after 6 months is address history & history of opt out etc.
I think contribution & activity data is large & I think it has less long-term value
Testing on frdev1003/dev_civicrm, I wrote a script that creates log_*_postpurge tables for each log_ table, and populates them with the past 750 days of data, figuring 550 days discussed above plus ~200 days since dev_civicrm was reloaded. This takes about 3:40 to run and reduces the log table data footprint from 551GB to 151GB.
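A rough sketch of the copy-recent-rows approach described above, assuming each log table has a `log_date` timestamp column (an assumption, as is the function name). It only builds the SQL text and is not the script that was actually run.

```python
def postpurge_statements(log_tables, keep_days=750, date_column="log_date"):
    """Build SQL that copies recent rows of each log_ table into a _postpurge twin."""
    statements = []
    for table in log_tables:
        if not table.startswith("log_"):
            continue  # only log_ tables are in scope
        target = f"{table}_postpurge"
        # Clone the table structure, then copy only rows inside the window.
        statements.append(f"CREATE TABLE `{target}` LIKE `{table}`;")
        statements.append(
            f"INSERT INTO `{target}` SELECT * FROM `{table}` "
            f"WHERE `{date_column}` >= NOW() - INTERVAL {int(keep_days)} DAY;"
        )
    return statements


for sql in postpurge_statements(["log_civicrm_contact"]):
    print(sql)
```

Keeping the retention window as a parameter would make it easy to compare runs such as 550 vs 750 days.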
Next step is to migrate users to frdb1003/pgehres database.
Investigation is complete.
I think some tables have more value for old data than others - notably contact & email data
Icinga alarm added to check_fundraising_job where other recurring alerts are triggered. I have found a way to have it notify once per day.
The check is running and correctly pulling the contribution numbers. Alerting will be turned on after @Jgreen review.
Jul 20 23:07:11 frdb1002 nagios_nsca: frdb1002 check_recurring_contrib_processing 2 CRITICAL recurring_contrib_processing=783 [critical >=500]
My understanding was that we'd discussed a year for the log tables but the only question was whether it was 1 year ago or 1 year as specified at a specific time of the year. I might be wrong though.
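As a minimal sketch of the threshold logic behind a check like this: the function name is hypothetical, and only the critical threshold of 500 and the message format come from the log line above. The status codes follow the standard Nagios convention (0 = OK, 2 = CRITICAL).

```python
def evaluate_check(value, critical=500, metric="recurring_contrib_processing"):
    """Return a (status_code, message) pair in Nagios passive-check style."""
    if value >= critical:
        return 2, f"CRITICAL {metric}={value} [critical >={critical}]"
    return 0, f"OK {metric}={value}"


code, message = evaluate_check(783)
print(code, message)
```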
What's the definition of done on this task? I see a merged patch, a related task, and a lot of comments. I'm curious what the next steps are.
Moving from FR-Ops back into Triage because afaict this is not blocked on fr-tech-ops, we'll need design decisions about how we want queues and payment-providers to work to proceed.
@Jgreen I'd like to set up more alerting on frdb1003 before we make this change so that my data cube testing and runs don't cause the server to lag behind master. Any ideas you have on this - or ideas on how I can leverage existing alerting - would be great!
From today's analytics coordination meeting: The pgehres database is queried directly from frdev during campaigns to monitor banner activity. It's also accessed for various reporting functions using Peter Coombe's statler software, https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/analytics/+/585235/
It sounds feasible to move the database and database generation script to frdb1003/fran1001. We will need to add the frdev1001 mariadb users to frdb1003 first.
We ended up testing on the otherwise unused codfw banner logger server.
Removed tag because there's nothing actionable for fundraising-tech-ops, donate-wiki is not administered by our team.
Removed tag because there isn't anything actionable from a fundraising-tech-ops perspective.
Removing fundraising-tech-ops tag because the fr-tech-ops actionable part was broken out to a separate task T247846 and completed.
Checking back on this it looks like https://links.e.uso.org has slipped back to a B rating because they haven't ceased TLS 1.0/1.1 support. I'm not sure if that's still a valid hostname to check, or if this task is still relevant however.
Closing task because we did about as much ruleset tuning as was feasible given the application. Spun off T122322 to look at the possibility of modifying payments-wiki behavior to make it easier to work with a WAF.
Removed Fundraising-Backlog tag. The upgrade (this task) is a fr-tech-ops project. Broke out the Fundraising Tech parts to separate tasks for clearer task dependency tracking.
@Jgreen I can take a stab at those answers
for log_civicrm_entity_tag, yep, that's the idea. We used to automatically apply a couple of tags to contacts and contributions during queue consumption, but we no longer do. So 99%+ of the existing log table is that automatic useless stuff. We can drop all the old stuff and just log new changes, which will presumably be human-initiated.
log_civicrm_subscription_history should definitely be dropped along with associated triggers - the subscription_history table itself is basically a log of what groups people have been assigned to, so a log table on those values is redundant.
@Eileenmcnaughton re. the list of tables to drop, can you clarify these two?