Page MenuHomePhabricator

Improve donation scrubbing query
Open, Needs TriagePublic

Description

NOTE: this ticket needs some rework and simplification

Product perspective

Context

The scrubbing code needs to check several states of the donation to decide which donations can be scrubbed.

Current state

The scrubbing is too aggressive and simplistic.

Acceptance criteria

  • Scrubbing code selects only the right kinds of donations:
    • exported
    • deleted
    • abandoned (unconfirmed external payment older than 2 days)
  • The purge_personal_data command is running again.

External links

See discussion document https://docs.google.com/document/d/1dQPskLvSPb3c8-wrmh-Ufd11_fC6VR81J2I-oWgBSHM/edit?usp=sharing


Technical perspective

Context

This ticket is part of an effort to get rid of reliance on the status field, see T359954: Remove status from donation queries

Constraints

  • This ticket does not change how we store the "Deleted" status of donations - we still rely on the current behavior of the fundraising-donations bounded context that preserves this status correctly in the deprecated status field as "D". We track the change to how the soft-delete works in T359955: Create a deletion flag on the donation database entries. That ticket is out of the scope of this ticket.
  • The scrubbing query should ignore the moderation state.
    • The reason for this is how the campaigns team moderates the donations:
      • "Amount too high" donations will be un-moderated, at which point they will be exported and are eligible for scrubbing
      • any "fake" donations will be deleted, but not un-moderated (that would take 2 clicks and the moderation state is a good indicator as to why a donation was deleted).
    • Our query would become very complex if we had to take moderation status into account: We'd have to ignore it for deleted donations.
    • The only scenario where we're left with un-scrubbed donations is when donations are not touched (un-moderated or deleted) by a moderator. They will linger, because they will never be exported.
  • The payment bounded context does not store payment creation dates. This means that the concept of "abandoned" payments is only relevant for scrubbing, combining a small list of "pending" payments with the donation dates in a certain range.

The following query gives a good insight in how the campaigns team currently handles moderation:

SELECT COUNT(*), month(dt_new), year(dt_new), status 
FROM spenden d 
JOIN donations_moderation_reasons m ON d.id=m.donation_id 
GROUP BY YEAR(dt_new), MONTH(dt_new), d.status;

Affected repositories

  • fundraising-payments, new minor release
  • fundraising-donations
  • fundraising-backend (dependency update. If you improve the naming in fundraising-donations, also a name update)

Deployment notes

After deployment, we have to monitor the number of un-scrubbed donations. If the scrubbing code leaves old donations untouched, we need to adapt the query, otherwise the number of returned abandoned payments will become larger and larger over time (see the query in the donation bounded context to get the parameters for the payment bounded context), leading to memory and performance problems.

Implementation details

The query for the donation bounded context needs to query

fundraising-payments

You need to add a new interface (e.g. PendingPaymentList) to the Payment bounded context Domain namespace that returns the IDs of "Pending" external payments (external payments without a valuation date).

To avoid returning a very huge array, the method must have a minimum payment ID parameter, e.g. PendingPaymentList::getPendingPaymentsWithIdGreaterThan( int $id) or PendingPaymentList::getPendingPaymentIds( PendingPaymentCriteria $criteria )

Implement the interface in the DataAccess namespace. Possible SQL query for the use case: SELECT p.id FROM payment p ON d.payment_id=p.id AND p.payment_method IN ('PPL', 'MCP') JOIN payment_paypal USING(id) JOIN payment_credit_card USING(id) WHERE p.id > :min_id

fundraising-donations

Update the payment dependency.
Change DatabaseDonationAnonymizer to query the un-scrubbed donations that match one of the following criteria (using $qb->andWhere( $qb->orX( /* criteria go here */ ) )):

  • where payment id matches abandoned payments.
    • "Abandoned" payments have two criteria: the payment must be pending AND the donation date must be lower than the cutoff date.
    • This is the only place where you use the $donationGracePeriod in the query!
    • Inject the new interface into DatabaseDonationAnonymizer.
    • To get the minimum payment ID for the interface, you can do a DBAL query with SELECT MIN(payment_id) FROM spenden WHERE is_scrubbed=0, then pass the result array from the interface to the main query builder for donations.
      • If you've introduced the concept of abandoned payments in the payment bounded context, use the query SELECT MIN(dat_new) FROM spenden WHERE is_scrubbed=0 instead.
  • where status is "D" (deleted donations)
  • Where dt_gruen IS NOT NULL (scrub exported donations)

Create a test with test data to check that DatabaseDonationAnonymizer scrubs the right donations. Suggested donation data for the test (all inserted donations must be un-scrubbed):

  • Insert and test for scrubbed state:
    • Abandoned Paypal donation (dt_new before grace period)
    • Abandoned Credit Card donation (dt_new before grace period)
    • Deleted donation
    • Exported donation (one for each payment type, make sure the external ones are booked)
  • Insert and test for un-unscrubbed
    • Pending Paypal donation (dt_new inside grace period)
    • Pending Credit Card donation (dt_new inside grace period)
    • Un-exported donation (one for each payment type, make sure the external ones are booked)
Improving naming

You may rename DatabaseDonationAnonymizer to DatabaseDonationScrubber and DonationAnonymizer to DonationScrubber to avoid the confusion between "anonymous" donations (where donor has not given any info) and "scrubbed" donations (where we removed the donor info, but preserve the original address type).
This renaming would be a backwards-breaking change that needs changes in fundraising-backend.
Side note: Anonymous donations become scrubbed, even if that does not change any data in spenden except the is_scrubbed flag.

Event Timeline