
Orphan rectifier is silent about communication failures
Closed, Resolved (Public)

Description

Is this the correct behavior? One or two communication failures per job may be acceptable, but past some threshold we should notify with failmail.

Our decision to mail on any stderr might already be the right balance; let's review.
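For discussion, here is a minimal sketch of the threshold idea: count the `[error]` lines in a job's log and only send failmail once the count reaches a limit. This is not how the orphan rectifier or the job runner currently works. The names `count_errors`, `should_send_failmail`, and the value of `ERROR_THRESHOLD` are hypothetical, and matching on the literal `[error]` marker is an assumption based on the log excerpt in this task.

```python
import re

# Hypothetical limit, chosen only for illustration; the task does not name one.
ERROR_THRESHOLD = 5

# Assumption: error lines carry a literal "[error]" level marker,
# as in the log excerpt pasted in this task.
ERROR_MARKER = re.compile(r"\[error\]")


def count_errors(log_lines):
    """Return how many log lines carry the [error] level marker."""
    return sum(1 for line in log_lines if ERROR_MARKER.search(line))


def should_send_failmail(log_lines, threshold=ERROR_THRESHOLD):
    """Return True once the error count reaches the threshold."""
    return count_errors(log_lines) >= threshold
```

With a check like this, a job with two or three stray errors would stay quiet, while a job with hundreds would trigger failmail. Whether that beats mailing on any stderr output is the open question.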

Event Timeline

21:31:46  X Rectifying orphan: X
21:31:46  WD DonationInterface: orphans:globalcollect_gateway_trxn:                 [info]
21:31:46  X Clock at Confirm_CreditCard: 0 (X.1941)
21:31:46  WD DonationInterface: orphans:globalcollect_gateway_trxn:                 [info]
21:31:46  X Failed Validation. Aborting GET_ORDERSTATUS Array
21:31:46  (
21:31:46      [issuer_id] => donate_interface-error-msg
21:31:46  )
21:31:46  
21:31:46  WD DonationInterface: orphans:globalcollect_gateway_trxn:                 [info]
21:31:46  X CVV Result: , AVS Result: 
21:31:46  WD DonationInterface: orphans:globalcollect_gateway_trxn:                [error]
21:31:46  X Can't communicate or internal error: Failed data
21:31:46  validation
21:31:46  WD DonationInterface: orphans:globalcollect_gateway_trxn:                 [info]
21:31:46  X Result message: Can't communicate or internal
21:31:46  error: Failed data validation
21:31:46  WD DonationInterface: orphans:globalcollect_gateway_trxn:                 [info]
21:31:46  X : UNKNOWN INCOMPLETE: Can't communicate
21:31:46  or internal error: Failed data validation
21:31:46  WD DonationInterface: orphans:globalcollect_gateway_trxn:                 [info]
21:31:46  X Elapsed Time: 53
21:31:48  WD DonationInterface: orphans:globalcollect_gateway_trxn:                 [info]

I just noticed that the error ("Failed data validation") is likely related to my recent upheaval of the DI code.

There was a slow trickle of these failures, usually zero per job, with 2-4 log lines about them every so often, until this job:

Ingenico_Orphan_Rectifier-2017-03-21_18-30-53

From that point on, we saw hundreds of errors per job.

I'll check which deployment this corresponds to.

Some of the failures happen because we're trying to rectify iDEAL donors. We shouldn't do that!
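Roughly, the fix is to filter the orphan queue down to credit card records before attempting rectification. The sketch below only illustrates that idea and is not the code from the Gerrit changes on this task. The record shape, the `payment_method` field, the `cc` and `rtbt` values, and the `drop_non_cc` helper are all assumptions made for the example.

```python
# Assumption: each orphan record is a dict with a "payment_method" key,
# and credit card records use the value "cc".
CARD_METHOD = "cc"


def drop_non_cc(records):
    """Keep only credit card records; anything else is skipped, not rectified."""
    return [r for r in records if r.get("payment_method") == CARD_METHOD]


# Hypothetical queue contents: one card donation, one bank transfer (e.g. iDEAL).
orphans = [
    {"order_id": "1", "payment_method": "cc"},
    {"order_id": "2", "payment_method": "rtbt"},
]
rectifiable = drop_non_cc(orphans)  # only order_id "1" remains
```

Records with a missing or unknown payment method are also dropped here, which errs on the side of not touching donations the rectifier does not understand.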

Change 344542 had a related patch set uploaded (by Awight):
[mediawiki/extensions/DonationInterface@deployment] Patch orphan rectifier to drop non-cc records

https://gerrit.wikimedia.org/r/344542

Change 344545 had a related patch set uploaded (by Awight):
[mediawiki/extensions/DonationInterface@master] Patch orphan rectifier to drop non-cc records

https://gerrit.wikimedia.org/r/344545

Change 344552 had a related patch set uploaded (by Awight):
[wikimedia/fundraising/crm/vendor@master] [HOTFIX] Patch orphan rectifier to drop non-cc records

https://gerrit.wikimedia.org/r/344552

Change 344552 merged by Ejegg:
[wikimedia/fundraising/crm/vendor@master] [HOTFIX] Patch orphan rectifier to drop non-cc records

https://gerrit.wikimedia.org/r/344552

Change 344545 merged by jenkins-bot:
[mediawiki/extensions/DonationInterface@master] Patch orphan rectifier to drop non-cc records

https://gerrit.wikimedia.org/r/344545

ggellerman moved this task from Triage to Q3 2021-2022 on the Fundraising-Backlog board.

Change 344542 abandoned by Awight:
Patch orphan rectifier to drop non-cc records

https://gerrit.wikimedia.org/r/344542