
Reconfigure fundraising check_endpoints
Closed, Resolved, Public

Description

Also possibly for the PayPal EC, DLocal, Amazon, and Adyen endpoints.

Let's give these slightly less cryptic names than check_gcsip.

Event Timeline

Does this mean fr-tech wants to get paged by icinga?

There are a couple related tickets here: T202419

I just added the icinga pattern to the things that make a notification on my phone, at least!

While fr-tech can currently see the alerts in IRC, Jeff and I get them by SMS, which means dropping whatever we're doing, getting out of bed, etc. So it would be good to keep that alert stream limited to things ops can act on. Another Icinga channel might be the answer, but there is the open question of who pays for our phones.

We already get a lot of middle-of-the-night alerts about hiccups in endpoint connectivity, regardless of whether banners are up or whether we're currently using a particular endpoint. Can we explore other ways to do this? Random ideas:

  • hook into the payments-wiki config to limit 'critical' alerts to endpoints we're actively using
  • have payments-wiki trigger an alert based on feedback from the client re. success in loading the iframe
  • don't notify unless there is payments traffic happening
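The first and third ideas together amount to a small paging policy. Below is a minimal Python sketch, assuming the set of active endpoints comes from payments-wiki config, a recent request count comes from payments traffic, and only CRITICAL results page anyone. The names `EndpointStatus` and `alert_level` are hypothetical, not part of any existing check.

```python
from dataclasses import dataclass
from typing import AbstractSet

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class EndpointStatus:
    name: str        # hypothetical short name, e.g. "ingenico"
    reachable: bool  # result of the connectivity probe


def alert_level(status: EndpointStatus, active: AbstractSet[str], recent_requests: int) -> int:
    """Hypothetical policy: page only when an in-use endpoint is down while donors hit it."""
    if status.reachable:
        return OK
    if status.name in active and recent_requests > 0:
        return CRITICAL  # actionable: donors are affected right now
    # Unused or idle endpoint: report it, but as a non-paging WARNING
    # (assumes Icinga is configured to page on CRITICAL only).
    return WARNING
```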

Ignore this ^^^. I wrote check_endpoint, which lets us roll up all the endpoints into a single Nagios check and consolidate across hosts, to mitigate the pager storm we get when there's a single endpoint hiccup. It doesn't fix the problem of monitoring endpoints we aren't actively using, but it's an improvement.
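The roll-up idea (many endpoint probes folded into one Nagios result) can be sketched in Python. This is not the actual check_endpoint script, whose source isn't shown here; the function names, the per-endpoint set of accepted status codes, and the injectable `probe` argument are assumptions made for illustration. Exit codes follow the standard Nagios plugin convention (0 OK, 2 CRITICAL).

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Collection, Dict, List, Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET url and return the HTTP status code.

    DNS, refused-connection, timeout, and TLS failures propagate as OSError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx reply still proves the endpoint is up and speaking HTTP.
        return err.code


def rollup(
    endpoints: Dict[str, Collection[int]],
    probe: Callable[[str], int] = http_status,
) -> Tuple[int, str]:
    """Probe every endpoint and fold the results into one Nagios status and message."""
    failures: List[str] = []
    for url, accepted in endpoints.items():
        try:
            code = probe(url)
        except (OSError, http.client.HTTPException) as err:
            failures.append(f"{url}: {err}")
            continue
        if code not in accepted:
            failures.append(f"{url}: unexpected HTTP {code}")
    if failures:
        return CRITICAL, "CRITICAL - " + "; ".join(failures)
    return OK, f"OK - all {len(endpoints)} endpoints reachable"
```

A wrapper would print the message and pass the status to `sys.exit()`, so Icinga sees a single result for all endpoints. For example, `rollup({"https://world.api-ingenico.com/": {404}})` treats a bare 404 as a pass, since it still shows the endpoint is online.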

Adyen

  • Only needs connectivity from the IPN listener and from civi1001; on the payments-wiki side we just make the donor's browser POST to a URL on live.adyen.com with some signed parameters.
  • Civi and maybe the listener need access to https://pal-live.adyen.com/pal/Payment.wsdl

Amazon

  • Payments, civi, and the listener need access to mws.amazonservices.com, which should be tcp-proxied via 34.233.223.241
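Because mws.amazonservices.com is reached through a TCP proxy, a plain TCP connect test against the proxy address is one way to tell "proxy down" apart from "Amazon down". A minimal sketch, assuming the proxy listens on port 443 (our assumption, since the service is HTTPS); it checks only that a TCP connection opens, not TLS or HTTP. `tcp_reachable` is a hypothetical helper name.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Usage: `tcp_reachable("34.233.223.241", 443)`.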

AstroPay / D*Local

Ingenico Connect

PayPal

As for knowing which endpoints are active, the easiest approach from the payments-server side would be to track access to the various *Gateway URLs, e.g.:

  • Special:IngenicoGateway (Ingenico Connect)
  • Special:PayPalExpressGateway (PayPal Express Checkout; the old integration didn't require any API calls, just a redirect)
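Tracking access to the *Gateway special pages could start as simply as counting hits in web server access log lines. A minimal sketch, assuming the page name appears literally as `Special:<Name>Gateway` in each request line (URL-encoded variants such as `Special%3A...` are not handled); the log format, `gateway_hits`, and `active_gateways` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Matches e.g. "/wiki/Special:IngenicoGateway" or "title=Special:PayPalExpressGateway&...".
GATEWAY_RE = re.compile(r"Special:(\w+Gateway)\b")


def gateway_hits(log_lines: Iterable[str]) -> Counter:
    """Count requests per *Gateway special page."""
    hits: Counter = Counter()
    for line in log_lines:
        match = GATEWAY_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


def active_gateways(log_lines: Iterable[str], minimum: int = 1) -> Set[str]:
    """Gateways with at least `minimum` hits in the given log window."""
    return {name for name, count in gateway_hits(log_lines).items() if count >= minimum}
```

The resulting set could feed a per-endpoint decision about whether an outage is worth paging for.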

Change 540472 had a related patch set uploaded (by Jgreen; owner: Jgreen):
[operations/puppet@production] Adjust nsca_frack.cfg.erb for new approach to monitoring endpoints.

https://gerrit.wikimedia.org/r/540472

Change 540472 merged by Jgreen:
[operations/puppet@production] Adjust nsca_frack.cfg.erb for new approach to monitoring endpoints.

https://gerrit.wikimedia.org/r/540472

Jgreen renamed this task from "Create icinga alert for connectivity to Ingenico Connect endpoint" to "Reconfigure fundraising check_endpoints". Oct 10 2019, 3:30 PM

@Ejegg this is mostly done, with some notes:

I'm not sure what's up with the PayPal endpoints. Neither of them connects with a browser, curl, wget, etc. The endpoints are clearly listening on tcp/443, but there's something odd about the actual handshake. Is a client cert required, or are they somehow non-standard HTTPS?

https://world.api-ingenico.com/ responds with a 404. That's workable with check_endpoints, but is there a path we should check instead to get a more meaningful HTTP result?

Are the old GlobalCollect endpoints https://ps.gcsip.com/wdl/wdl and https://api.globalcollect.com/ still in use, or should I remove them from endpoint checks?

@Jgreen: oh, I think those paypal endpoints require a client cert. It should be a .pem file stored in /etc
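A client-cert check can be expressed with Python's standard `ssl` module: the context still verifies the server normally but also presents our certificate during the handshake. A minimal sketch, assuming the .pem file contains both certificate and private key (otherwise pass the key separately); the function names are hypothetical and the actual file path under /etc is not filled in here.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def client_cert_context(cert_file: str, key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that verifies the server as usual and also presents a client certificate.

    cert_file may be a single PEM holding both certificate and private key,
    in which case key_file can stay None.
    """
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def status_with_client_cert(url: str, context: ssl.SSLContext, timeout: float = 10.0) -> int:
    """GET url using the client-cert context and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Any HTTP reply means the TLS handshake, including the client cert, succeeded.
        return err.code
```

The curl equivalent of the same check uses `--cert` (plus `--key` if the key is in a separate file).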

The old GC endpoints are still used for recurring charge jobs, and I think for iDEAL transactions, so yeah, please leave them up.

To get a more meaningful result code from the new API, we'd have to do a tiny bit of work to implement the testconnection call:
https://epayments-api.developer-ingenico.com/s2sapi/v1/en_US/java/services/testconnection.html?paymentPlatform=ALL#services-testconnection

It requires authentication, but it should be easy to use our existing SmashPig auth code to calculate the right headers for that, and to make a simple SmashPig wrapper script that just makes the (authenticated) call and dumps out the response.

Change 543176 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[wikimedia/fundraising/SmashPig@master] Implement Ingenico connection test

https://gerrit.wikimedia.org/r/543176

Change 543176 merged by jenkins-bot:
[wikimedia/fundraising/SmashPig@master] Implement Ingenico connection test

https://gerrit.wikimedia.org/r/543176

Jgreen triaged this task as Medium priority. Oct 29 2019, 5:55 PM

Change 547011 had a related patch set uploaded (by Jgreen; owner: Jgreen):
[operations/puppet@production] add check-endpoints to payments role in nsca_frack.cfg.erb

https://gerrit.wikimedia.org/r/547011

Change 547011 merged by Jgreen:
[operations/puppet@production] add check-endpoints to payments role in nsca_frack.cfg.erb

https://gerrit.wikimedia.org/r/547011

Got api.paypal.com working with client certs on payments and civi. Leaving the Ingenico check as a pass with 404, since it at least shows us the endpoint is online.