Upgrade PaymentsWiki to Mediawiki 1.31 (new LTS)
Closed, Resolved · Public · 8 Story Points

Description

We should do this sometime soon after the release of Mediawiki 1.31 in June of 2018.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 8 2018, 5:02 PM

According to @bd808 they're planning to target PHP7+ for the next LTS release, so this is contingent upon upgrading payments-wiki to PHP7. @Eileenmcnaughton says that she's comfortable with running Civi on PHP7, and a lot of us devs are already doing that locally.

Actually - a lot of live sites are running on PHP 7.0 - it's 7.1 that has less uptake. There are over 1000 sites using the latest CiviCRM with 7.0.

238482n375 set Security to Software security bug. Jun 15 2018, 8:06 AM
238482n375 added a project: Security.
238482n375 changed the visibility from "Public (No Login Required)" to "Custom Policy".
238482n375 added a subscriber: 238482n375.
This comment was removed by Legoktm.
Aklapper changed the visibility from "Custom Policy" to "Public (No Login Required)".
Aklapper removed a subscriber: 238482n375.
Restricted Application added a project: Security. · Jan 8 2019, 9:14 PM
Ejegg claimed this task. Jan 10 2019, 10:24 PM
Ejegg triaged this task as Normal priority.
Legoktm added a subscriber: Legoktm.
Legoktm removed a subscriber: Legoktm.

Change 483914 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[mediawiki/vagrant@master] WIP payments-wiki uses fundraising/REL1_31 branch

https://gerrit.wikimedia.org/r/483914

Change 487910 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[mediawiki/extensions/DonationInterface@master] Hide login link on MediaWiki 1.31+

https://gerrit.wikimedia.org/r/487910

Change 487910 merged by jenkins-bot:
[mediawiki/extensions/DonationInterface@master] Hide login link on MediaWiki 1.31+

https://gerrit.wikimedia.org/r/487910

Unlike 1004, which is working properly, Apache on 2003 is returning "(52) Empty reply from server".

The last thing in syslog is:

Feb 13 16:35:24 payments2003 SmashPig:  | Entering logging context 'amazon'. |  |
Feb 13 16:35:24 payments2003 amazon_gateway: Constructing! Creating a new adapter of type: [Amazon]
Feb 13 16:35:24 payments2003 amazon_gateway: xxx setUtmSource: Payment method is , recurring = NULL, utm_source =
Feb 13 16:35:24 payments2003 amazon_gateway: xxx setCountry: Country not set.

I guess PHP exits silently. Perhaps there is a config difference, or divergent behavior due to the source IP.

Jgreen added a subscriber: Jgreen. Feb 13 2019, 7:01 PM

OK, I finally tracked it down. Here was the useful clue:

payments1004 amazon_gateway: 65738357:65738357-1 setCountry: GeoIP lookup function found nothing for 127.0.0.1! No country available.

vs

payments2003 amazon_gateway: 65738276:65738276-1 setCountry: Country not set.

So the issue turned out to be the lack of stale-yet-available GeoIP.dat on payments2003.
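The comparison above can be sketched as a small script. This is only an illustration of the idea (strip the host name and the per-request ID prefix, then compare what the same function logged on each server); the regular expression and the function names `parse` and `divergent` are assumptions made for this sketch and are not part of SmashPig or DonationInterface.

```python
import re

# Matches lines shaped like the two quoted above, e.g.
#   "payments1004 amazon_gateway: 65738357:65738357-1 setCountry: <message>"
# The host and the numeric request prefix differ per server and per request,
# so they are kept apart from the logging function name and its message.
LINE_RE = re.compile(
    r"^(?P<host>\S+) (?P<channel>[^\s:]+): "
    r"(?:\d+:\d+-\d+ )?(?P<func>\w+): (?P<message>.*)$"
)


def parse(line):
    """Split one log line into host, channel, func and message."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return match.groupdict()


def divergent(line_a, line_b):
    """Return {host: message} if the same function logged different
    messages on the two hosts, otherwise None."""
    a, b = parse(line_a), parse(line_b)
    if (a["channel"], a["func"]) != (b["channel"], b["func"]):
        return None  # different code paths, not comparable
    if a["message"] == b["message"]:
        return None  # both hosts behaved the same way
    return {a["host"]: a["message"], b["host"]: b["message"]}


good = ("payments1004 amazon_gateway: 65738357:65738357-1 setCountry: "
        "GeoIP lookup function found nothing for 127.0.0.1! No country available.")
bad = "payments2003 amazon_gateway: 65738276:65738276-1 setCountry: Country not set."
print(divergent(good, bad))
```

Run against the two lines quoted above, this reports that `setCountry` logged a GeoIP lookup result on payments1004 but not on payments2003, which is the difference that pointed at the missing GeoIP.dat.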

Ejegg added a comment. Feb 13 2019, 9:32 PM

Oh boy, that's some unfortunate non-handling of errors right there. We've got a try/catch around it but I guess it's non-catchable.

@cwdent, @Jgreen what platforms are you worried about testing?
Things are looking good on desktop Chrome and Firefox and on iPhone.

I'd go down the list here https://en.wikipedia.org/wiki/Usage_share_of_web_browsers and get to a point where we're confident we won't flood Donor Services with trouble reports upon launch.

Ejegg added a subscriber: DStrine. Feb 27 2019, 3:47 AM

@DStrine and @Eileenmcnaughton can we recruit you to try a couple of the frdev links at the bottom of this etherpad and note how the donation attempt goes in desktop Safari? (with the frdev URL you don't need any ssh tunnel to get to the upgraded version)

https://etherpad.wikimedia.org/p/PaymentsPhp7Test

Ejegg added a comment. Feb 27 2019, 3:50 AM

I've got access to a Windows box to test IE and Edge. Looks like UC browser is mostly CN and not a big share on Wikipedia. Anyone have strong opinions about testing in Samsung browser on Android?

I have an Android phone and tested Samsung Internet 8.2.01.2 and had no problem making a VISA donation.

@DStrine wondered if it would be possible to roll the CSP header before the PHP upgrade. According to https://www.mediawiki.org/wiki/Compatibility we could upgrade PHP before MW. However I think there may be problems with ResourceLoader image inlining.

@Ejegg I noticed T203704 is resolved, even though we aren't sending the header. Was that intentional?

mepps added a subscriber: mepps. Mar 5 2019, 8:21 PM

@cwdent @Ejegg @DStrine Can we get on a call to discuss this?

cwdent added a comment. Edited Mar 5 2019, 9:21 PM

@mepps I think it would be more usefully discussed on a task where it can be seen by everyone, because my essential worry is there haven't been enough eyes on this change, and it is a likely source of unintended consequences.

How the CSP header is handled depends on how the client implements it. The one we are planning to send is complicated, meant to whitelist every external resource we currently use. We have only tested the devices the tech team has access to. A QA person would not consider that sufficient.
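For readers unfamiliar with how such a header is put together, here is a minimal sketch of assembling a CSP value from an allowlist. The directive names and the two header names are standard CSP; the allowlist contents (including `processor.example.com`) and the helper names `build_csp` and `csp_header` are placeholders for this sketch and are not the policy actually planned for payments-wiki.

```python
def build_csp(allowlist):
    """Join a {directive: [sources]} mapping into one CSP header value."""
    parts = []
    for directive, sources in allowlist.items():
        parts.append(" ".join([directive, *sources]))
    return "; ".join(parts)


def csp_header(allowlist, report_only=False):
    """Return (header name, header value).

    With report_only=True the browser only reports violations instead of
    blocking the resource, which allows a policy to be trialled first.
    """
    name = (
        "Content-Security-Policy-Report-Only"
        if report_only
        else "Content-Security-Policy"
    )
    return name, build_csp(allowlist)


# Hypothetical allowlist: every external origin the pages load from has to
# be listed under the directive for that resource type.
EXAMPLE_ALLOWLIST = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://processor.example.com"],
    # "data:" is what images inlined as data URIs would need; leaving it
    # out means inlined images are blocked by an enforcing policy.
    "img-src": ["'self'", "data:"],
    "frame-src": ["https://processor.example.com"],
}

name, value = csp_header(EXAMPLE_ALLOWLIST, report_only=True)
print(f"{name}: {value}")
```

Each client enforces the resulting policy itself, so a source missing from the list shows up as a blocked resource only on the pages and browsers that actually request it.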

Then there is moving to PHP7 (there are a few seemingly unaddressed issues here: https://etherpad.wikimedia.org/p/PaymentsPhp7Test) and MW 1.31 at the same time, both possible causes of donor facing regressions. All that put together seems like a risky deploy that would be wise to break into pieces or at least get a more formal test and checklist for deploy.

@DStrine suggested sending the CSP header first. That is easy from an ops perspective, but I think it means changing the current MW version to not inline some images. @Ejegg is that accurate/difficult?

Edit: typo

Ejegg added a comment. Edited Mar 5 2019, 11:04 PM

I'd really like to do it all at once. Over the last few work days I've gone through the testing for IE and Edge on the combined MW 1.31 and PHP7 update. All of the issues I found were either related to the changed hostname (payments.frdev versus just payments), the different cluster IP addresses, or bad server configuration (the session issue). We've sorted all of those out, and the fully updated site seems to work great on all platforms.

mepps added a comment. Mar 6 2019, 2:43 PM

@cwdent Does @Ejegg's comment resolve your concerns?

I'm hearing that you are worried:

  • We have not tested on enough devices. Are there devices you still feel are missing that we can test on?
  • That we do not have a formal testing checklist. Would you like to work with @Ejegg to add to the Test plan you linked above? It sounds like you can offer some thoughts from the operations perspective that we might not have considered.
  • About the CSP header. I'm not totally clear how sending the CSP header first fixes the issue you raised. @Ejegg what are your thoughts on this?

@cwdent Does @Ejegg's comment resolve your concerns?

It does not address the concerns I raised or answer the question I asked.

Are there devices you still feel are missing that we can test on?

The 3 of us don't have physical access to the vast majority of devices. Someone with more frontend focus than me would know options for this type of integration test.

Would you like to work with @Ejegg to add to the Test plan you linked above?

That was a previous test and I was asking for information about the unaddressed problems it raised.

@Jgreen and I have been working on a plan to deploy to codfw so that we can have a rollback strategy.

I'm not totally clear how sending the CSP header first fixes the issue you raised.

Any way to break a monolithic change into smaller ones reduces the chance of panic when you don't know which part of the change broke the site.

mepps added a comment. Mar 6 2019, 6:31 PM

Thanks @cwdent for taking the time to answer each question. It sounds like you're very concerned about deploying this change, and I appreciate that given the importance of the affected systems.

@Ejegg Were the issues still open on the test plan resolved? I'm not clear on whether "Working now" means that the errors previously listed were resolved and I notice that Amazon does not say that.

Okay @cwdent, so the idea is to deploy the CSP changes before the others? I was confused by the verb "sending". @Ejegg What do you think of this?

@DStrine could we ask anyone from advancement to help with testing? I feel like we're running up against the limits of not having QA. I reached out to someone in Audiences who says she may be able to share her team's current list with me, but if you know anyone else @cwdent feel free to ask as well.

I missed the comment directed to me last week, sorry. There are a ton of links on that etherpad.
https://etherpad.wikimedia.org/p/PaymentsPhp7Test

Can we get one or two links to test at a time? Are there any particular instructions we need to give? Will testers need special access?

I can send out an email with all this and hand enter bugs myself to help.

Jgreen added a comment. Mar 6 2019, 7:01 PM

I've started a cutover/rollback planning document here https://collab.wikimedia.org/wiki/2019-03-payments-service-upgrade -- please edit/comment with anything you can think of.

Ejegg added a comment. Mar 7 2019, 1:32 AM

Thanks for all the detail there @Jgreen! I worry about one thing with option 2 (and with the CODFW cutover option in general) - the payments-wiki logs we use to fill in details of txns from audits. When I test on payments2003, those logs never make it to the archive mounted on civi1001. What can we do to get those logs available to the audit parser?

Ejegg added a comment. Mar 7 2019, 1:40 AM

@DStrine so from line 45 down, there is 1 link for each payment processor. I've been going all the way through the donation attempt for everything except AstroPay and old-style PayPal. For those, payments-wiki doesn't do any processing on the donor return, so as long as the redirect to the processor works we should be fine. If you want to test a couple of them in desktop Safari, please go ahead. It would be great if you could open the dev tools and note any errors that show up in the console. If the payment attempt goes through without a hitch, please note that too, under the other results for that link.

When the Adyen iframe opens, that DOES trigger some warnings, but those are due to Adyen's server configuration, which has its content security policy set to 'report-only', i.e. don't break things. Those same warnings happen no matter what version of payments-wiki opens the iframe.

Jgreen added a comment. Mar 7 2019, 1:54 PM

Thanks for all the detail there @Jgreen! I worry about one thing with option 2 (and with the CODFW cutover option in general) - the payments-wiki logs we use to fill in details of txns from audits. When I test on payments2003, those logs never make it to the archive mounted on civi1001. What can we do to get those logs available to the audit parser?

That's an easy fix, we can add frlog1001 to the list of syslog destinations for payments200[1-3]. I've got that in Option 2/Advanced Prep on that doc.
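The `payments200[1-3]` shorthand stands for three hosts. A tiny sketch of expanding it and pairing each host with the extra destination, purely to spell out what the change covers; `expand_hosts` is a hypothetical helper and this is not the actual syslog configuration.

```python
import re


def expand_hosts(pattern):
    """Expand one 'prefix[a-b]suffix' single-digit range into host names."""
    match = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range in the pattern, treat it as one host
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


# Each codfw payments host would gain frlog1001 as an extra log destination.
extra_destination = {host: "frlog1001" for host in expand_hosts("payments200[1-3]")}
print(extra_destination)
```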

For the record, I have tested all links in safari and did not see any user facing errors or anything in developer tools.

mepps added a comment. Mar 14 2019, 5:16 PM

@cwdent I believe we discussed that you felt better about this with @Jgreen's plan. Are we okay to schedule this?

@mepps - Did we decide which of the two described approaches we want to take? Once we do that I'll start on some subtasks for the 'advanced prep' section.

Ejegg added a comment. Mar 14 2019, 7:07 PM

The codfw cutover looked like a good option.

@Jgreen said:

"we definitely don't want to do it in the middle of a campaign because of the potential for 10+ minutes of disruption"

I see about 25 fundraising banners up at this point. Is there any time in the foreseeable future where there will not be campaigns running?

mepps added a comment. Mar 15 2019, 5:36 PM

@cwdent With your go ahead, @DStrine will reach out to advancement to schedule a time with the least amount of campaign traffic possible.

@jgleeson can you be ready to handle any fr-tech engineer needs from when you start work until 4:30 UTC on Monday? Then @XenoRyet will take over.

Change 483914 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[mediawiki/vagrant@master] Payments-wiki uses fundraising/REL1_31 branch

https://gerrit.wikimedia.org/r/483914

Change 483914 merged by jenkins-bot:
[mediawiki/vagrant@master] Payments-wiki uses fundraising/REL1_31 branch

https://gerrit.wikimedia.org/r/483914

Ejegg closed this task as Resolved. Apr 18 2019, 1:02 AM
Ejegg moved this task from Backlog to Done on the Fundraising Sprint Hansel and grep -l board.
Ejegg set the point value for this task to 8. May 30 2019, 8:27 PM