Update wikidiff2 library on the WMF production cluster
Closed, ResolvedPublic

Description

Motivation
In the past months we worked on a feature to show when paragraphs were moved, as well as the changes that were made to them. After the deploy to test, meta, mediawiki and de-wiki at the end of 2017 we further improved the diff algorithm.

Task
Update the wikidiff2 library on production once the new changes are merged.
Ideally, this should happen synchronously with the config change that enables the new feature on all wikis, so it would be great if Operations and WMDE-QWERTY-Team could agree on a date!

Lea_WMDE created this task. · Mar 26 2018, 7:40 PM
Restricted Application added a subscriber: Aklapper. · Mar 26 2018, 7:40 PM

@MoritzMuehlenhoff how much time in advance is needed before deploying to production for deployment prep? And when would it be possible to do so, assuming we have the changes merged this week (since we assume that will be the case ;) )?

So, if I understand this right, the wikidiff extension needs additional changes beyond what is currently deployed on production and beta, right? We can update the package this week and deploy it to beta, allowing initial tests.

But the rollout to production will probably need to wait a bit. Starting the week of the 9th of April we'll initiate the ICU migration, and when that's completed, we'll start migrating the application servers to Debian stretch. That's a bigger undertaking, which will probably last until the end of May. During the migration we need to avoid introducing potential interference from wikidiff changes into an already complex migration (plus the need to build/test/debug on two different OSes during the migration).

So, a rollout to production probably won't start before end of May, but let's revisit this maybe end of April and then we can assess where we stand in terms of the HHVM update.

thiemowmde added a subscriber: thiemowmde.

So, if I understand this right, the wikidiff extension needs additional changes beyond what is currently deployed on production and beta, right? We can update the package this week and deploy it to beta, allowing initial tests.

Yes, however the changes are just improvements to the last change, so much smaller than before. The patch we are talking about is this one. Since we are merging this week, would it be possible to do the beta deploy in the upcoming week?

So, a rollout to production probably won't start before end of May, but let's revisit this maybe end of April and then we can assess where we stand in terms of the HHVM update.

Thanks for letting us know (and note to self to talk about it with you earlier next time ;) ). I'll approach you again towards the end of April then, if you haven't contacted us yet for the production rollout.

Change 404293 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Jkroll):
[mediawiki/php/wikidiff2@master] Various fixes and improvements to moved line handling

https://gerrit.wikimedia.org/r/404293

Change 404293 merged by jenkins-bot:
[mediawiki/php/wikidiff2@master] Various fixes and improvements to moved line handling

https://gerrit.wikimedia.org/r/404293

Hi @MoritzMuehlenhoff, the patch is merged - we are ready for beta! Is there anything else you need from our side?

@Lea_WMDE: Seems fine from a quick glance, I'll look into building/updating beta tomorrow.

Awesome, thanks!

I've built a 1.6.0 package against the HHVM version currently running on beta (which is linked against the new version of ICU), still need to sort out how to best distribute it (e.g. via a separate component). Two questions:

  • Is it already known whether we need to prune the on-disk HHVM byte code cache? This was necessary for the initial rollout of the wikidiff extension: https://phabricator.wikimedia.org/T176637#3672019
  • Does this need to be complemented with another change on the configuration side?
WMDE-Fisch added a subscriber: WMDE-Fisch. · Edited · Apr 5 2018, 1:36 PM

I've built a 1.6.0 package against the HHVM version currently running on beta (which is linked against the new version of ICU), still need to sort out how to best distribute it (e.g. via a separate component). Two questions:

Yes, I would recommend pruning the bytecode cache.

  • Does this need to be complemented with another change on the configuration side?

The new configurable settings are all set to defaults and don't need extra settings. The initial activation is done by wmgUseNewWikiDiff2Extension and, as far as I can see, this is set for beta. So nothing to do here, thanks!

What about gradual upgrades? We usually roll out updates in stages. When this moves to production, can we initially upgrade only e.g. 10% of the servers to 1.6.0 and leave the others on 1.5.1, or do they all need to be updated in lockstep?

When you do gradual upgrades, what period are we talking about here?

Usually spread out over the course of a few days, especially when we need to prune the bytecode cache.

Technically, there should not be a problem with gradual updates. From the product perspective, users should ideally see the new changes together with the config change that enables showing moved paragraphs. But it would not be a big problem if that were not possible; we would just need to know beforehand so we can prepare fitting announcements.
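
A minimal sketch of how a staged rollout like the one discussed above (a small first group, then the rest over a few days) could be planned. It assumes a plain list of host names; the `plan_stages` helper, the host names, and the stage sizes are hypothetical and are not taken from the actual deployment tooling.

```python
def plan_stages(hosts, first_percent=10, later_stages=2):
    """Split hosts into a small first stage plus evenly sized later stages."""
    if not hosts:
        return []
    # Round the first stage up so that it always contains at least one host.
    first_count = max(1, (len(hosts) * first_percent + 99) // 100)
    stages = [hosts[:first_count]]
    rest = hosts[first_count:]
    if rest:
        # Ceiling division keeps the number of later stages at most later_stages.
        size = -(-len(rest) // later_stages)
        stages.extend(rest[i:i + size] for i in range(0, len(rest), size))
    return stages


# Hypothetical host names, for illustration only.
hosts = [f"appserver{n:02d}.example" for n in range(1, 21)]
stages = plan_stages(hosts)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {len(stage)} hosts")
```

Each stage would be upgraded and observed before the next one starts, so servers on the old and the new version run side by side for the duration of the rollout.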

Could you put a link to beta here, once it is live there?

Thanks @MoritzMuehlenhoff !

Mentioned in SAL (#wikimedia-releng) [2018-04-06T11:41:02Z] <moritzm> upgrading deployment-prep to wikidiff2 1.6.0 (T190717)

Could you put a link to beta here, once it is live there?

Yep, I've just upgraded beta/deployment-prep to 1.6.0. Let me know if you spot any problems.

Just tested it with some cases and so far it seems to work as expected. Thanks!

Dzahn triaged this task as Normal priority. · Apr 18 2018, 12:43 AM

@MoritzMuehlenhoff as discussed I'm checking in at the end of April :) Is there any news about the wikidiff2 update schedule?

@Lea_WMDE : We're making good progress with the stretch migration, we should be good to start the wikidiff rollout mid May.

In the meantime deployment-prep was also migrated to stretch, so as a preparatory step I'll prepare wikidiff 1.6.0 builds for stretch and update deployment-prep with that. When done, I'll ping this task. We need to re-run the tests made in beta with the new Debian version / underlying upgraded stack. I don't expect any issues, just due diligence.

cool, thanks for the heads up!

Mentioned in SAL (#wikimedia-operations) [2018-05-08T12:17:16Z] <moritzm> upgrading app servers in beta to wikidiff 1.6.0 (T190717)

Mentioned in SAL (#wikimedia-releng) [2018-05-08T12:19:25Z] <moritzm> upgrading app servers in beta to wikidiff 1.6.0 (T190717)

In the meantime deployment-prep was also migrated to stretch, so as a preparatory step I'll prepare wikidiff 1.6.0 builds for stretch and update deployment-prep with that. When done, I'll ping this task. We need to re-run the tests made in beta with the new Debian version / underlying upgraded stack. I don't expect any issues, just due diligence.

@Lea_WMDE : I've built wikidiff 1.6.0 for Debian stretch (what Beta is now running compared to the previous tests) and upgraded deployment-prep/beta to that build. It would be great if the tests done a few weeks ago could be re-run with the new build.

I just tested it another time. Everything works as expected and the log looks fine. Thanks for that!

@MoritzMuehlenhoff I just talked to my team, and with the hackathon coming up, it will be difficult for us to do the communication around the Wikidiff2 change in the next 2 weeks. Would it be possible to wait with the production deploy until May 29th or 30th?
Also a heads up that the last changes around wikidiff2, showing moved paragraphs on mobile, are also slowly coming to an end. We would approach you for that separately - unless you would very much prefer to do both updates together?
Thanks again for all your help!

@MoritzMuehlenhoff I just talked to my team, and with the hackathon coming up, it will be difficult for us to do the communication around the Wikidiff2 change in the next 2 weeks. Would it be possible to wait with the production deploy until May 29th or 30th?

We can do that. My original plan was to start rolling out the new wikidiff extension along with a new version of HHVM next week, but I can also go ahead and roll out the new HHVM beforehand.

Also a heads up that the last changes around wikidiff2, showing moved paragraphs on mobile, are also slowly coming to an end. We would approach you for that separately - unless you would very much prefer to do both updates together?

Do you have a time estimate for those mobile changes? Totally depends on the schedule I'd say. Also, let me know when those are ready for deployment to beta ahead of production.

We can do that. My original plan was to start rolling out the new wikidiff extension along with a new version of HHVM next week, but I can also go ahead and roll out the new HHVM beforehand.

Great, thanks!

Do you have a time estimate for those mobile changes? Totally depends on the schedule I'd say. Also, let me know when those are ready for deployment to beta ahead of production.

We are currently in the process of making sure the mobile changes do what we expect them to do. If all goes well, we could be ready for beta in 2 weeks.

Ok, let me know when that's ready.

Hi @MoritzMuehlenhoff, we have news :)

  • The changes for mobile are now part of the latest master (should be version 1.7). So we are ready for beta again
  • For the rollout (of I guess all changes together) on production: We are targeting the 3 pm UTC+2 SWAT slot on Wednesday May 30th for the config changes we want to do right after the new wikidiff2 version has been rolled out to production. Would Tuesday or Wednesday morning work for you for the production update?

Hi @MoritzMuehlenhoff, we have news :)

  • The changes for mobile are now part of the latest master (should be version 1.7). So we are ready for beta again

Great! Can you make sure a release is cut/uploaded to https://releases.wikimedia.org/wikidiff2/, then I'll upgrade the package and update beta early next week.

  • For the rollout (of I guess all changes together) on production: We are targeting the 3 pm UTC+2 SWAT slot on Wednesday May 30th for the config changes we want to do right after the new wikidiff2 version has been rolled out to production. Would Tuesday or Wednesday morning work for you for the production update?

That should be doable. As the update was said to require a pruning of the HHVM bytecode cache, it'll take a little longer, though. So I'd start the full production rollout on Monday and spread it out until Wed morning (and possibly start upgrading the canary servers at the end of next week already). Upgrading the extension ahead of the config change planned for Wed 30 should cause no issues, right?

Great! Can you make sure a release is cut/uploaded to https://releases.wikimedia.org/wikidiff2/, then I'll upgrade the package and update beta early next week.

Done

So for deployment on beta nothing special is needed and it can be done right away. For the deployment on the production servers there needs to be a config setting _before_ the deployment and one after. So before deploying to production we have to be sure that the first config change is done. I guess we keep you updated on this ticket here about that.

And thanks for the effort.

To add to that: Yes, @MoritzMuehlenhoff , from our side you can start on Monday and spread the production rollout until Wednesday morning. However, as @WMDE-Fisch said, before the update can be seen on production we need to do the first of two config changes. This is scheduled to happen next week (see T194271). So I think it should all work fine, but we will let you know once we know the exact SWAT time for the first config change.

Mentioned in SAL (#wikimedia-operations) [2018-05-22T11:31:11Z] <moritzm> upgrading application servers in deployment-prep to wikidiff 1.7.0 (T190717)

So for deployment on beta nothing special is needed and it can be done right away.

I've updated the deb package to the 1.7.0 release and upgraded beta. Please re-run the tests and let me know if there's anything odd.

For the deployment on the production servers there needs to be a config setting _before_ the deployment and one after. So before deploying to production we have to be sure that the first config change is done. I guess we keep you updated on this ticket here about that.

Ack! If you (or anyone else involved on WMDE's side) are on IRC, you can also join the #wikimedia-operations channel next Monday. All deployment steps will be logged there (and also recorded to https://wikitech.wikimedia.org/wiki/Server_Admin_Log).

So for deployment on beta nothing special is needed and it can be done right away.

I've updated the deb package to the 1.7.0 release and upgraded beta. Please re-run the tests and let me know if there's anything odd.

Thanks, done. It works as expected.

Just to let you know, the config change will be done today so we are good to go for next Monday. I will see what I can do. What's the approximate time you're planning this?

Just to let you know, the config change will be done today so we are good to go for next Monday. I will see what I can do. What's the approximate time you're planning this?

Ok, if the config change gets deployed today (can you please add me to reviewers (Gerrit username is Muehlenhoff), then I'm in the loop wrt merge status), then I'd upgrade the canary app servers tomorrow (it's five servers which represent about 2% of our production traffic) and proceed with the wider rollout Monday morning CEST.

In T190717#4224151, @MoritzMuehlenhoff wrote:
Ok, if the config change gets deployed today (can you please add me to reviewers (Gerrit username is Muehlenhoff), then I'm in the loop wrt merge status), then I'd upgrade the canary app servers tomorrow (it's five servers which represent about 2% of our production traffic) and proceed with the wider rollout Monday morning CEST.

Cool, thanks, I added you to the patch.

Config change is done and deployed ... even though it's just a minor thing we had some confusion and problems with that... a little write up is in T194271#4225327

@MoritzMuehlenhoff we found a bug. Could you give us tomorrow (Thursday) to find out whether that means we should postpone the deployment or not? Sorry.

@MoritzMuehlenhoff we are going forward with the deploy, the bug was only found in one of 300+ cases and does not break anything. So we are ready for canaries and all the rest :)

@Lea_WMDE Ack, I'll start upgrading the mediawiki canaries later this afternoon (CEST).

@Lea_WMDE, @WMDE-Fisch : The canary application servers have been upgraded and so far everything looks fine in the logs. I'll keep an eye on it, but I think we're good to proceed with the wider rollout next Monday.

The mwdebug servers have also been upgraded.

The mwdebug servers have also been upgraded.

Just checked the inline diff there and as expected the moved paragraph changes are not visible due to the config deactivating them.

Status update: Half of our active data centre and the majority of servers in our backup DC have been upgraded to wikidiff 1.7. The rest will follow tomorrow.

@Lea_WMDE, @WMDE-Fisch : wikidiff 1.7.0 has been rolled out to all our application servers in production (active and backup data centre), you can proceed with enabling it in wmf-config as planned.

Given that the change is now live, shall we close this ticket or do you expect another update soon for the detection of character-based languages?

Lea_WMDE closed this task as Resolved. · May 30 2018, 2:54 PM

We are going to need to deploy the bugfix, but I am going to start a new ticket for that once it is ready. Thanks again for all your help, Moritz!