Page MenuHomePhabricator

RFC: Drop support for older database upgrades
Closed, ResolvedPublic

Assigned To
Authored By
Ladsgroup
Aug 6 2020, 1:48 AM
Referenced Files
None
Tokens
"The World Burns" token, awarded by Bugreporter2."Heartbreak" token, awarded by Nemo_bis."Like" token, awarded by Ammarpad."Like" token, awarded by Mainframe98."Like" token, awarded by Prod."Like" token, awarded by Michael."Like" token, awarded by Addshore."Like" token, awarded by Akuckartz.

Description

  • Affected components: Mediawiki core
  • Engineer(s) or team for initial implementation: @Ladsgroup (In volunteer capacity)
  • Code steward: Platform team

Motivation

Currently, MysqlUpdater class on master has (at least on paper) support for upgrading from 1.2 (Released on 2004-05-24, which predates birth of some of our volunteer devs) to 1.35 (hasn't released yet) and it seems there is no plan to stop it and it will continue forever.

As the result:

  • Number of database checks in MySQL is reaching 500 and size of the class to 1,500 lines of code.
  • The archive of sql patches (directory of maintenance/archives/) is really large and unorganized
  • In every update.php run, the system has to run all of these checks (updatelog stores that some of them are done but it's not sql checks and it's only maintenance script runs as far as I see in my localhost)
  • Due to lots of reasons, update.php can never be ran in production but it manages to sneak in and get ran and cause outages: T157651: sql.php must not run LoadExtensionSchemaUpdates. This means reducing the probability of causing issues in case this kind of problem happen again would be nice.
  • Keeping ability to upgrade from any point in time makes schema change logic quite complex:
    • You can't reintroduce an index (with different columns for example) and it has to have a different name, otherwise the update.php in every run (on master) removes the index and re-add it again. Same goes with changing field data type twice, if you change it from blob to varchar and a couple years later to varbinary, the system on master sees that the field type is not varchar, thinks it's blob, change it to varchar, and then the next check sees that it's not varbinary, and turns it to varbinary and it happens in every update.php run. You might argue that we can remove the first schema change but what if one of the updates in between depends on this certain data type?
    • If you remove a table, you need to remove all updates related to it, otherwise the update logic will break (like T230317: Error: 1146 Table 'valid_tag' doesn't exist when upgrading from an ancient MediaWiki version)
    • This complex logic was one of reasons behind one of our biggest outages when a really important table got dropped because Wikibase assumed due to lack of wb_terms table which was meant to be dropped from the code soon, the system is upgrading from a version that's 8 years old (predating wb_terms) so dropped several tables to rebuild them. i.e. Wikibase mistook the future with the far past: T249565: Wikidata's wb_items_per_site table has suddenly disappeared, creating DBQueryErrors on page views
  • This logic never worked properly from old versions anyway and it's famous in third party users that for big jumps and large databases, it's unreliable. They usually upgrade from one LTS to another multiple times instead and given that we have VCS, it makes sense.
  • Due this complex logic, properly writing test for it is hard, there are some snapshots (in sqlite) that build the system, run upgrade on it and check if it matches with the current system but it's pretty limited and doesn't cover upgrade from all releases.
  • The complex logic is not documented and stored as institutional knowledge with low bus factor which lots of devs have to explain and repeat for every new person doing a schema change for the first time (here's an example)
  • Technically upgrading from 1.2 is impossible because the MySQL version that mediawiki 1.2 needs is so different from 1.35 that MySQL upgrade (with lots of data) would be non-trivial

Requirements
  • No current functionality for upgrading from an LTS to another should break
  • The developer productivity and onboarding cost of doing schema changes in core and extensions should improve.
  • The *Updater classes should stop growing non-stop

Exploration

Proposal:
On master, only support upgrading from the last LTS release that's not EOL'd yet (basically meaning two LTS releases). Remove all of old updates and their .sql files and make it clear in RELEASE-NOTES (add a dedicated section that upgrades from which releases are supported).

The only downsides with proposal:

  • It would be harder for people to upgrade from really really old versions and need to do it in jumps but update.php is unreliable in that regard anyway.
  • The current archives of .sql files is actually a good library to find the most similar alter table to copy-paste, specially for DBMS engines that are different from the ones the dev is familiar with (like Postgres or Oracle to me). But this will be fully addressed with abstract schema changes (T191231: RFC: Abstract schemas and schema changes) which will come in the next couple of months.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

So just for concrete example's sake (please rebut each as appropriate):

Upgrading to 1.35 (LTS)

  1. There would not be support for 1.23 (-3 LTS) to 1.35 (LTS)
  2. There would not be support for 1.24, 1.25, or 1.26 to 1.35 (LTS)
  3. There would be support for 1.27 (-2 LTS) to 1.35 (LTS)
  4. There would be support for 1.28, 1.29, 1.30 to 1.35 (LTS)
  5. There would be support for 1.31 (-1 LTS) to 1.35 (LTS)
  6. There would be support for 1.32, 1.33, or 1.34 to 1.35 (LTS)

Upgrading to 1.36 (LTS +1)

  1. There would not be support for 1.23 (-3 LTS) to 1.36 (LTS +1)
  2. There would not be support for 1.24, 1.25, or 1.26 to 1.36 (LTS +1)
  3. There would not be support for 1.27 (-2 LTS) to 1.36 (LTS +1)
  4. There would not be support for 1.28, 1.29, 1.30 to 1.36 (LTS +1)
  5. There would be support for 1.31 (-1 LTS) to 1.36 (LTS +1)
  6. There would be support for 1.32, 1.33, or 1.34 to 1.36 (LTS +1)

I think you got this wrong. The way the proposal is worded, support for upgrades would continue for old LTS releases until a new LTS release bumps the oldest supported LTS release off the list.
In other words, I believe the release of 1.36 (non-LTS) doesn't affect the migration support for 1.27+

@Ladsgroup Looks like this is ready for Phase 3, moving it on your behalf.

@Ladsgroup Looks like this is ready for Phase 3, moving it on your behalf.

Thanks!

I agree with T259771#6419957 and can confirm this from field experience. See here for a rough overview on WikiApiary.

@Bawolff @Kghbln Thanks for the information. Do you think upgrading from 1.16 directly is a must or it's okay if they upgrade to 1.31 first and then to 1.35 (and above)? 1.16 was released around ten years ago.

Regarding how to determine on what version we are upgrading from, that's really complicated, we don't store the version in the database (and even if we fix it, that's not going to affect old releases) and if someone is upgrading, they have removed the old file :/ We can introduce an option to upgrade.php to take the previous version and check against it and assume it's the current release if not set. Does that sound good?

Can we move forward with this?

We reviewed this in yesterday's TechCom meeting. The following questions have come up:

  • We need a way to detect and safely abort if someone tries to update from a version that is too old. So we have to somehow somewhere record in the database what version we are on. Maybe a simple revision number would be sufficient, but maybe we want something more fine grained. Maybe we can use update_log, maybe not. This needs a little more thought.
  • Who owns this policy? Presumably the Platform Engineering team would need to champion it.
  • We may want to discuss again what the limit is - two LTS releases, or maybe three? Do we need to reach out once more to potentially affected third parties?
  • For the RFC to go on last call, it should propose a specific wording of a policy.

We reviewed this in yesterday's TechCom meeting. The following questions have come up:

  • We need a way to detect and safely abort if someone tries to update from a version that is too old. So we have to somehow somewhere record in the database what version we are on. Maybe a simple revision number would be sufficient, but maybe we want something more fine grained. Maybe we can use update_log, maybe not. This needs a little more thought.

The problem is that it's impossible to implement, if we add something that leaves a marker in database, it's still can't be propagated to previous installations using EOL'ed releases in the wild. I have two ideas and have no preference which way to go :

  • Adding an optional argument to update.php and the maintainer providing that. So for example:
php maintenance/update.php --from-version=1.12

would abort on 1.36. Given the backward compatibility, I think if not applied, we should assume it's updating from within a release

  • Or, use the database implicit markers, for example table hitcounter was dropped in 1.26, if such table exists, we would abort the update. We need define such marker in every release.
  • Who owns this policy? Presumably the Platform Engineering team would need to champion it.
  • We may want to discuss again what the limit is - two LTS releases, or maybe three? Do we need to reach out once more to potentially affected third parties?

I asked several places, honestly I ran out of places to reach out to.

  • For the RFC to go on last call, it should propose a specific wording of a policy.

"MediaWiki doesn't support upgrading from versions predating two LTS releases (for example, if LTS releases are 1.35, 1.31, 1.27, ... upgrading from 1.30 and below to anything above 1.35 is not supported). If you want to upgrade from an old version, you can do it with jumps to multiple LTS releases (for example, for upgrading from 1.27 to 1.36, upgrade to 1.31 first, then 1.36)"

(Feel free to edit mercilessly)

We just discussed this in the Platform Engineering (PET) technical planning meeting. People in the meeting agreed with the general idea, though there was some discussion over whether it should be two LTS releases or three. PET can be the owner of the new policy, and champion this proposal.

Or, use the database implicit markers, for example table hitcounter was dropped in 1.26, if such table exists, we would abort the update. We need define such marker in every release.

According to the discussion in the meeting we just had, this would be the way to go for past versions. For future updates, we should write the major version to the database somewhere.

Provided this RFC is approved, PET could take on the work needed on the installer, but we are trying to be more focus in our work and avoid distractions. which means that new tasks coming up go to the end of the queue, and it may be a while until we get to them. @Ladsgroup, if you would write the code for deciding whether an update is supported or not, that would probably help to speed things along.

I think with this, this RFC is ready for phase 4 (tune).

We just discussed this in the Platform Engineering (PET) technical planning meeting. People in the meeting agreed with the general idea, though there was some discussion over whether it should be two LTS releases or three. PET can be the owner of the new policy, and champion this proposal.

I'm fine with any reasonable number, two or three, etc.

Or, use the database implicit markers, for example table hitcounter was dropped in 1.26, if such table exists, we would abort the update. We need define such marker in every release.

According to the discussion in the meeting we just had, this would be the way to go for past versions. For future updates, we should write the major version to the database somewhere.

Provided this RFC is approved, PET could take on the work needed on the installer, but we are trying to be more focus in our work and avoid distractions. which means that new tasks coming up go to the end of the queue, and it may be a while until we get to them. @Ladsgroup, if you would write the code for deciding whether an update is supported or not, that would probably help to speed things along.

I can write the code that take the implicit marker and abort the update (more in a rather hacky way I assume given that it'll be removed later) but I don't think I have the capacity to write the part about explicit markers.

I think with this, this RFC is ready for phase 4 (tune).

\o/

I can write the code that take the implicit marker and abort the update (more in a rather hacky way I assume given that it'll be removed later) but I don't think I have the capacity to write the part about explicit markers.

Excellent, thank you!

For explicit markers, perhaps it would be sufficient to write the major version into updatelog, e.g. ul_key = 'MW-1.36'.

I can write the code that take the implicit marker and abort the update (more in a rather hacky way I assume given that it'll be removed later) but I don't think I have the capacity to write the part about explicit markers.

Excellent, thank you!

For explicit markers, perhaps it would be sufficient to write the major version into updatelog, e.g. ul_key = 'MW-1.36'.

Won't that result in a new key each time? Would it be possible to use a single key, and just change the value?

What needs to be done to move this forward? The maintenance/archives/ directory now has 315 files (and counting). I make a patch to be merged for 1.36.

Change 648576 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Fail update if hitcounter exists

https://gerrit.wikimedia.org/r/648576

This is for three LTS releases ^

TBH I don't think the reasons in the task summary are convincing. If the archive files are annoying, we can just move them elsewhere. Column conflicts are indeed a problem, but this is a half-fix, and we should focus on a proper fix instead - just start using updatelog consistently for all schema changes, and backfill old values. (In other words, there should be a table which contains all the patchfiles MediaWiki ever had, and which of them had been applied, or "preapplied" in that the wiki was installed with a schema already including that patch.)

Anecdotally, I do see people trying to upgrade from 1.16 regularly. (Which of course would still be possible after this change, but more cumbersome...)

Probably the most inherently fragile part of upgrades is whether maintenance scripts work with an arbitrarily old schema (hello NameTableStore). But I don't think we'd avoid that problem by restricting to 2-3 LTS releases (ie. 4-6 years).

Anecdotally, I do see people trying to upgrade from 1.16 regularly. (Which of course would still be possible after this change, but more cumbersome...)

They still can, they'd just have to do it in two or three steps. This is pretty standard for upgrading old software.

Probably the most inherently fragile part of upgrades is whether maintenance scripts work with an arbitrarily old schema (hello NameTableStore). But I don't think we'd avoid that problem by restricting to 2-3 LTS releases (ie. 4-6 years).

This is indeed the big issue. And we can't entirely avoid this problem by limiting updates to 2 or 3 LTS releases, but we can greatly reduce it.

Basically, the question is whether we want to continue to commit to testing updates from all previous version, and investigating and fixing any issues that may arrise. In practice, we already don't do that. So we should not pretend we do. We should set clear expectations for release managers, and clear guarantees for users of the release.

Agreeing with Daniel. Also, I want to point out:

  • This policy helps with writing better tests for upgrades, currently we have some tests for sqlite upgrade but it's unclear what is the expectations, what versions it should test upgrading for, etc. as a result, the tests are pretty incomplete.
  • Upgrading in small jumps is recommended regardless of this RFC, it is a devops principle that deployments should be kept to small as possible. (lean principle? IIRC)

Personally, I do upgrades one version at a time, even if updating from LTS to LTS, to minimize the amount of breakage happening at once.

I think it's also good to encourage to upgrade more frequently if they want to do it in one step. On the other hand there is balance to be found to not actually deter upgrades by making them harder. The proposes 2-3 LTS seems sufficient for me.

I think this should also result in an update on one of the policy pages. Maybe https://www.mediawiki.org/wiki/MediaWiki_database_policy?

Krinkle renamed this task from RFC: Drop support for database upgrade older than two LTS releases to RFC: Drop support for older database upgrades.Dec 16 2020, 9:33 PM

There seems to be consensus that this is a good idea. Two LTS releases as originally proposed seem sufficient, given the fact that old releases can still be updated in multiple "hops".

As discussed in yesterday's TechCom meeting, this seems to be ready for Last Call, but since we already have two RFCs entering Last Call this week, this will have to wait until after January 6th.

Per Wednesday's TechCom meeting, this RFC is entering the Last Call period. If no concerns remain unaddressed by January 20, it will be adopted as policy as proposed:

MediaWiki will support upgrading from at least two LTS releases. For versions for which upgrading is not supported, the installer will fail.

But then support for upgrading to intermediate, unsupported, versions must be provided? If I recall correctly, the current guidance is to "ignore" support requests that involve upgrading to an unsupported version. Would a patch be issued if a bug exists in an old version that prevents upgrading to a newer one?

I don't think we provide any sort of official support on individual cases (comparing to what canonical provides in exchange for money for example). I don't think we ignore support requests for upgrades of unsupported version. I personally have helped a lot in those cases but at the end of the day it's "help" and not mandatory.

This RFC has been accepted with no objections raised during the last call period.

Bit of a stairway thought, but would people doing long-distance upgrades have to set up a server with some intermediary version of PHP for every step (and maybe MySQL although out requirements for that don't change often)? Looking at Compatibility#PHP we typically support a given PHP version for two LTS releases.

Bit of a stairway thought, but would people doing long-distance upgrades have to set up a server with some intermediary version of PHP for every step (and maybe MySQL although out requirements for that don't change often)? Looking at Compatibility#PHP we typically support a given PHP version for two LTS releases.

They might have to upgrade their version of PHP (and possibly MySQL) in multiple steps as well, yes. It's not great, but it's common practice anyway I believe.

Also, everything up to 1.35 can still be done in one step. So someone who, in four years, wants to upgrade from 1.29 to 1.43, will have to do two steps: one to 1.35, and one to 1.43. To upgrade to 1.50 in seven years, you'd go from 1.29 to 1.35 (2020), then to 1.43 (2024), and finally to 1.50 (2027). Not too terrible, I think.

Change 648576 merged by jenkins-bot:
[mediawiki/core@master] Fail update if bot_passwords doesn't exist

https://gerrit.wikimedia.org/r/648576

Change 661913 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Rework sqlite schema changes tests

https://gerrit.wikimedia.org/r/661913

Change 661913 merged by jenkins-bot:
[mediawiki/core@master] Rework sqlite schema changes tests

https://gerrit.wikimedia.org/r/661913

Ladsgroup claimed this task.

The policy itself is done. The patch to prevent updating from an old version is also merged and we are already dropping lots of old sql files. Calling this done. If you want to follow the progress of clean up, feel free to subscribe to T272199: Drop unsupported upgrade patch files and logic

I just noticed some cleanup/updates needed in RELEASE-NOTES

== Upgrading notes for 1.36 ==
Don't forget to always back up your database before upgrading!

See the file UPGRADE for more detailed upgrade instructions, including important
information when upgrading from versions prior to 1.11.

Change 677378 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/core@master] RELEASE-NOTES-1.36: Re-work UPGRADE reference

https://gerrit.wikimedia.org/r/677378

Change 677378 merged by jenkins-bot:

[mediawiki/core@master] RELEASE-NOTES-1.3{6,7}: Re-work UPGRADE reference

https://gerrit.wikimedia.org/r/677378

Change 677959 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/core@REL1_36] RELEASE-NOTES-1.3{6,7}: Re-work UPGRADE reference

https://gerrit.wikimedia.org/r/677959

Change 677959 merged by jenkins-bot:

[mediawiki/core@REL1_36] RELEASE-NOTES-1.36: Re-work UPGRADE reference

https://gerrit.wikimedia.org/r/677959

The task description says:

Affected components: Mediawiki core

Is that still accurate, or are extensions also affected? I think it would make sense to do this for extensions as well, at least those with the "release branches" compatibility policy.

It depends on the compataibility policy of the extension I assume. For example, if master of your extension is only compatible with master (most cases). It's safe to drop pre 1.27 as the update.php wouldn't work at all but if it's an edge case (I think language bundle is different, also wikibase) then it depends on the what versions it's compatible with.

Is this captured somewhere on-wiki? It would be nice to link it from pages like Compatibility, Version lifecycle, MediaWiki database policy.

Is this captured somewhere on-wiki? It would be nice to link it from pages like Compatibility, Version lifecycle, MediaWiki database policy.

Good point. I do not think so but when doing so Manual:Upgrading should also have a link I believe.

Is this captured somewhere on-wiki? It would be nice to link it from pages like Compatibility, Version lifecycle, MediaWiki database policy.

Good point. I do not think so but when doing so Manual:Upgrading should also have a link I believe.

It does have documentation about it: https://www.mediawiki.org/w/index.php?title=Manual:Upgrading&diff=4371568&oldid=4371567

It does have documentation about it: https://www.mediawiki.org/w/index.php?title=Manual:Upgrading&diff=4371568&oldid=4371567

Good. Did not see this in the heat of the action. :) Thanks. The spots Tgr mentioned do also make sense to me tough. Thus I now added the note to these pages, too. From your reply I am not sure how you think about it. Please revert or tweak at your linking.

Just as an aside, i feel like i've seen an increase in support desk questions because of this. In particular people will try to to do the upgrade in steps, but then they will try and use an old version of mw with php8 and have problems.

The error message

Can not upgrade from versions older than 1.29, please upgrade to that version or later first.

is confusing. I am upgrading to 1.29 or later!

Also, everything up to 1.35 can still be done in one step.

Based on (a misreading of) this, for an upgrade from 1.18, I went with 1.34 as the intermediate version and it worked fine. Could the error recommend specific versions to upgrade to? https://www.mediawiki.org/wiki/Compatibility mentions 1.27 and 1.35.

Just as an aside, i feel like i've seen an increase in support desk questions because of this. In particular people will try to to do the upgrade in steps, but then they will try and use an old version of mw with php8 and have problems.

Which versions are they using? PHP8 worked fine for me with 1.34, but I don't know how far back you can go before it breaks, and new breakages will probably emerge over time as things change in PHP. Probably a good reason for us to recommend the newest old version that we can.

The error message

Can not upgrade from versions older than 1.29, please upgrade to that version or later first.

is confusing. I am upgrading to 1.29 or later!

That is fair. What do you think it should be? Patches are welcome :)