Since version 1.5, MediaWiki stores all data in Unicode. Before that, the encoding was configurable; $wgLegacyEncoding allows current MediaWiki to work with old non-Unicode database values. This is one of our oldest pieces of technical debt. Retire it and automatically convert database rows on upgrade instead.
Description
Details
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
[DNM] Drop wgLegacyEncoding entirely | mediawiki/core | master | +25 -418 |
Status | Assigned | Task |
---|---|---|
Open | None | T128149 Remove wgLegacyEncoding feature of Revision/BlobStore |
Resolved | Ladsgroup | T128150 Stop needing to use wgLegacyEncoding in Wikimedia cluster production |
Resolved | Ladsgroup | T128151 Migrate all old DB rows from windows-1252 to UTF-8 on enwiki |
Resolved | Ladsgroup | T128152 Migrate all old DB rows from windows-1252 to UTF-8 on dawiki |
Resolved | Ladsgroup | T128153 Migrate all old DB rows from windows-1252 to UTF-8 on svwiki |
Resolved | Ladsgroup | T128154 Migrate all old DB rows from windows-1252 to UTF-8 on nlwiki |
Resolved | Ladsgroup | T128155 Migrate all old DB rows from windows-1252 to UTF-8 on dawiktionary |
Resolved | Ladsgroup | T128156 Migrate all old DB rows from windows-1252 to UTF-8 on svwiktionary |
Open | None | T282734 Provide a maintenance script to migrate old revisions that use wgLegacyEncoding to UTF-8 |
Open | None | T340174 moveToExternal and fixLegacyEncoding scripts are missing some checks for false return values |
Event Timeline
How would attempts to load data from legacy text table entries be handled if the variable were removed? Simply erroring out would feel like scolding the user for having a wiki around for a long time and not being personally aware of internal implementation details.
(This is something that will likely hit very few wikis, but the ones it does hit would lose access to some of their data.)
Could we have a maint. script that converts the data properly and is automatically run by update.php?
Could do yeah, though since text.old_flags isn't indexed it may require a potentially slow scan through the table to look for affected rows... which is what I tried to avoid back in the day by doing the on-the-fly encoding conversion in the first place :)
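For illustration only, the per-row conversion such a script would perform might look roughly like the sketch below. It is written in Python rather than PHP and is not MediaWiki code: `convert_row` is a hypothetical helper, windows-1252 is assumed as the legacy encoding (as in the subtasks above), and it assumes `old_flags` is a comma-separated list in which `utf-8` marks already-converted rows. Rows carrying any other flag are left alone here, since they would need extra handling first.

```python
# Assumption: the wiki's legacy encoding is windows-1252, as in the subtasks.
LEGACY_ENCODING = "windows-1252"


def convert_row(old_text: bytes, old_flags: str) -> tuple[bytes, str]:
    """Return (text, flags) for one text-table row, converted if needed.

    Hypothetical helper for illustration; not part of MediaWiki.
    """
    flags = [f for f in old_flags.split(",") if f]
    if "utf-8" in flags:
        # Already stored as UTF-8: nothing to do.
        return old_text, old_flags
    if flags:
        # Some other flag is set (assumed to mean the blob is not plain
        # text); this sketch does not attempt to handle such rows.
        return old_text, old_flags
    # Plain legacy-encoded text: decode strictly so undecodable bytes raise
    # instead of being silently mangled, then re-encode as UTF-8.
    text = old_text.decode(LEGACY_ENCODING).encode("utf-8")
    return text, "utf-8"
```

Because `old_flags` is not indexed, a caller would still have to walk the whole table (for example in batches by primary key) and pass each row through a function like this, which is the slow scan mentioned above.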
Note that $wgLegacyEncoding also affects interpretation of really old passwords, which we can't convert until the user attempts to log in because we only store the hash.
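A rough Python sketch of why this can only be fixed at login time: the stored hash was computed over the legacy-encoded bytes, so the check has to retry with the typed password re-encoded, and only then can a fresh hash be written. Everything here is hypothetical: `toy_hash` is a stand-in using plain MD5 and is not MediaWiki's actual password scheme, `check_password` is not MediaWiki's function of that name, and windows-1252 is assumed as the legacy encoding.

```python
import hashlib
import hmac

# Assumption: the wiki's legacy encoding is windows-1252.
LEGACY_ENCODING = "windows-1252"


def toy_hash(password_bytes: bytes) -> str:
    """Stand-in hash for illustration only (not MediaWiki's real scheme)."""
    return hashlib.md5(password_bytes).hexdigest()


def check_password(stored_hash: str, typed: str) -> tuple[bool, bool]:
    """Return (matches, needs_rehash) for a typed password."""
    # Normal case: the hash was computed over the UTF-8 bytes.
    if hmac.compare_digest(stored_hash, toy_hash(typed.encode("utf-8"))):
        return True, False
    # Fallback: the hash may predate the switch to Unicode.
    try:
        legacy_bytes = typed.encode(LEGACY_ENCODING)
    except UnicodeEncodeError:
        # The password cannot be expressed in the legacy encoding at all.
        return False, False
    if hmac.compare_digest(stored_hash, toy_hash(legacy_bytes)):
        # Match via the legacy path: the caller can now store a new hash.
        return True, True
    return False, False
```

For pure-ASCII passwords both encodings give the same bytes, so only passwords containing non-ASCII characters ever reach the fallback.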
"Really old" apparently means they haven't changed their password since 2004 (the check for old encodings was added in rMW8f147fa900d1: committing Hendrik Brummermann's checkPassword() patch, plus some modifications…) and haven't logged in since 2013 (conversion on login was added in rMW95a8974c6bda: Added password hashing API).
This proposal is selected for the Developer-Wishlist voting round and will be added to a MediaWiki page very soon. To the subscribers, or proposer of this task: please help modify the task description: add a brief summary (10-12 lines) of the problem that this proposal raises, topics discussed in the comments, and a proposed solution (if there is any yet). Remember to add a header with a title "Description," to your content. Please do so before February 5th, 12:00 pm UTC.
If there are even any users left who meet those criteria, I'm happy with them needing to reset their password by email, and if they don't have an email address, a sysadmin can take care of it.
Change 963103 had a related patch set uploaded (by Jforrester; author: Jforrester):
[mediawiki/core@master] [DNM] Drop wgLegacyEncoding entirely