| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Remove svwiktionary, svwiki and dawiki from legacy encoding | operations/mediawiki-config | master | +0 -4 |
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T128149 Remove wgLegacyEncoding feature of Revision/BlobStore |
| Resolved | | Ladsgroup | T128150 Stop needing to use wgLegacyEncoding in Wikimedia cluster production |
| Resolved | | Ladsgroup | T128152 Migrate all old DB rows from windows-1252 to UTF-8 on dawiki |
Event Timeline
Change 383012 had a related patch set uploaded (by Zoranzoki21; owner: Zoranzoki21):
[operations/mediawiki-config@master] Migrate all old DB rows from windows-1252 to UTF-8 on several wikis:
Change 383012 abandoned by Zoranzoki21:
Migrate all old DB rows from windows-1252 to UTF-8 on several wikis:
Reason:
I will abandon this change. Please tell me to restore this patch if it is needed. Sorry for the many emails about this.
mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select old_flags, count(*) from text group by old_flags limit 50;
+---------------------+----------+
| old_flags           | count(*) |
+---------------------+----------+
|                     |     3785 |
| error               |        2 |
| external,gzip       |       49 |
| external,object     |       36 |
| external,utf-8      |  1614469 |
| external,utf8       |   336780 |
| gzip                |     1094 |
| object              |    29477 |
| utf-8,gzip          |    39973 |
| utf-8,gzip,external |  9379727 |
+---------------------+----------+
10 rows in set (24.092 sec)

mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select max(old_id) from text where not old_flags like '%external%';
+-------------+
| max(old_id) |
+-------------+
|     9329492 |
+-------------+
1 row in set (4.903 sec)
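For readers unfamiliar with what the migration does to a single row, here is a minimal Python sketch of the re-encoding step, assuming an inline, uncompressed blob. The helper name `legacy_to_utf8` and the `errors="replace"` choice are assumptions made for illustration; this is not the actual maintenance script, and it ignores the `gzip`, `object` and `external` flags entirely.

```python
def legacy_to_utf8(raw: bytes, flags: set[str]) -> tuple[bytes, set[str]]:
    """Re-encode a windows-1252 blob as UTF-8 and add the utf-8 flag.

    Hypothetical helper for illustration only.
    """
    if "utf-8" in flags or "utf8" in flags:
        # Already marked as UTF-8: leave the row untouched.
        return raw, flags
    # Assumption: undefined windows-1252 bytes are replaced rather than rejected.
    text = raw.decode("cp1252", errors="replace")
    return text.encode("utf-8"), flags | {"utf-8"}


# 0xF8 is "ø" in windows-1252; in UTF-8 it becomes the two bytes 0xC3 0xB8.
converted, new_flags = legacy_to_utf8(b"K\xf8benhavn", set())
print(converted, sorted(new_flags))
```

Rows that already carry `utf-8` or `utf8` are returned unchanged, so the sketch is safe to run over a mixed table like the one above.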
Done:
mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select old_flags, count(*) from text group by old_flags limit 50;
+---------------------+----------+
| old_flags           | count(*) |
+---------------------+----------+
| error               |        2 |
| external,gzip       |       49 |
| external,object     |       36 |
| external,utf-8      |  1614469 |
| external,utf8       |   336780 |
| gzip,utf-8,external |     1094 |
| utf-8,gzip,external |  9452968 |
+---------------------+----------+
7 rows in set (20.480 sec)
85 rows (49 external,gzip plus 36 external,object) are still on the legacy encoding (due to T282734#8891439), so we can't remove it right now.
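The remaining count can be read off the table above: it is the rows whose flags include neither `utf-8` nor `utf8`. A small Python sketch of that tally, using the counts from the query output; the function name `is_legacy` is hypothetical, and treating the two `error` rows as a separate category (not counted as legacy) is an assumption.

```python
# Counts copied from the old_flags query output above.
counts = {
    "error": 2,
    "external,gzip": 49,
    "external,object": 36,
    "external,utf-8": 1614469,
    "external,utf8": 336780,
    "gzip,utf-8,external": 1094,
    "utf-8,gzip,external": 9452968,
}


def is_legacy(old_flags: str) -> bool:
    """True when a flag set carries no UTF-8 marker (hypothetical helper)."""
    flags = set(old_flags.split(","))
    if "error" in flags:
        # Assumption: rows flagged "error" are set aside, not counted as legacy.
        return False
    return not ({"utf-8", "utf8"} & flags)


legacy_rows = sum(n for flags, n in counts.items() if is_legacy(flags))
print(legacy_rows)  # 49 + 36 = 85
```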
It's all cleaned except 36 rows:
mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select old_flags, count(*) from text group by old_flags limit 50;
+---------------------+----------+
| old_flags           | count(*) |
+---------------------+----------+
| error               |        2 |
| external,object     |       36 |
| external,utf-8      |  1614469 |
| external,utf8       |   336780 |
| gzip,utf-8,external |     1094 |
| utf-8,gzip,external |  9458139 |
+---------------------+----------+
6 rows in set (21.522 sec)
These 36 rows remain because the script can't unserialize the entries. And it gets even worse: the entry is not an object at all, it's the raw, actual content of the revision, incorrectly marked as object.
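One way to tell such mislabelled rows apart is to check whether the blob even has the shape of a PHP-serialized object, which `serialize()` writes as `O:<class name length>:"<class name>":<property count>:{...}`. The Python sketch below is only a heuristic under that assumption; `looks_like_php_object` is a hypothetical name and this is not how the cleanup script decides.

```python
import re

# Shape of a PHP-serialized object header: O:<len>:"<class>":<nprops>:{
_PHP_OBJECT_RE = re.compile(rb'^O:(\d+):"([^"]+)":\d+:\{')


def looks_like_php_object(blob: bytes) -> bool:
    """Heuristic: does the blob start like PHP serialize() output for an object?"""
    m = _PHP_OBJECT_RE.match(blob)
    # The declared length must also match the class name's real length.
    return bool(m) and int(m.group(1)) == len(m.group(2))


print(looks_like_php_object(b'O:8:"stdClass":0:{}'))      # serialized object
print(looks_like_php_object(b"'''Reparere''' is a page"))  # raw wikitext
```

A blob that fails this check but is flagged `object` is a candidate for the "raw content marked as object" case described above.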
For future reference:
mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select * from text where old_flags = 'external,object' limit 50;
+---------+----------------------+-----------------+
| old_id  | old_text             | old_flags       |
+---------+----------------------+-----------------+
| 9333425 | DB://cluster1/62685  | external,object |
| 9333429 | DB://cluster1/162824 | external,object |
| 9333447 | DB://cluster1/125529 | external,object |
| 9333450 | DB://cluster1/5202   | external,object |
| 9333457 | DB://cluster1/5126   | external,object |
| 9333462 | DB://cluster1/51897  | external,object |
| 9333503 | DB://cluster1/38986  | external,object |
| 9333557 | DB://cluster1/5213   | external,object |
| 9333576 | DB://cluster1/5218   | external,object |
| 9333611 | DB://cluster1/103020 | external,object |
| 9333646 | DB://cluster1/23703  | external,object |
| 9333699 | DB://cluster1/108291 | external,object |
| 9333742 | DB://cluster1/20926  | external,object |
| 9333768 | DB://cluster1/159734 | external,object |
| 9333804 | DB://cluster1/25686  | external,object |
| 9333811 | DB://cluster1/12369  | external,object |
| 9333835 | DB://cluster1/5142   | external,object |
| 9333862 | DB://cluster1/169895 | external,object |
| 9334127 | DB://cluster1/104136 | external,object |
| 9334132 | DB://cluster1/104128 | external,object |
| 9334137 | DB://cluster1/150612 | external,object |
| 9334207 | DB://cluster1/126056 | external,object |
| 9334222 | DB://cluster1/97486  | external,object |
| 9334227 | DB://cluster1/184154 | external,object |
| 9334232 | DB://cluster1/184155 | external,object |
| 9334259 | DB://cluster1/40760  | external,object |
| 9334266 | DB://cluster1/155353 | external,object |
| 9334272 | DB://cluster1/24287  | external,object |
| 9334355 | DB://cluster1/93529  | external,object |
| 9334401 | DB://cluster1/154297 | external,object |
| 9334451 | DB://cluster1/68421  | external,object |
| 9334462 | DB://cluster1/68417  | external,object |
| 9334469 | DB://cluster1/68424  | external,object |
| 9334473 | DB://cluster1/68416  | external,object |
| 9334561 | DB://cluster1/126922 | external,object |
| 9334568 | DB://cluster1/55519  | external,object |
+---------+----------------------+-----------------+
36 rows in set (4.851 sec)
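The `old_text` values above are external-storage addresses, not the revision text itself. As a rough Python sketch, assuming only the two-part `DB://<cluster>/<id>` form that appears in this listing, they can be split like this (the names `BlobAddress` and `parse_db_address` are hypothetical):

```python
from typing import NamedTuple


class BlobAddress(NamedTuple):
    cluster: str
    blob_id: int


def parse_db_address(old_text: str) -> BlobAddress:
    """Split a DB://<cluster>/<id> address into its parts.

    Assumption: only the two-segment form seen in the listing is handled.
    """
    prefix = "DB://"
    if not old_text.startswith(prefix):
        raise ValueError(f"not an external store address: {old_text!r}")
    cluster, _, blob_id = old_text[len(prefix):].partition("/")
    if not cluster or not blob_id.isdigit():
        raise ValueError(f"unexpected address shape: {old_text!r}")
    return BlobAddress(cluster, int(blob_id))


print(parse_db_address("DB://cluster1/62685"))
```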
I think I fixed it, but I can't test it the usual way because all of these revisions are deleted. I'll try to load them up in eval.php.
(these are rev ids)
| 9477333 |
| 9477337 |
| 9477355 |
| 9477358 |
| 9477365 |
| 9477370 |
| 9477411 |
| 9477465 |
| 9477484 |
| 9477519 |
| 9477554 |
| 9477607 |
| 9477650 |
| 9477676 |
| 9477712 |
| 9477719 |
| 9477743 |
| 9477770 |
| 9478035 |
| 9478040 |
| 9478045 |
| 9478115 |
| 9478130 |
| 9478135 |
| 9478140 |
| 9478167 |
| 9478174 |
| 9478180 |
| 9478263 |
| 9478309 |
| 9478359 |
| 9478370 |
| 9478377 |
| 9478381 |
| 9478469 |
| 9478476 |
+------------------+
Yup,
It's fixed:
> $res = MediaWiki\MediaWikiServices::getInstance()->getRevisionRenderer()->getRenderedRevision( MediaWiki\MediaWikiServices::getInstance()->getArchivedRevisionLookup()->getArchivedRevisionRecord( Title::newFromDBkey( 'Reparere' ), 9477333 ) );
> var_dump( $res->getRevisionParserOutput()->getRawText() );
<redacted>
Officially, dawiki is now free of legacy encoding.
Change 928516 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/mediawiki-config@master] Remove svwiktionary, svwiki and dawiki from legacy encoding
Change 928516 merged by jenkins-bot:
[operations/mediawiki-config@master] Remove svwiktionary, svwiki and dawiki from legacy encoding
Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:49:43Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]]
Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:51:26Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:58:56Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]] (duration: 09m 13s)