Migrate all old DB rows from windows-1252 to UTF-8 on dawiki
Closed, Resolved (Public)

Event Timeline

Change 383012 had a related patch set uploaded (by Zoranzoki21; owner: Zoranzoki21):
[operations/mediawiki-config@master] Migrate all old DB rows from windows-1252 to UTF-8 on several wikis:

https://gerrit.wikimedia.org/r/383012

Change 383012 abandoned by Zoranzoki21:
Migrate all old DB rows from windows-1252 to UTF-8 on several wikis:

Reason:
I will abandon this change. Please tell me to restore this patch if it is needed. Sorry for the many emails about this.

https://gerrit.wikimedia.org/r/383012

mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select old_flags, count(*) from text group by old_flags limit 50;
+---------------------+----------+
| old_flags           | count(*) |
+---------------------+----------+
|                     |     3785 |
| error               |        2 |
| external,gzip       |       49 |
| external,object     |       36 |
| external,utf-8      |  1614469 |
| external,utf8       |   336780 |
| gzip                |     1094 |
| object              |    29477 |
| utf-8,gzip          |    39973 |
| utf-8,gzip,external |  9379727 |
+---------------------+----------+
10 rows in set (24.092 sec)

mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select max(old_id) from text where not old_flags like '%external%';
+-------------+
| max(old_id) |
+-------------+
|     9329492 |
+-------------+
1 row in set (4.903 sec)

Done:

mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select old_flags, count(*) from text group by old_flags limit 50;
+---------------------+----------+
| old_flags           | count(*) |
+---------------------+----------+
| error               |        2 |
| external,gzip       |       49 |
| external,object     |       36 |
| external,utf-8      |  1614469 |
| external,utf8       |   336780 |
| gzip,utf-8,external |     1094 |
| utf-8,gzip,external |  9452968 |
+---------------------+----------+
7 rows in set (20.480 sec)

85 rows (49 `external,gzip` plus 36 `external,object`) are still on the legacy encoding (due to T282734#8891439), so we can't drop the legacy encoding for dawiki right now.
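
As a sanity check on the remaining count, the tally can be reproduced from the table above. This is a minimal Python sketch under two assumptions that the thread does not spell out: a row counts as legacy when its flags contain neither `utf-8` nor `utf8`, and the two `error` rows are set aside. The helper names are hypothetical.

```python
# Counts copied from the `old_flags` query output directly above.
FLAG_COUNTS = {
    "error": 2,
    "external,gzip": 49,
    "external,object": 36,
    "external,utf-8": 1614469,
    "external,utf8": 336780,
    "gzip,utf-8,external": 1094,
    "utf-8,gzip,external": 9452968,
}

UTF8_MARKERS = {"utf-8", "utf8"}


def is_legacy(old_flags: str) -> bool:
    """True if the flag set carries no UTF-8 marker (error rows are skipped)."""
    flags = {f for f in old_flags.split(",") if f}
    if "error" in flags:
        return False  # assumption: error rows are not part of the migration
    return not (flags & UTF8_MARKERS)


def legacy_rows(counts: dict) -> dict:
    """Subset of the flag counts that still needs converting."""
    return {flags: n for flags, n in counts.items() if is_legacy(flags)}


remaining = legacy_rows(FLAG_COUNTS)
print(remaining)                # the external,gzip and external,object groups
print(sum(remaining.values()))  # 49 + 36 = 85
```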

Ladsgroup added a project: DBA.
Ladsgroup moved this task from Triage to In progress on the DBA board.

It's all cleaned up except for 36 rows:

mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select old_flags, count(*) from text group by old_flags limit 50;
+---------------------+----------+
| old_flags           | count(*) |
+---------------------+----------+
| error               |        2 |
| external,object     |       36 |
| external,utf-8      |  1614469 |
| external,utf8       |   336780 |
| gzip,utf-8,external |     1094 |
| utf-8,gzip,external |  9458139 |
+---------------------+----------+
6 rows in set (21.522 sec)

That's because the script can't unserialize the entries. It gets worse: the entries are not objects at all, but the raw content of the revisions, incorrectly flagged as object.
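
To illustrate the failure mode, here is a small Python sketch that flags blobs whose `object` flag does not match their content. It is a heuristic of my own, not the fix that was deployed: it assumes the stored objects are PHP-serialized, which start with a prefix of the form `O:<length>:"<ClassName>":<count>:{`. Both function names are hypothetical.

```python
import re
from typing import List

# PHP's serialize() output for an object starts like: O:8:"stdClass":0:{
_PHP_OBJECT_PREFIX = re.compile(rb'^O:\d+:"[^"]+":\d+:\{')


def looks_like_php_object(blob: bytes) -> bool:
    """Heuristic: does the blob begin like a PHP-serialized object?"""
    return _PHP_OBJECT_PREFIX.match(blob) is not None


def effective_flags(old_flags: str, blob: bytes) -> List[str]:
    """Drop a bogus `object` flag when the blob is plainly raw text."""
    flags = [f for f in old_flags.split(",") if f]
    if "object" in flags and not looks_like_php_object(blob):
        flags.remove("object")
    return flags
```

For a row flagged `external,object` whose blob is plain wikitext, `effective_flags` returns `["external"]`, so the content would then be read as raw text instead of being passed to an unserializer.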

For future reference:

mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select * from text where old_flags = 'external,object' limit 50;
+---------+----------------------+-----------------+
| old_id  | old_text             | old_flags       |
+---------+----------------------+-----------------+
| 9333425 | DB://cluster1/62685  | external,object |
| 9333429 | DB://cluster1/162824 | external,object |
| 9333447 | DB://cluster1/125529 | external,object |
| 9333450 | DB://cluster1/5202   | external,object |
| 9333457 | DB://cluster1/5126   | external,object |
| 9333462 | DB://cluster1/51897  | external,object |
| 9333503 | DB://cluster1/38986  | external,object |
| 9333557 | DB://cluster1/5213   | external,object |
| 9333576 | DB://cluster1/5218   | external,object |
| 9333611 | DB://cluster1/103020 | external,object |
| 9333646 | DB://cluster1/23703  | external,object |
| 9333699 | DB://cluster1/108291 | external,object |
| 9333742 | DB://cluster1/20926  | external,object |
| 9333768 | DB://cluster1/159734 | external,object |
| 9333804 | DB://cluster1/25686  | external,object |
| 9333811 | DB://cluster1/12369  | external,object |
| 9333835 | DB://cluster1/5142   | external,object |
| 9333862 | DB://cluster1/169895 | external,object |
| 9334127 | DB://cluster1/104136 | external,object |
| 9334132 | DB://cluster1/104128 | external,object |
| 9334137 | DB://cluster1/150612 | external,object |
| 9334207 | DB://cluster1/126056 | external,object |
| 9334222 | DB://cluster1/97486  | external,object |
| 9334227 | DB://cluster1/184154 | external,object |
| 9334232 | DB://cluster1/184155 | external,object |
| 9334259 | DB://cluster1/40760  | external,object |
| 9334266 | DB://cluster1/155353 | external,object |
| 9334272 | DB://cluster1/24287  | external,object |
| 9334355 | DB://cluster1/93529  | external,object |
| 9334401 | DB://cluster1/154297 | external,object |
| 9334451 | DB://cluster1/68421  | external,object |
| 9334462 | DB://cluster1/68417  | external,object |
| 9334469 | DB://cluster1/68424  | external,object |
| 9334473 | DB://cluster1/68416  | external,object |
| 9334561 | DB://cluster1/126922 | external,object |
| 9334568 | DB://cluster1/55519  | external,object |
+---------+----------------------+-----------------+
36 rows in set (4.851 sec)

I think I fixed it, but I can't test it directly because all of these revisions are deleted. I'll try to load them up in eval.php (these are rev IDs):

9477333
9477337
9477355
9477358
9477365
9477370
9477411
9477465
9477484
9477519
9477554
9477607
9477650
9477676
9477712
9477719
9477743
9477770
9478035
9478040
9478045
9478115
9478130
9478135
9478140
9478167
9478174
9478180
9478263
9478309
9478359
9478370
9478377
9478381
9478469
9478476

Yup, it's fixed:

> $res = MediaWiki\MediaWikiServices::getInstance()->getRevisionRenderer()->getRenderedRevision( MediaWiki\MediaWikiServices::getInstance()->getArchivedRevisionLookup()->getArchivedRevisionRecord( Title::newFromDBkey( 'Reparere' ), 9477333 ) );


> var_dump( $res->getRevisionParserOutput()->getRawText() );
<redacted>

dawiki is now officially free of legacy encoding.

Change 928516 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Remove svwiktionary, svwiki and dawiki from legacy encoding

https://gerrit.wikimedia.org/r/928516

Change 928516 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove svwiktionary, svwiki and dawiki from legacy encoding

https://gerrit.wikimedia.org/r/928516

Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:49:43Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:51:26Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:58:56Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]] (duration: 09m 13s)

Ladsgroup moved this task from In progress to Done on the DBA board.