Migrate all old DB rows from windows-1252 to UTF-8 on svwiktionary
Closed, Resolved (Public)

Event Timeline

Change 383012 had a related patch set uploaded (by Zoranzoki21; owner: Zoranzoki21):
[operations/mediawiki-config@master] Migrate all old DB rows from windows-1252 to UTF-8 on several wikis:

Change 383012 abandoned by Zoranzoki21:
Migrate all old DB rows from windows-1252 to UTF-8 on several wikis:

I will abandon this change. Please tell me to restore this patch if it is needed. Sorry for the many emails about this.

I'll run this tomorrow:

ladsgroup@mwmaint1002:~$ mwscript maintenance/storage/moveToExternal.php --wiki=svwiktionary --end 31287 DB cluster27

The end_id came from this:

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select max(old_id) from text where not old_flags like '%external%';
| max(old_id) |
|       31287 |
1 row in set (1.623 sec)


| old_flags  | count(*) |
|            |     6756 |
| gzip       |     2506 |
| object     |     8202 |
| utf-8,gzip |    13750 |

Some test revs to make sure things are alright:

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select old_id from text where old_flags = '' limit 5;
| old_id |
|    235 |
|    236 |
|    237 |
|    238 |
|    239 |
5 rows in set (0.001 sec)

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select * from content where content_address in ('tt:235', 'tt:236', 'tt:238', 'tt:239');
| content_id | content_size | content_sha1                    | content_model | content_address |
|       7500 |         7543 | f01ffbkivurqnf6rmuhsl0c6m2wfa1o |             1 | tt:235          |
|       7501 |           32 | qku8x3tytyyegqwbyawrs9ecs9hgjos |             1 | tt:236          |
|       7503 |          113 | 71lez11a2u9tau2ij6izzsdhdkhq93p |             1 | tt:238          |
|       7504 |          111 | 8sz749bmctynpl48hr67vripwjgvesj |             1 | tt:239          |
4 rows in set (1.002 sec)

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select * from slots where slot_content_id in (7500, 7501, 7503, 7504);
| slot_revision_id | slot_role_id | slot_content_id | slot_origin |
|              235 |            1 |            7500 |         235 |
|              236 |            1 |            7501 |         236 |
|              238 |            1 |            7503 |         238 |
|              239 |            1 |            7504 |         239 |
4 rows in set (2.512 sec)

Which is confirmed by the rendered revision vs. the raw row:

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select * from text where old_id = 235;
| old_id | old_text | old_flags |
|    235 | <center>
[[Idag]] �r det [[{{CURRENTDAYNAME}}]], den {{CURRENTDAY}} [[{{CURRENTMONTHNAME}}]] {{CURRENTYEAR}} ([[UTC]])<br>P� ISO 8601-format: {{CURRENTYEAR}}-{{CURRENTMONTH}}-{{CURRENTDAY}} '''T''' {{CURRENTTIME}} <br>

'''Wiktionary''' �r ett projekt, �ppet f�r alla att delta i, med m�ls�ttningen att skapa en ordlista f�r alla spr�k. All text i Wiktionary publiceras under '''[[GNU FDL]]''', vilket betyder att du f�r anv�nda texterna fritt, s� l�nge som de f�rblir fria under samma licens. 

Wiktionary �r l�tt att bidra till: pr�va dig g�rna fram p� [[Wiktionary:Sandl�dan|testsidan]] tills du k�nner att du vet hur man g�r. 

Den svenska wiktionaryn startade den 2:a maj 2004, och har just nu {{NUMBEROFARTICLES}} artiklar. (R�knaren verkar f.n. vara ur funktion...)

De [[Special:Newpages|senaste artiklarna]].

Alla artiklar som b�rjar med: 
[ A]
[ B]
[ C] 
[ D]
[ E]
[ F]
[ G]
[ H]
[ I]
[ J] 
[ K]
[ L]
[ M]
[ N]
[ O]
[ P]
[ Q]
[ R]
[ S]
[ T]
[ U]
[ V]
[ W]
[ X]
[ Y]
[ Z]
[� �]
[� �]
[� �]

==Wiktionary p� andra spr�k==

[ Engelska (English)] - [ Finska (Suomi)] - [ Franska (Fran�ais)] - 
[ Holl�ndska] - [ Polska] - [ Rum�nska (Rom�n&#259;)] - [ Ryska (&#1056;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081;)] - [ Tyska (Deutsch)] - [ Ungerska]

<!-- These links are not needed as long as the other language situation is not settled or at least explained somewhere, it is only confusing: 

<small>[[Wiktionary:Multilingual coordination|Wiktionary Language List]] - [ (Araby) coming soon] - [ Aymara (coming soon] - [ (Balgarski) (coming soon)] - [ Catal&agrave; (coming soon)] - [ Corsu] (WWW coming soon) - [[Hlavn� strana|&#268esky]] [ &#268esky coming soon] - [[Hauptseite|German (Deutsch)]]  [ Deutsch]  (WWW coming soon) - [[&#922;&#973;&#961;&#953;&#945; &#931;&#949;&#955;&#943;&#948;&#945;|(Ellenika)]] [ (Ellenika)] (WWW coming soon)] - [[Portada|Espa&ntilde;ol]] [ Espa&ntilde;ol] (WWW coming soon) - 
[[Cxefpagxo|Esperanto]] [ Esperanto] (WWW coming soon) - [ Eesti (coming soon)] - [ Euskara (coming soon)] - [ Farsi] (WWW coming soon) - [ Suomeksi] (WWW coming soon) - [[Accueil|Fran&ccedil;ais]] [ Fran&ccedil;ais] (WWW coming soon) - [] (WWW coming soon) - [ Gaelige (coming soon)] - [ Galego (coming soon)] - [ Guaran&iacute; (coming soon)] - [ Gujarati (coming soon)] - [ Hrvatski] (WWW coming soon) - [ Interlingua] (coming soon) - [ Bahasa Indonesia (coming soon)] - [ &Iacute;slenska (coming soon)] - [ Italiano] (website coming soon) - 
[[&#12513;&#12452;&#12531;&#12506;&#12540;&#12472;|&#26085;&#26412;&#35486; (Nihongo)]] [ (Nihongo) (Japanese) coming soon] - [ (Kartuli-ena) (coming soon)] - [ (Kannada) coming soon] - [ (Hangukeo)] (website coming soon) - [ Kurd&iacute;] - [ Latina] (coming soon) - [ Malayalam] (WWW coming soon) - [ Marathi] (WWW coming soon) - [ Bahasa Melayu] (WWW coming soon) - 
[[onthaalpagina|Nederlands]] [ (coming soon)] - [ Norsk (coming soon)] - [ Occitan (coming soon)] - [ Punjabi (coming soon)] - [ Portugu&ecirc;s] (coming soon) - [[Pagina principal&#259;|Rom�n&#259;]]  [ Rom�n&#259;] (WWW coming soon) - [ (Russkiy)] (coming soon) - [ (Samskrta)] (coming soon) - [ Slovensko] (website coming soon) - [ Svenska (coming soon)] - [ Thai (coming soon)] - [ Turk&ccedil;e] (website coming soon)] -  [ (Urdu)] (coming soon) - [[Trang Ch�nh|Ti&#7871;ng Vi&#7879;t]] [ Ti&#7871;ng Vi&#7879;t (coming soon)] - [[&#39318;&#39029;|&#20013;&#25991; (Zhongwen)]] [] (coming soon) - [[Wikipedia:Multilingual Statistics|Statistics]]</small>

==Andra Wikimediaprojekt==

<small>[ Meta-Wikipedia] - [ Wikipedia] - '''Wiktionary''' - [ Wikibooks] - [ Wikiquote] - [ Wikisource]<!-- Saved for future reactivation: - [ (coming soon) Wikipediatlas (coming soon)]</small>

== Se �ven ==
* [[Wiktionary:Bybrunnen]]
* [[Wiktionary:Beg�ran om administrat�rsskap]]
* [[Wiktionary:Hj�lp]]
* [[Wiktionary:FAQ]]
* [[Wiktionary:Sidor som b�r raderas]]
* [[Wiktionary:Svenskt spr�kindex]]
* [[Wiktionary:Statistik]] |           |
1 row in set (0.001 sec)
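
The replacement characters in the row above are what windows-1252 bytes look like when a client reads them as UTF-8. A minimal Python sketch of that effect and of the conversion the task asks for. The sample string is an assumption, reconstructed from the Swedish phrase "är det" that appears garbled in the row; it is not read from the database:

```python
# Assumed sample: "är det" as it would be stored in the legacy encoding.
original_text = "\u00e4r det"
legacy_bytes = original_text.encode("windows-1252")  # single byte 0xE4 for "ä"

# Reading the legacy bytes as UTF-8 fails on 0xE4, so a tolerant client
# substitutes U+FFFD, which is the "�" seen in the query output.
garbled = legacy_bytes.decode("utf-8", errors="replace")

# The migration direction: decode as windows-1252, re-encode as UTF-8.
utf8_bytes = legacy_bytes.decode("windows-1252").encode("utf-8")

print(repr(garbled))
print(utf8_bytes)
```

The stored bytes are not corrupt; they are only labelled (or rather, not labelled) in a way that makes a UTF-8 client misread them.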



Ran the script on these revisions:

ladsgroup@mwmaint1002:~$ mwscript maintenance/storage/moveToExternal.php --wiki=svwiktionary --start 234 --end 237 DB cluster27
Moving text rows from 234 to 237 to external storage
oldid=234, moved=0

The revision still works (at the link given) and the row has been moved:

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select * from text where old_id = 235;
| old_id | old_text              | old_flags     |
|    235 | DB://cluster27/264386 | gzip,external |
1 row in set (0.001 sec)

Well, it didn't convert the encoding to UTF-8. Here are the current IDs:

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select * from text order by old_id desc limit 5;
| old_id  | old_text              | old_flags           |
| 3907731 | DB://cluster27/265417 | utf-8,gzip,external |
| 3907730 | DB://cluster27/265416 | utf-8,gzip,external |
| 3907729 | DB://cluster27/265415 | utf-8,gzip,external |
| 3907728 | DB://cluster26/264858 | utf-8,gzip,external |
| 3907727 | DB://cluster27/265414 | utf-8,gzip,external |
5 rows in set (0.001 sec)

le sigh

Running it with --iconv would fix some rows, but not the ones that are already external and still in the legacy encoding. Fixing those is not that hard though.
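
For illustration, a rough Python sketch of what converting one legacy blob involves. This is not the code of moveToExternal.php or of the script from T282734. It assumes that the gzip flag means a raw DEFLATE stream (as produced by PHP's gzdeflate), that the legacy encoding is windows-1252, and it ignores rows with the object flag. The function names are hypothetical:

```python
import zlib


def raw_inflate(data: bytes) -> bytes:
    # Negative wbits selects a raw DEFLATE stream with no zlib/gzip header.
    return zlib.decompress(data, wbits=-15)


def raw_deflate(data: bytes) -> bytes:
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def to_utf8(blob: bytes, flags: list[str]) -> tuple[bytes, list[str]]:
    """Re-encode one blob from windows-1252 to UTF-8 (hypothetical helper)."""
    if "utf-8" in flags:
        return blob, list(flags)  # already converted, leave untouched
    compressed = "gzip" in flags
    raw = raw_inflate(blob) if compressed else blob
    # Note: Python's strict decoder raises on the few bytes that
    # windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    converted = raw.decode("windows-1252").encode("utf-8")
    if compressed:
        converted = raw_deflate(converted)
    return converted, ["utf-8"] + list(flags)


legacy_blob = raw_deflate("P\u00e5 svenska".encode("windows-1252"))
new_blob, new_flags = to_utf8(legacy_blob, ["gzip"])
print(new_flags)
```

The point of the sketch is that the flags and the payload have to change together: a blob that is re-encoded without gaining the utf-8 flag, or flagged without being re-encoded, would be read incorrectly.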

Ran it, only 1K rows left:

mysql:research@s3-analytics-replica.eqiad.wmnet [svwiktionary]> select old_flags, count(*) from text group by old_flags limit 50;
| old_flags           | count(*) |
| external,utf-8      |   315803 |
| gzip,external       |      991 |
| gzip,utf-8,external |     2493 |
| utf-8,gzip,external |  3585718 |
4 rows in set (7.268 sec)
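
The "1K rows left" figure can be read off the table above by treating every flag set without utf-8 as legacy-encoded. A small Python sketch of that check; the counts are copied from the query output and the helper name is hypothetical:

```python
# Counts copied from the old_flags query output above.
flag_counts = {
    "external,utf-8": 315803,
    "gzip,external": 991,
    "gzip,utf-8,external": 2493,
    "utf-8,gzip,external": 3585718,
}


def is_legacy(old_flags: str) -> bool:
    # old_flags is a comma-separated list; a row without the utf-8 flag
    # is still in the wiki's legacy encoding.
    return "utf-8" not in old_flags.split(",")


legacy_rows = sum(n for flags, n in flag_counts.items() if is_legacy(flags))
print(legacy_rows)
```

Only the gzip,external group lacks the flag, which is the roughly 1K rows still to be converted.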

Mentioned in SAL (#wikimedia-operations) [2023-06-07T12:46:43Z] <Amir1> mwscript maintenance/storage/moveToExternal.php --iconv DB cluster27 on dawiktionary and svwiktionary (T128155 and T128156)

Ladsgroup added a project: DBA.
Ladsgroup moved this task from Triage to In progress on the DBA board.

Using the script I made in T282734, I moved the remaining 1K rows. An undo SQL file is prepared just in case.

Change 928516 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Remove svwiktionary from legacy encoding

Change 928516 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove svwiktionary, svwiki and dawiki from legacy encoding

Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:49:43Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:51:26Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-08T13:58:56Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:928516|Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)]] (duration: 09m 13s)

Ladsgroup moved this task from In progress to Done on the DBA board.