MCR schema migration stage 4: Migrate External Store URLs (wmf production)
Closed, Resolved (Public)

Description

We should migrate data stored in the External Store away from the text table: the External Store URL contained in the text blob can be written to the content_address field of the content table (possibly with a prefix, to be decided, see External Store Integration). The corresponding rows can then be deleted from the text table.

Progress:

  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7
  • s8

The scripts are running in screen sessions on mwmaint2002. Since we are deleting data from the text table, we generate a dump that associates the id of each text table row with its external storage address, as a backup in case something goes wrong. After a wiki has been migrated successfully, the dump is gzipped and moved to stat1009 (since that host has far more disk space).

In case of data corruption: You can find the dump for the text table of abcwiki at stat1009.eqiad.wmnet:/home/zabe/text_table_dump_compressed/abcwiki.gz.
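
For a restore, a lookup against one of these dumps could look like the following minimal sketch. The line layout (text id, a tab, then the external storage address) is an assumption, since the dump format is not described in this task, and `find_address` and `find_address_in_dump` are hypothetical helper names, not part of WikimediaMaintenance.

```python
import gzip
from typing import Iterable, Optional


def find_address(lines: Iterable[str], text_id: int) -> Optional[str]:
    """Return the address recorded for text_id, or None if it is not listed."""
    wanted = str(text_id)
    for line in lines:
        # Assumed layout per line: "<text id><TAB><external storage address>"
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2 and fields[0] == wanted:
            return fields[1]
    return None


def find_address_in_dump(dump_path: str, text_id: int) -> Optional[str]:
    """Stream a gzipped dump from disk and look up a single text id."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump:
        return find_address(dump, text_id)
```

Streaming the file line by line keeps memory use flat, which matters for dumps with hundreds of millions of lines. For many lookups at once, loading the wanted ids into a set and making a single pass would likely be the better choice.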

Details

Related Changes in Gerrit:
Repo                                       Branch              Lines +/-
operations/mediawiki-config                master              +0 -5
mediawiki/core                             master              +3 -50
operations/mediawiki-config                master              +0 -1
operations/mediawiki-config                master              +0 -2
operations/mediawiki-config                master              +0 -5
operations/mediawiki-config                master              +4 -0
mediawiki/extensions/WikimediaMaintenance  master              +104 -0
operations/mediawiki-config                master              +7 -2
mediawiki/core                             master              +212 -0
operations/mediawiki-config                master              +0 -1
operations/mediawiki-config                master              +0 -1
operations/mediawiki-config                master              +1 -0
operations/mediawiki-config                master              +0 -8
operations/mediawiki-config                master              +0 -15
operations/mediawiki-config                master              +1 -20
operations/mediawiki-config                master              +1 -98
operations/mediawiki-config                master              +1 -7
operations/mediawiki-config                master              +148 -4
mediawiki/core                             wmf/1.43.0-wmf.24   +50 -3
mediawiki/core                             master              +50 -3
operations/mediawiki-config                master              +8 -0
mediawiki/extensions/WikimediaMaintenance  master              +11 -1
mediawiki/extensions/WikimediaMaintenance  wmf/1.43.0-wmf.21   +22 -0
mediawiki/extensions/WikimediaMaintenance  wmf/1.43.0-wmf.21   +20 -5
mediawiki/extensions/WikimediaMaintenance  wmf/1.43.0-wmf.22   +22 -0
mediawiki/extensions/WikimediaMaintenance  wmf/1.43.0-wmf.22   +20 -5
mediawiki/extensions/WikimediaMaintenance  master              +22 -0
mediawiki/extensions/WikimediaMaintenance  master              +20 -5
mediawiki/extensions/WikimediaMaintenance  master              +36 -2
mediawiki/extensions/WikimediaMaintenance  master              +167 -0
mediawiki/extensions/AbuseFilter           master              +24 -92

Related Objects

Status    Assigned
Stalled   None
Stalled   None
Resolved  Zabe
Resolved  None
Resolved  tstarling
Resolved  Anomie
Resolved  Anomie
Resolved  Anomie
Resolved  Anomie
Resolved  Anomie
Resolved  Anomie
Resolved  Anomie
Resolved  Anomie
Resolved  daniel
Resolved  Anomie
Resolved  Anomie
Resolved  Marostegui
Resolved  Anomie
Resolved  tstarling
Resolved  Anomie
Resolved  Anomie
Resolved  daniel
Resolved  daniel
Resolved  Anomie
Resolved  Anomie
Resolved  daniel
Resolved  daniel
Resolved  daniel
Resolved  daniel
Resolved  BPirkle
Resolved  None
Resolved  Ladsgroup

Event Timeline

We should probably make the script idempotent. I wanted to run it on dewiki to make sure it's properly cleaned up, and it basically took forever.

Change #1106929 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/extensions/WikimediaMaintenance@master] Add script to delete obsolote text row tables

https://gerrit.wikimedia.org/r/1106929

Change #1114060 had a related patch set uploaded (by Zabe; author: Zabe):

[operations/mediawiki-config@master] Increase revision-slots cache expiry back to default for most wikis

https://gerrit.wikimedia.org/r/1114060

Change #1114060 merged by jenkins-bot:

[operations/mediawiki-config@master] Increase revision-slots cache expiry back to default for most wikis

https://gerrit.wikimedia.org/r/1114060

Mentioned in SAL (#wikimedia-operations) [2025-01-27T14:57:26Z] <zabe@deploy2002> sync-world aborted: T384614 T183490 (duration: 17m 07s)

Current progress on remaining sections.

zabe@mwmaint2002:~/text_table_dump$ wc -l *
   535915916 commonswiki
   447725014 enwiki
   776484342 wikidatawiki
  1760125272 total
zabe@mwmaint2002:~/text_table_dump$

This means that enwiki is about 37.0% done, commonswiki 61.9%, and wikidatawiki 36.2%.
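
The per-wiki totals behind these percentages are not stated in the comment, but they can be back-calculated from the `wc -l` counts. The sketch below does that arithmetic, assuming one dump line per migrated row. Because the reported percentages are rounded, the implied totals are only approximate, and `implied_total` is a hypothetical helper name.

```python
# Line counts from the `wc -l` output and the percentages reported for them.
dumped = {
    "commonswiki": 535_915_916,
    "enwiki": 447_725_014,
    "wikidatawiki": 776_484_342,
}
reported_percent = {"commonswiki": 61.9, "enwiki": 37.0, "wikidatawiki": 36.2}


def implied_total(done_rows: int, percent_done: float) -> int:
    """Estimate the total row count from rows done and the percentage done."""
    return round(done_rows / (percent_done / 100))


for wiki, rows in dumped.items():
    total = implied_total(rows, reported_percent[wiki])
    print(f"{wiki}: {rows:,} of roughly {total:,} rows")
```

Under that assumption, the implied totals come out at roughly 0.87 billion rows for commonswiki, 1.21 billion for enwiki and 2.14 billion for wikidatawiki.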

Thanks! The commons one is the most important to me since the file and categorylinks migration will take a long time there and I want to start it ASAP. The others should be fine for now.

Change #1106929 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] Add script to delete obsolote text table rows

https://gerrit.wikimedia.org/r/1106929

Mentioned in SAL (#wikimedia-operations) [2025-02-11T00:59:06Z] <zabe> zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php test2wiki --delete /home/zabe/text_table_cleanup/test2wiki # T183490

Mentioned in SAL (#wikimedia-operations) [2025-02-12T00:12:24Z] <zabe> zabe@mwmaint2002:~$ cat /srv/mediawiki-staging/dblists/group1.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/text_table_cleanup/{} --sleep 0.3" # T183490

To be clear, I skipped commonswiki and wikidatawiki here, since those are still running the first stage of this migration.
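
The logged command does not show how the skipping was done. One way to build such a filtered wiki list is sketched below, assuming a dblist is a plain list of database names, one per line, with `#` starting a comment. `wikis_to_process` is a hypothetical helper, not an existing tool.

```python
from typing import Iterable, List, Set


def wikis_to_process(dblist_lines: Iterable[str], skip: Set[str]) -> List[str]:
    """Return the wiki names from a dblist, minus comments, blanks and skipped wikis."""
    wikis = []
    for line in dblist_lines:
        # Drop anything after '#' (assumed comment syntax) and surrounding whitespace.
        name = line.split("#", 1)[0].strip()
        if name and name not in skip:
            wikis.append(name)
    return wikis


# Wikis still running stage 1 must not be passed to the stage 2 (delete) script.
still_in_stage_1 = {"commonswiki", "wikidatawiki"}
example_lines = ["# example dblist", "commonswiki", "dewiktionary", "", "wikidatawiki"]
print(wikis_to_process(example_lines, still_in_stage_1))
```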

Up to dewiktionary there were no issues; then T386162 appeared. I stopped the script when it had reached about enwikibooks.

Let's run a check to see if any other wiki might have been missed during the first run.

Running select count(*) from content where content_address like 'tt%'; on group1 and group2 shows that we apparently missed dewiktionary, diqwiki and ttwiki.
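
To illustrate what that check counts, here is a minimal, self-contained sketch using an in-memory SQLite database in place of the production MariaDB replicas. The two-column schema is a toy stand-in for the real content table, and the sample addresses, including the `es:` prefix on the migrated row, are illustrative assumptions.

```python
import sqlite3


def count_unmigrated(conn: sqlite3.Connection) -> int:
    """Count content rows whose address still points at the text table."""
    row = conn.execute(
        "SELECT COUNT(*) FROM content WHERE content_address LIKE 'tt%'"
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Toy schema: the real content table has more columns than these two.
conn.execute(
    "CREATE TABLE content (content_id INTEGER PRIMARY KEY, content_address TEXT)"
)
conn.executemany(
    "INSERT INTO content (content_address) VALUES (?)",
    [("tt:101",), ("es:DB://cluster1/5",), ("tt:102",)],
)
print(count_unmigrated(conn))  # prints 2: two rows still reference the text table
```

A result of 0 means every row has been migrated, which is the condition checked before the stage 2 (delete) script is run on a wiki.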

Change #1119207 had a related patch set uploaded (by Zabe; author: Zabe):

[operations/mediawiki-config@master] Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki

https://gerrit.wikimedia.org/r/1119207

Change #1119207 merged by jenkins-bot:

[operations/mediawiki-config@master] Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki

https://gerrit.wikimedia.org/r/1119207

Mentioned in SAL (#wikimedia-operations) [2025-02-13T00:04:43Z] <zabe@deploy2002> Started scap sync-world: Backport for [[gerrit:1119207|Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki (T183490)]]

Mentioned in SAL (#wikimedia-operations) [2025-02-13T00:07:41Z] <zabe@deploy2002> zabe: Backport for [[gerrit:1119207|Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki (T183490)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-02-13T00:15:22Z] <zabe@deploy2002> Finished scap sync-world: Backport for [[gerrit:1119207|Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki (T183490)]] (duration: 10m 39s)

Mentioned in SAL (#wikimedia-operations) [2025-02-13T22:01:58Z] <zabe> zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php diqwiki --skip /home/zabe/text_table_cleanup/diqwiki --dump /home/zabe/text_table_dump/diqwiki --sleep 0.5 --start 318769 # T183490

Mentioned in SAL (#wikimedia-operations) [2025-02-13T22:38:52Z] <zabe> zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php ttwiki --skip /home/zabe/text_table_cleanup/ttwiki --dump /home/zabe/text_table_dump/ttwiki --sleep 0.5 --start 867501 # T183490

diqwiki is also on s3 and sits between dewiktionary and enwikibooks. So it has probably also been messed up, but I guess that is not showing up because it's not really visited? How big is the deletion file for diqwiki?

So far I have only run the stage 2 script on group1 wikis, and diqwiki is in group2, so it should be fine.

Looks good (I did run the stage 1 script in the meantime).

wikiadmin2023@10.192.16.41(diqwiki)> select count(*) from content where content_address like 'tt%';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.627 sec)

wikiadmin2023@10.192.16.41(diqwiki)>

Mentioned in SAL (#wikimedia-operations) [2025-02-17T17:25:02Z] <zabe> zabe@mwmaint2002:~$ cat /home/zabe/group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/text_table_cleanup/{} --sleep 0.3" # T183490

This finished without issues (I obviously skipped enwiki).

Change #1120689 had a related patch set uploaded (by Zabe; author: Zabe):

[operations/mediawiki-config@master] Increase revision-slots cache expiry back to default for 3 wikis

https://gerrit.wikimedia.org/r/1120689

Change #1120689 merged by jenkins-bot:

[operations/mediawiki-config@master] Increase revision-slots cache expiry back to default for 3 wikis

https://gerrit.wikimedia.org/r/1120689

Mentioned in SAL (#wikimedia-operations) [2025-02-19T01:56:56Z] <zabe@deploy2002> Started scap sync-world: Backport for [[gerrit:1120690|Activate satwiktionary (T386619)]], [[gerrit:1120689|Increase revision-slots cache expiry back to default for 3 wikis (T183490)]]

Mentioned in SAL (#wikimedia-operations) [2025-02-19T01:59:56Z] <zabe@deploy2002> zabe: Backport for [[gerrit:1120690|Activate satwiktionary (T386619)]], [[gerrit:1120689|Increase revision-slots cache expiry back to default for 3 wikis (T183490)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-02-19T02:07:52Z] <zabe@deploy2002> Finished scap sync-world: Backport for [[gerrit:1120690|Activate satwiktionary (T386619)]], [[gerrit:1120689|Increase revision-slots cache expiry back to default for 3 wikis (T183490)]] (duration: 10m 55s)

commonswiki ~92.5%
enwiki ~56.3%
wikidatawiki ~47.7%

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select count(*) from content where content_address like 'tt%';
+----------+
| count(*) |
+----------+
|      282 |
+----------+
1 row in set (17 min 39.538 sec)

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]>

Mentioned in SAL (#wikimedia-operations) [2025-03-06T00:33:47Z] <zabe> zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php commonswiki --delete /home/zabe/text_table_cleanup/commonswiki --sleep 0.5 # T183490

enwiki ~70.7%
wikidatawiki ~57.5%

Moved the scripts that are still running (enwiki and wikidatawiki) back to mwmaint1002 due to the dc switchover.

enwiki ~79.6%
wikidatawiki ~62.6%

Change #1139577 had a related patch set uploaded (by Zabe; author: Zabe):

[operations/mediawiki-config@master] enwiki and commons: Increase revision-slots cache expiry again

https://gerrit.wikimedia.org/r/1139577

Change #1139577 merged by jenkins-bot:

[operations/mediawiki-config@master] enwiki and commons: Increase revision-slots cache expiry again

https://gerrit.wikimedia.org/r/1139577

Mentioned in SAL (#wikimedia-operations) [2025-04-29T23:51:53Z] <zabe@deploy1003> Started scap sync-world: Backport for [[gerrit:1139577|enwiki and commons: Increase revision-slots cache expiry again (T183490)]]

Mentioned in SAL (#wikimedia-operations) [2025-04-29T23:58:43Z] <zabe@deploy1003> zabe: Backport for [[gerrit:1139577|enwiki and commons: Increase revision-slots cache expiry again (T183490)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-04-30T00:05:39Z] <zabe@deploy1003> Finished scap sync-world: Backport for [[gerrit:1139577|enwiki and commons: Increase revision-slots cache expiry again (T183490)]] (duration: 13m 45s)

Stage 2 still has to run on s1, but that should take less than a day (I'll do it after stage 1 of T381599 is finished on s1).

On s8, about 500 million rows still have to be migrated.

enwiki should be done, apart from T393237.

MariaDB [wikidatawiki]> select count(*) from content where content_address  like 'tt%';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (48 min 14.734 sec)

MariaDB [wikidatawiki]>

Mentioned in SAL (#wikimedia-operations) [2025-06-16T08:51:37Z] <zabe> zabe@deploy1003:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php wikidatawiki --delete /home/zabe/text_table_cleanup/wikidatawiki --sleep 0.5 # T183490

Change #1159388 had a related patch set uploaded (by Zabe; author: Zabe):

[operations/mediawiki-config@master] wikidatawiki: Increase revision-slots cache back to default

https://gerrit.wikimedia.org/r/1159388

Change #1159388 merged by jenkins-bot:

[operations/mediawiki-config@master] wikidatawiki: Increase revision-slots cache back to default

https://gerrit.wikimedia.org/r/1159388

Mentioned in SAL (#wikimedia-operations) [2025-06-16T09:45:07Z] <zabe@deploy1003> Started scap sync-world: Backport for [[gerrit:1159388|wikidatawiki: Increase revision-slots cache back to default (T183490)]], [[gerrit:1158804|Stop setting $wgPageLinksSchemaMigrationStage (T299947)]]

Mentioned in SAL (#wikimedia-operations) [2025-06-16T09:47:00Z] <zabe@deploy1003> zabe: Backport for [[gerrit:1159388|wikidatawiki: Increase revision-slots cache back to default (T183490)]], [[gerrit:1158804|Stop setting $wgPageLinksSchemaMigrationStage (T299947)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-06-16T09:57:54Z] <zabe@deploy1003> Finished scap sync-world: Backport for [[gerrit:1159388|wikidatawiki: Increase revision-slots cache back to default (T183490)]], [[gerrit:1158804|Stop setting $wgPageLinksSchemaMigrationStage (T299947)]] (duration: 12m 46s)

Change #1159402 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/core@master] Remove $wgRevisionSlotsCacheExpiry

https://gerrit.wikimedia.org/r/1159402

Change #1159552 had a related patch set uploaded (by Zabe; author: Zabe):

[operations/mediawiki-config@master] Stop setting wgRevisionSlotsCacheExpiry

https://gerrit.wikimedia.org/r/1159552

Change #1159402 merged by jenkins-bot:

[mediawiki/core@master] Remove $wgRevisionSlotsCacheExpiry

https://gerrit.wikimedia.org/r/1159402

Change #1159552 merged by jenkins-bot:

[operations/mediawiki-config@master] Stop setting wgRevisionSlotsCacheExpiry

https://gerrit.wikimedia.org/r/1159552

Mentioned in SAL (#wikimedia-operations) [2025-06-24T19:47:57Z] <zabe@deploy1003> Started scap sync-world: Backport for [[gerrit:1159552|Stop setting wgRevisionSlotsCacheExpiry (T183490)]]

Mentioned in SAL (#wikimedia-operations) [2025-06-24T19:50:16Z] <zabe@deploy1003> zabe: Backport for [[gerrit:1159552|Stop setting wgRevisionSlotsCacheExpiry (T183490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-06-24T19:59:25Z] <zabe@deploy1003> Finished scap sync-world: Backport for [[gerrit:1159552|Stop setting wgRevisionSlotsCacheExpiry (T183490)]] (duration: 11m 28s)