Move muswiki and mhwiktionary (closed wikis) from s3 to s5
Closed, Resolved (Public)

Description

To start creating new wikis on s5 instead of s3, the creation script uses a dummy wiki, which is currently aawiki.
While we could currently just point it at a wiki on s5, let's move two closed wikis there and use those, instead of live wikis, when creating new ones.

Those two wikis are very small and have no activity, but they do get occasional writes on the logging table:

root@db1075:~# mysql -e "select log_timestamp from muswiki.logging order by log_timestamp desc limit 10"
+----------------+
| log_timestamp  |
+----------------+
| 20200727213147 |
| 20200726202753 |
| 20200726174328 |
| 20200726160835 |
| 20200726065026 |
| 20200725174211 |
| 20200725080019 |
| 20200725023347 |
| 20200724175151 |
| 20200724111807 |
+----------------+
root@db1075:~# mysql -e "select log_timestamp from mhwiktionary.logging order by log_timestamp desc limit 10"
+----------------+
| log_timestamp  |
+----------------+
| 20200727082927 |
| 20200726064815 |
| 20200722114512 |
| 20200711123344 |
| 20200709164755 |
| 20200704235745 |
| 20200703213451 |
| 20200628163858 |
| 20200615224250 |
| 20200610223828 |
+----------------+
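
To put "occasional writes" into numbers, the gaps between consecutive log entries can be computed from the timestamps above. This is a minimal sketch, assuming the values are MediaWiki's 14-digit YYYYMMDDHHMMSS timestamps in UTC; the function names are illustrative and not part of any tooling used in this task.

```python
from datetime import datetime, timedelta

MW_TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit MediaWiki timestamp layout (assumed UTC)

def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a MediaWiki-style timestamp such as '20200727213147'."""
    return datetime.strptime(ts, MW_TS_FORMAT)

def write_gaps(timestamps: list[str]) -> list[timedelta]:
    """Return the gaps between consecutive writes, newest first."""
    parsed = sorted((parse_mw_timestamp(t) for t in timestamps), reverse=True)
    return [newer - older for newer, older in zip(parsed, parsed[1:])]

# Three most recent muswiki.logging timestamps from the output above.
muswiki_recent = ["20200727213147", "20200726202753", "20200726174328"]
gaps = write_gaps(muswiki_recent)
shortest_gap = min(gaps)
```

Even the shortest gap in this sample is several hours, which suggests that a move lasting a few seconds is unlikely to collide with a write, though it cannot rule one out.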

They are very small and shouldn't take long to move:

root@db1075:/srv/sqldata# du -sh muswiki mhwiktionary
36M	muswiki
34M	mhwiktionary

However, we'd need to make sure no one writes to them.
Is that something you can help with @Ladsgroup? We'd need to pick a date on which we can do the database move and coordinate with whoever generates writes, to make sure nothing gets written.

Once all this is done, we should change the documentation to make sure s3 is no longer referenced: https://wikitech.wikimedia.org/wiki/Add_a_wiki

Event Timeline

Marostegui triaged this task as Medium priority. Jul 28 2020, 7:29 AM
Marostegui moved this task from Triage to Pending comment on the DBA board.

@Ladsgroup @Reedy even if they are closed wikis, do we need to change db-eqiad.php and db-codfw.php and make sure we point them to s5?
s3.dblist would need to be modified, as they do show up there, so we'd need to move them to s5.dblist as well.

@Ladsgroup @Reedy even if they are closed wikis, do we need to change db-eqiad.php and db-codfw.php and make sure we point them to s5?

Yup, that's still needed. If we are migrating lots of wikis (in future) we need to find a better way to do it. Maybe half of the alphabet here, the other half there?

s3.dblist would need to be modified, as they do show up there, so we'd need to move them to s5.dblist as well.

Yup.

Is that something you can help with @Ladsgroup? We'd need to pick a date on which we can do the database move and coordinate with whoever generates writes, to make sure nothing gets written.

I can communicate it to stewards and global renamers, but it'd be great if we also set "wgReadOnly" to true for these wikis; better safe than sorry.

Once all this is done, we should change the documentation to make sure s3 is no longer referenced: https://wikitech.wikimedia.org/wiki/Add_a_wiki

And its maintenance bot: https://github.com/Ladsgroup/Phabricator-maintenance-bot/blob/master/new_wikis_handler.py

Yes, putting the wikis into read-only mode is certainly a good idea, plus informing global renamers and stewards not to do global renames during the window (I'm honestly not sure what a global rename does when one of many wikis is read-only).

On IRC I told Amir that at the MySQL level we cannot put specific databases into read_only mode, as it is a global MySQL flag, so it is either everything (a no-go) or nothing.

So, the first step would be to test what happens when wgReadOnly is turned on for one wiki in the beta cluster (how global renaming, account autocreation, querycache updates, preference updates, and any other writes that P12071 indicates might happen at a closed wiki behave).

So, the first step would be to test what happens when wgReadOnly is turned on for one wiki in the beta cluster (how global renaming, account autocreation, querycache updates, preference updates, and any other writes that P12071 indicates might happen at a closed wiki behave).

+1

Also, let's think about what happens if a write occurs while the migration is in progress.
The data import shouldn't take more than a few seconds, but there is a race window between the data being imported into s5 and the config change (db-eqiad.php and db-codfw.php) taking effect.
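
The race can be made concrete: any row written to the old section after the export was taken, but before the config switch is live, never reaches s5. The following is a rough sketch, assuming 14-digit MediaWiki timestamps; the export and switch times and the log entries are hypothetical values, not data from this task.

```python
from datetime import datetime

def writes_in_window(log_timestamps, export_ts, switch_ts):
    """Return timestamps written after the export but no later than the config switch."""
    fmt = "%Y%m%d%H%M%S"
    start = datetime.strptime(export_ts, fmt)
    end = datetime.strptime(switch_ts, fmt)
    return [ts for ts in log_timestamps if start < datetime.strptime(ts, fmt) <= end]

# Hypothetical values for illustration only.
old_section_log = ["20200811081000", "20200811082030", "20200811090000"]
lost = writes_in_window(old_section_log, "20200811081500", "20200811084500")
```

An empty result is what read-only mode is meant to guarantee; anything returned here would be a write that exists on s3 but not on s5.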

Change 616759 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] labs: Turn beta cswiki to read only

https://gerrit.wikimedia.org/r/616759

Change 616759 merged by jenkins-bot:
[operations/mediawiki-config@master] labs: Turn beta cswiki to read only

https://gerrit.wikimedia.org/r/616759

Change 616839 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Revert "labs: Turn beta cswiki to read only"

https://gerrit.wikimedia.org/r/616839

Change 616839 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "labs: Turn beta cswiki to read only"

https://gerrit.wikimedia.org/r/616839

I have turned cswiki beta to read-only mode via https://www.mediawiki.org/wiki/Manual:$wgReadOnly, and then executed ls -lt /srv/sqldata/cswiki | head to see the most recently modified tables, to be able to detect any writes. Output is as follows:

root@deployment-db05:/srv/sqldata/cswiki# ls -lt | head
total 163284
-rw-rw---- 1 mysql mysql   131072 Jul 28 14:39 user_groups.ibd
-rw-rw---- 1 mysql mysql   196608 Jul 28 14:21 module_deps.ibd
-rw-rw---- 1 mysql mysql   163840 Jul 28 14:20 echo_notification.ibd
-rw-rw---- 1 mysql mysql   131072 Jul 28 14:20 user_newtalk.ibd
-rw-rw---- 1 mysql mysql   229376 Jul 28 14:20 watchlist.ibd
-rw-rw---- 1 mysql mysql 13631488 Jul 28 05:01 querycache.ibd
-rw-rw---- 1 mysql mysql    98304 Jul 28 05:01 querycache_info.ibd
-rw-rw---- 1 mysql mysql    98304 Jul 28 05:01 site_stats.ibd
-rw-rw---- 1 mysql mysql   212992 Jul 28 00:15 actor.ibd
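
The same check can be scripted by snapshotting modification times before and after a test action, instead of comparing ls -lt output by eye. This is a minimal sketch with illustrative function names; note that mtime changes on .ibd files are only a rough proxy for writes, since InnoDB can flush pages to disk some time after the write itself.

```python
import os
import tempfile

def snapshot_mtimes(directory: str) -> dict[str, int]:
    """Map each file name in `directory` to its modification time in nanoseconds."""
    return {
        entry.name: entry.stat().st_mtime_ns
        for entry in os.scandir(directory)
        if entry.is_file()
    }

def changed_files(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Return names of files that are new or whose mtime differs between snapshots."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)

# Demonstration on a throwaway directory rather than a real /srv/sqldata/<wiki> path.
with tempfile.TemporaryDirectory() as datadir:
    for table in ("querycache.ibd", "watchlist.ibd"):
        open(os.path.join(datadir, table), "w").close()
    before = snapshot_mtimes(datadir)
    # Simulate a write by moving one file's mtime forward by ten seconds.
    target = os.path.join(datadir, "watchlist.ibd")
    os.utime(target, ns=(before["watchlist.ibd"], before["watchlist.ibd"] + 10_000_000_000))
    after = snapshot_mtimes(datadir)
    touched = changed_files(before, after)
```

On a real host, the first snapshot would be taken right after enabling read-only mode and the second after each test action.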

Then, I tried to make MediaWiki write something. First, I tried to autocreate an account by logging in with an account that doesn't exist locally at cswiki beta. That threw an uninformative error and didn't cause a DB write. Then I tried to run updateSpecialPages.php, which updates the cache for a few special pages (the querycache table). It threw a fatal error complaining that the wiki is in read-only mode, and didn't write anything; see P12089.

Then, I tried to rename ET9 to ET9-test, which did start, but it didn't get past cswiki (ie. the rename became stuck). Special:GlobalRenameProgress/ET9-test looked like:

[image.png: screenshot of Special:GlobalRenameProgress/ET9-test]

I suspect it ended similarly to the updateSpecialPages.php run I tried earlier, but I cannot verify that, because I cannot find where beta stores logs (deployment-fluorine02 doesn't have exception.log). If this happens while we're moving the wikis, I believe it will be just like a regular stuck rename, which can be unstuck using the regular process (https://wikitech.wikimedia.org/wiki/Stuck_global_renames). In this case, the rename automatically completed once I unlocked cswiki beta. If that doesn't happen, we can always restart the rename using the workflow I linked.

Then, I tried to update my preferences, but MediaWiki didn't let me do that and complained that the wiki is set to read-only. MediaWiki also didn't let me update my watchlist, edit pages or change user rights.

I managed to successfully send an email via Special:EmailUser, which causes a write to cu_changes in production, but I am not sure whether CheckUser respects the read-only switch. We would need to test this at a testwiki in production later. On the other hand, if this write were ignored (i.e. lost in the race condition @Marostegui mentioned), it would not be an issue: CheckUser is not used frequently at those wikis, and this situation is also quite unlikely on its own.

[...]
Also, let's think about what happens if a write occurs while the migration is in progress.
The data import shouldn't take more than a few seconds, but there is a race window between the data being imported into s5 and the config change (db-eqiad.php and db-codfw.php) taking effect.

Most writes should be cache updates or "forgettable data", such as cu_changes. In those cases, it's totally fine to ignore them. Pretty much the only thing I can think of that writes to closed wikis (EDIT: and is critical to keep) is global renames, which wait until the wiki is unlocked.

Wow, thanks for the tests @Urbanecm, very complete and very detailed info. This is very helpful.
I think the next step would be to try to change db-codfw.php on a mwdebug2XXX host and try to browse the site, to make sure nothing obvious breaks?

Change 617152 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Set muswiki to reqd only

https://gerrit.wikimedia.org/r/617152

Change 617152 merged by jenkins-bot:
[operations/mediawiki-config@master] Set muswiki to read only

https://gerrit.wikimedia.org/r/617152

Mentioned in SAL (#wikimedia-operations) [2020-07-29T15:24:06Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 617152: Set muswiki to read only | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617152 (T259004) (duration: 01m 08s)

Change 617167 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Revert "Set muswiki to read only"

https://gerrit.wikimedia.org/r/617167

Change 617167 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Set muswiki to read only"

https://gerrit.wikimedia.org/r/617167

I have repeated my beta tests from T259004#6341365, and I confirm production doesn't do any database writes in the cases I tested on beta. Furthermore, I tested what production does when an email is sent (i.e. whether it writes to cu_changes). The answer is no: the table is left intact, and no writes happened after sending an email.

Mentioned in SAL (#wikimedia-operations) [2020-07-29T15:48:43Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 617167: Revert "Set muswiki to read only" | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617167 (T259004) (duration: 01m 06s)

@Marostegui I believe all necessary tests were done. If there is anything I forgot to test and should test, let me know.

See in logstash on 2020-07-29:

[{exception_id}] {exception_url} Wikimedia\DependencyStore\DependencyStoreException from line 113 of /srv/mediawiki/php-1.36.0-wmf.2/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Database is read-only: T259004 testing

#0 /srv/mediawiki/php-1.36.0-wmf.2/includes/resourceloader/ResourceLoader.php(614): Wikimedia\DependencyStore\SqlModuleDependencyStore->storeMulti(string, array, integer)
#1 /srv/mediawiki/php-1.36.0-wmf.2/includes/deferred/MWCallableUpdate.php(38): ResourceLoader->{closure}()
#2 /srv/mediawiki/php-1.36.0-wmf.2/includes/deferred/DeferredUpdates.php(467): MWCallableUpdate->doUpdate()
#3 /srv/mediawiki/php-1.36.0-wmf.2/includes/deferred/DeferredUpdates.php(344): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#4 /srv/mediawiki/php-1.36.0-wmf.2/includes/deferred/DeferredUpdates.php(278): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#5 /srv/mediawiki/php-1.36.0-wmf.2/includes/deferred/DeferredUpdates.php(194): DeferredUpdates::handleUpdateQueue(array, string, integer)
#6 /srv/mediawiki/php-1.36.0-wmf.2/includes/MediaWiki.php(1113): DeferredUpdates::doUpdates(string)
#7 /srv/mediawiki/php-1.36.0-wmf.2/includes/MediaWiki.php(849): MediaWiki->restInPeace()
#8 /srv/mediawiki/php-1.36.0-wmf.2/includes/MediaWiki.php(861): MediaWiki->{closure}()
#9 /srv/mediawiki/php-1.36.0-wmf.2/load.php(56): MediaWiki->doPostOutputShutdown()
#10 /srv/mediawiki/php-1.36.0-wmf.2/load.php(38): wfLoadMain()
#11 /srv/mediawiki/w/load.php(3): require(string)
#12 {main}

@BPirkle Hello, yes, that is an expected error. We were testing whether wgReadOnly prevents all writes, or just some, to help prepare for moving two wikis from s3 to s5.

Thank you again @Urbanecm for all the testing.
With all those things tested, I think the procedure could be something like:

  • Downtime codfw s5 hosts
  • Set muswiki and mhwiktionary into read-only using this as example: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617152
  • Deploy with scap
  • Export muswiki and mhwiktionary from s3 (it takes around 1 second)
  • Import that sql file into s5 codfw with replication (it takes around 3 seconds)
  • Import that sql file on all eqiad hosts with replication disabled (should take around 3-5 seconds per host)
  • Sanitize and install triggers on those wikis on db1124:3315 db2094:3315
  • Do a recursive count for revision and user for muswiki and mhwiktionary tables across s5 to make sure they are all the same
  • Change ./wmf-config/config/muswiki.yaml and ./wmf-config/config/mhwiktionary.yaml and then run composer buildDBLists
  • Change db-eqiad.php and db-codfw.php to point muswiki and mhwiktionary to s5.
  • Merge and deploy with scap
  • Revert RO for muswiki and mhwiktionary
  • Merge and deploy with scap

@jcrespo @Kormat @Ladsgroup @Urbanecm I would like to get a review of the above procedure to check if I have missed something.
Also, I would like to have either @Ladsgroup or @Urbanecm around for the day we decide to do this maintenance, if possible to take care of the MW side of things.
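
The row-count comparison step in the procedure above can be expressed as a small consistency check. This is a sketch under assumptions: the counts would come from something like SELECT COUNT(*) run on each host, and the host names and numbers below are placeholders, not real data from this migration.

```python
def find_count_mismatches(counts_by_host):
    """Compare per-table row counts across hosts.

    `counts_by_host` maps host -> {"db.table": row_count}. Returns the tables
    whose counts are not identical on every host; a table missing on a host
    is reported as None for that host.
    """
    tables = sorted({t for counts in counts_by_host.values() for t in counts})
    mismatches = {}
    for table in tables:
        per_host = {host: counts.get(table) for host, counts in counts_by_host.items()}
        if len(set(per_host.values())) > 1:
            mismatches[table] = per_host
    return mismatches

# Placeholder hosts and counts for illustration only.
sample = {
    "s5-host-a": {"muswiki.revision": 100, "muswiki.user": 20},
    "s5-host-b": {"muswiki.revision": 100, "muswiki.user": 19},
}
mismatches = find_count_mismatches(sample)
```

An empty result would mean every host agrees; any entry points at the table and hosts that need a closer look before the config is switched.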

I'm wondering about removing the databases from s3 to ensure that any testing is hitting the right section. Also, are there MW caches that need to be cleared?

I'm wondering about removing the databases from s3 to ensure that any testing is hitting the right section.

Yeah, the clean-up part would come next. Unfortunately, deleting wikis isn't super easy and needs to be done carefully, so I will create a subtask specifically for that.

Also, are there MW caches that need to be cleared?

Probably not, these are tiny closed wikis

The cited procedure seems good to me MediaWiki-side, though the canonical source of dblist files is muswiki.yaml/mhwiktionary.yaml respectively.

I'm wondering about removing the databases from s3 to ensure that any testing is hitting the right section.

We can run wfGetDB(DB_REPLICA)->getServer() at an arbitrary MediaWiki apphost to see which server it is currently talking to.

Also, are there MW caches that need to be cleared?

Once a change is synced, it should be immediately effective - scap should take care of opcache.

Also, I would like to have either @Ladsgroup or @Urbanecm around for the day we decide to do this maintenance, if possible to take care of the MW side of things.

Sure, happy to be around.

The cited procedure seems good to me MediaWiki-side, though the canonical source of dblist files is muswiki.yaml/mhwiktionary.yaml respectively.

Cool, so there's no need to change the s3 and s5 dblist files, just ./wmf-config/config/muswiki.yaml?

The cited procedure seems good to me MediaWiki-side, though the canonical source of dblist files is muswiki.yaml/mhwiktionary.yaml respectively.

Cool, so there's no need to change the s3 and s5 dblist files, just ./wmf-config/config/muswiki.yaml?

Change the yaml file, and then run composer buildDBLists, which will update the dblist files as appropriate.

The cited procedure seems good to me MediaWiki-side, though the canonical source of dblist files is muswiki.yaml/mhwiktionary.yaml respectively.

Cool, so there's no need to change the s3 and s5 dblist files, just ./wmf-config/config/muswiki.yaml?

Change the yaml file, and then run composer buildDBLists, which will update the dblist files as appropriate.

Thanks, I am going to edit the procedure to reflect this.
Thank you!

You can just edit all three manually... and then CI will complain if you've not done it correctly ;P
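
The kind of consistency that CI enforces here can be sketched as follows. This is only an illustration: it assumes each per-wiki YAML file boils down to a list of dblist names the wiki belongs to, and it does not reproduce the actual composer buildDBLists logic or file formats.

```python
def build_dblists(wiki_memberships):
    """Invert {wiki: [dblist, ...]} into {dblist: sorted [wiki, ...]}."""
    dblists = {}
    for wiki, lists in wiki_memberships.items():
        for name in lists:
            dblists.setdefault(name, []).append(wiki)
    return {name: sorted(wikis) for name, wikis in dblists.items()}

def stale_dblists(wiki_memberships, committed):
    """Return names of dblists whose committed content differs from the generated one."""
    generated = build_dblists(wiki_memberships)
    names = set(generated) | set(committed)
    return sorted(n for n in names if generated.get(n, []) != sorted(committed.get(n, [])))

# Assumed memberships after the move; the hand-edited dblists still list muswiki under s3.
memberships = {"muswiki": ["s5", "closed"], "mhwiktionary": ["s5", "closed"]}
committed = {"s3": ["muswiki"], "s5": ["mhwiktionary"], "closed": ["mhwiktionary", "muswiki"]}
stale = stale_dblists(memberships, committed)
```

In this made-up example both s3 and s5 are flagged, which mirrors the case where only one of the files was edited by hand.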

@Marostegui Hello, please note my availability will be limited during August 4-6. Since your vacation ends on Aug 3, I guess we should do this in the week of Aug 9 or later.

@Marostegui Hello, please note my availability will be limited during August 4-6. Since your vacation ends on Aug 3, I guess we should do this in the week of Aug 9 or later.

Hey, thanks for the heads up.
Do you want to schedule it for Tuesday 11th at 08:00 AM UTC?

Excellent, thank you. I will block that window on the deployments page.

@Urbanecm would you have time to prepare MW patches before the maintenance day so they can be reviewed by @Ladsgroup and myself?
Thanks for your support!

Sure! So, as a mental recap, we will need a readonly patch for both muswiki and mhwiktionary, a change of db-eqiad/db-codfw files, and a revert of the readonly patch. Going to upload them

Change 618089 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Turn muswiki and mhwiktionary to read-only

https://gerrit.wikimedia.org/r/618089

Change 618090 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Point muswiki and mhwiktionary to s5

https://gerrit.wikimedia.org/r/618090

Change 618091 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Revert "Turn muswiki and mhwiktionary to read-only"

https://gerrit.wikimedia.org/r/618091

And also the YAML that will regenerate the new s5 dblist

And also the YAML that will regenerate the new s5 dblist

Updated the moving patch.

Change 618091 abandoned by Urbanecm:
[operations/mediawiki-config@master] Revert "Turn muswiki and mhwiktionary to read-only"

Reason:
will upload second one, rebasing

https://gerrit.wikimedia.org/r/618091

Change 618089 merged by jenkins-bot:
[operations/mediawiki-config@master] Turn muswiki and mhwiktionary to read-only

https://gerrit.wikimedia.org/r/618089

Mentioned in SAL (#wikimedia-operations) [2020-08-11T08:06:25Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: a04bc1f27e4ef4e38002d546d30bfd2d1dc60d0e: Turn muswiki and mhwiktionary to read-only (T259004) (duration: 01m 01s)

Change 618090 merged by jenkins-bot:
[operations/mediawiki-config@master] Point muswiki and mhwiktionary to s5

https://gerrit.wikimedia.org/r/618090

Change 619369 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Revert "Turn muswiki and mhwiktionary to read-only"

https://gerrit.wikimedia.org/r/619369

Change 619369 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Turn muswiki and mhwiktionary to read-only"

https://gerrit.wikimedia.org/r/619369

Mentioned in SAL (#wikimedia-operations) [2020-08-11T08:43:34Z] <urbanecm@deploy1001> Synchronized wmf-config/db-codfw.php: 81f4594b6c583f938821549b3a1800fec5b120bb: Point muswiki and mhwiktionary to s5 (T259004; 1/3) (duration: 01m 02s)

Mentioned in SAL (#wikimedia-operations) [2020-08-11T08:44:48Z] <urbanecm@deploy1001> Synchronized wmf-config/db-eqiad.php: 81f4594b6c583f938821549b3a1800fec5b120bb: Point muswiki and mhwiktionary to s5 (T259004; 2/3) (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2020-08-11T08:45:58Z] <urbanecm@deploy1001> Synchronized dblists/: 81f4594b6c583f938821549b3a1800fec5b120bb: Point muswiki and mhwiktionary to s5 (T259004; 3/3) (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2020-08-11T08:52:29Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: e6ec237b6b6fb67a0a80613909589bc724f5eecf: Revert "Turn muswiki and mhwiktionary to read-only" (T259004) (duration: 00m 58s)
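
From the SAL entries above, the approximate length of the read-only window can be derived. This is a small sketch, assuming the SAL timestamps mark when each sync was logged, so the result is an estimate rather than an exact measurement.

```python
from datetime import datetime

def sal_time(ts: str) -> datetime:
    """Parse a SAL timestamp such as '2020-08-11T08:06:25Z' (UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

read_only_on = sal_time("2020-08-11T08:06:25Z")   # read-only patch synchronized
read_only_off = sal_time("2020-08-11T08:52:29Z")  # revert synchronized
window = read_only_off - read_only_on
window_minutes = window.total_seconds() / 60
```

This puts the read-only window at roughly 46 minutes.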

This was successfully done.
Big thanks to @Urbanecm for all the testing and driving this from MW side! Thanks also @Ladsgroup for all the help, reviews and support!
Much appreciated guys

We need to follow up on the documentation and process changes at T259438, and on a clean-up task to get the tables removed from s3.