[wikireplicas] Update Admin docs
Closed, Resolved, Public

Description

https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas contains a lot of information, but there are some TODOs, and some outdated sections (e.g. "Who admins what").

We could also consider adding more diagrams and splitting the page into multiple sub-pages. One sub-page could be the existing https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replica_DNS

Event Timeline

fnegri changed the task status from Open to In Progress. May 23 2024, 2:23 PM
fnegri triaged this task as Medium priority.

I reviewed and updated all the admin docs related to Wiki Replicas. They're all under Category:Wiki_Replica_admin.

@ABran-WMF I would appreciate it if you could do a quick review of all the docs in the category and let me know if you find anything that is confusing or obviously wrong.

thanks for asking @fnegri, gladly!

I've found that part that seems a bit old: would you be interested in migrating that part to a cookbook? spicerack has all the relevant logic.

Nothing else caught my eye :-)

@ABran-WMF thank you for reviewing!

I've found that part that seems a bit old

What exactly is incorrect? I see that the db names db1065 and db1106 do not exist any more, but the procedure looks similar to what is documented in MariaDB#Manipulating_the_Replication_Tree. Maybe we could simply point to that page.

would you be interested in migrating that part to a cookbook? spicerack has all the relevant logic.

That's a good idea, I can open a task to track it. I wonder if Sanitariums have anything special in that regard, or if it could be a generic cookbook for any dbXXXX?

I've found that part that seems a bit old

What exactly is incorrect?

Nothing! But given what's required, I'm sure that it could fit in a cookbook

That's a good idea, I can open a task to track it.

that would be appreciated! thanks! please tag DBA and Data-Persistence-Automations if you do :-)

I wonder if Sanitariums have anything special in that regard, or if it could be a generic cookbook for any dbXXXX?

I think any difference should be trivial, and if we stumble upon one that is not, it should probably be something we try to address and make more trivial (to avoid snowflakes and all those things). wdyt?

the procedure looks similar to what is documented in MariaDB#Manipulating_the_Replication_Tree

Re-reading this page, it looks like repl.pl is no longer used: since we now use GTID replication, all you have to do is CHANGE MASTER TO MASTER_HOST. I have hidden the old procedure in a collapsed section and updated the Sanitarium page with an example that follows the new procedure.
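For illustration only, here is a minimal sketch of the statement sequence such a GTID-based change might involve. It only builds the SQL strings and does not connect to any host. The helper name, the host name `new-source.example`, the surrounding STOP SLAVE / START SLAVE, and the MASTER_USE_GTID=slave_pos option are assumptions, not taken from the wiki page. Whether this is safe for Sanitarium hosts is questioned later in this thread.

```python
import re
from typing import List


def build_change_source_statements(new_source_host: str, use_gtid: bool = True) -> List[str]:
    """Return, in order, the MariaDB statements to repoint a replica.

    Hypothetical helper for illustration; it does not run anything.
    """
    # Accept only plain host names so the value can be quoted safely.
    if not re.fullmatch(r"[A-Za-z0-9.-]+", new_source_host):
        raise ValueError(f"unexpected host name: {new_source_host!r}")

    change = f"CHANGE MASTER TO MASTER_HOST='{new_source_host}'"
    if use_gtid:
        # Assumption: the replica resumes from its own GTID position.
        change += ", MASTER_USE_GTID=slave_pos"

    # Replication is stopped before the change and restarted afterwards.
    return ["STOP SLAVE;", change + ";", "START SLAVE;"]


if __name__ == "__main__":
    # "new-source.example" is a placeholder, not a real database host.
    for statement in build_change_source_statements("new-source.example"):
        print(statement)
```

Printing the statements instead of executing them keeps the sketch safe to run anywhere; an operator would still need to review them and check replication status on the actual host.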

I have also created T374603 to track the creation of a cookbook.

I wouldn't be responsible if I didn't tell you that GTID has been very error-prone and very unreliable for us, which is why I believe it is not used in production at the moment. GTID works well when it works well, and terribly when it doesn't. The only reason GTID is enabled in production is the InnoDB safe replication tracking on crash.

@jcrespo thanks, does it mean that repl.pl is still used? Are the docs at MariaDB#Manipulating_the_Replication_Tree up to date?

I think we use a script in wmfmariadbpy called move_replica

This method is used: https://wikitech.wikimedia.org/wiki/Primary_database_switchover but it only works when switching working replication and with direct parent-child relationships, so it cannot apply to wikireplicas, and it may never apply, because filtering produces skipped/modified/additional transactions (which is why they are so hard to handle).

@jcrespo I have added your comment above to MariaDB#Manipulating_the_Replication_Tree.

Would the method in https://wikitech.wikimedia.org/wiki/Primary_database_switchover work for changing the source of replication of a Sanitarium host, e.g. moving the MASTER_HOST of db1154 (sanitarium) from db1196 to db1206? I don't want to do it right now, but I'm trying to understand what the procedure would be in case db1196 has an issue, and update the example at Sanitarium_and_clouddb_instances#Sanitarium's_primary_failover.

I already answered your question in my previous comment. No.

GTID would work in theory. I doubt it would work in practice. Discussion here: T196366#5420946 (the ticket does not have the same scope, but the discussion is relevant).

I wasn't sure if "cannot apply to wikireplicas" included Sanitariums or only clouddbs. If it also includes Sanitariums, what would be your recommended procedure for my example above -- moving MASTER_HOST of db1154 (sanitarium) from db1196 to db1206? Did you ever change the MASTER_HOST for a Sanitarium in the past?

Unsure. / No.

Thanks, I've again updated MariaDB#Manipulating_the_Replication_Tree and MariaDB/Sanitarium_and_clouddb_instances#Sanitarium's_primary_failover.

I hope they now better reflect the current situation. Feel free to edit those pages if you want to add further information.