Move labtestwikitech database to clouddb2001-dev
Closed, Resolved, Public

Description

I tried to move this onto a prod database a while ago, but there were various issues (including that the prod db is read-only in codfw most of the time). Let's move this back to a WMCS-managed host.

Event Timeline

Andrew added a subscriber: Joe.

The system for assigning a particular wiki to a particular db host in mediawiki-config has changed a lot since I last touched this code. @Joe, if you could write me a sample patch of how to break out labtestwiki into its own group and direct it to a different db server, I should be able to take it from there.

Moving the database itself (or building a fresh one) is straightforward and something I can do myself.

Thanks!

Joe removed Joe as the assignee of this task. (Edited Oct 15 2019, 3:37 PM)
Joe added a subscriber: CDanis.

Hi! I'm not sure what you need to do, but I assume you're trying to set up a new database instance for the section wikitech, correct?

If that is the case, that is done via a specialized conftool cli script called dbctl.

The documentation is here:
https://wikitech.wikimedia.org/wiki/Dbctl

and for you specifically of interest is probably:

https://wikitech.wikimedia.org/wiki/Dbctl#Add_a_new_host_(ie:_a_new_provisioned_host)_to_a_section

If you have more doubts, maybe @CDanis can help you as well.

Hey @Andrew, can you explain a bit more what your plan is for the new database on the codfw instance?
Right now labtestwiki is located in eqiad and codfw (on a RO host, as you know). Are you planning to have a writable version in codfw? What's the plan then for the existing one in codfw? And moreover, how are you going to replicate those changes to the eqiad one?
Are those two DBs going to become independent DBs? If so, how are you planning to have DC redundancy?

The big picture is: labtestwikitech needs a database. That database needs to be read/write.

labtestwikitech currently /has/ a database, in the m5 cluster, but it's not read/write from codfw and hence largely useless.

I don't care in the least where the database is or how it's managed -- I just need one. I expect to just make it myself on a server that's ignored by the DBAs. Pointing a given wiki to an arbitrary database used to be simple (and I've done it several times), but since I last visited the wmf-config code things have become very different and abstract, so I no longer know how to tell it "labtestwikitech uses database named 'foo' on server 'bar'." The answer to that last question is literally all I need here, although of course any other help is welcome.

Hi Andrew!

I'm very sorry for your difficulties :( I've tried to keep things as simple as I knew how, to provide at least a minimal amount and quality of documentation, and to link to said documentation liberally in the config files. There's definitely more work to be done there, on all fronts, and I freely admit to not having done the cleanest job of all of this; please do let me know if you have any suggestions. (Maybe, as a start, the dbctl wikitech page itself needs some more background context...?)

To add some context: nowadays, the following three arrays that used to live in db-eqiad.php/db-codfw.php now live in etcd: groupLoadsBySection, readOnlyBySection, and sectionLoads. MediaWiki's config code now reads all the pieces that are left in the aforementioned usual files, then merges in those particular sub-arrays of $wgLBFactoryConf. The etcd data are only ever written by the dbctl tool, which runs on the cumin hosts and generates those values from underlying data about each database server.
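To make the merge step concrete, here is a rough Python model of it. It is only an illustration. The real logic is PHP in wmf-config, and the function name apply_etcd_db_config, the sample host names and the assumption that each etcd-managed sub-array replaces the static one wholesale are all hypothetical. None of them describe the actual code.

```python
"""Toy model (hypothetical) of merging etcd-managed DB config into the static config."""

import copy

# The three $wgLBFactoryConf sub-arrays that, per the comment above, now come from etcd.
ETCD_MANAGED_KEYS = ("groupLoadsBySection", "readOnlyBySection", "sectionLoads")


def apply_etcd_db_config(static_conf: dict, etcd_conf: dict) -> dict:
    """Return a copy of static_conf with the etcd-managed keys taken from etcd_conf.

    Assumption: each etcd-managed key replaces the static value wholesale;
    every other key (e.g. hostsByName) is kept exactly as in the static files.
    """
    merged = copy.deepcopy(static_conf)
    for key in ETCD_MANAGED_KEYS:
        if key in etcd_conf:
            merged[key] = copy.deepcopy(etcd_conf[key])
    return merged


if __name__ == "__main__":
    static = {
        "hostsByName": {"db-example1": "192.0.2.1"},  # sample, documentation-range IP
        "sectionLoads": {"s1": {"stale-host": 1}},    # stale value, overwritten by etcd
    }
    etcd = {"sectionLoads": {"s1": {"db-example1": 0}}, "readOnlyBySection": {}}
    print(apply_etcd_db_config(static, etcd))
```

In this model, anything dbctl does not manage (such as hostsByName) passes through untouched. That matches the point that hostsByName is still edited in db-codfw.php.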

What we get in exchange for the added complexity is twofold: 1) operations like master failovers are much quicker (the most recent one had only 26 seconds of read-only time for users) and involve much less toil for DBAs, and 2) dbctl implements some sanity-checking safeguards, which run before production MediaWiki can see a bad config.

Other parts of the wiki-to-DB mapping configuration, like dblists, haven't changed.

If you wanted to use the dbctl-ified infrastructure, I think you'd need to add the server to the hostsByName entry in db-codfw.php, then create a new section 'labtestwikitest' in dbctl, then create an instance object that is pooled for that section, and then finally, edit the dblist config files as you would have done previously to refer to that section for that wiki name. I can help with that if you'd like.
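The lookup chain described above (dblist → section → pooled host → hostsByName address) can be sketched as a toy Python model. All function names and the sample IP are hypothetical. So is the rule that the first host listed for a section is its primary; this is an assumption about MediaWiki's load balancer convention. In production, the section-to-host data come from dbctl via etcd, not from a literal dict.

```python
"""Toy model (hypothetical) of resolving a wiki to its primary DB host."""


def section_for_wiki(wiki: str, dblists: dict[str, set[str]]) -> str:
    """Return the section whose dblist contains wiki (assumed to be exactly one)."""
    matches = [section for section, wikis in dblists.items() if wiki in wikis]
    if len(matches) != 1:
        raise LookupError(f"{wiki!r} is in {len(matches)} section dblists, expected 1")
    return matches[0]


def primary_host(section: str,
                 section_loads: dict[str, dict[str, int]],
                 hosts_by_name: dict[str, str]) -> tuple[str, str]:
    """Return (hostname, address) for the section's primary.

    Assumption: the first host listed for a section is its primary, and it
    must also have an entry in hosts_by_name.
    """
    loads = section_loads.get(section)
    if not loads:
        raise LookupError(f"no hosts pooled for section {section!r}")
    host = next(iter(loads))  # dicts keep insertion order, so this is the first listed host
    if host not in hosts_by_name:
        raise LookupError(f"{host!r} has no hostsByName entry")
    return host, hosts_by_name[host]


if __name__ == "__main__":
    dblists = {"s1": {"enwiki"}, "labtestwikitest": {"labtestwiki"}}
    section_loads = {"labtestwikitest": {"clouddb-foo1234": 1}}
    hosts_by_name = {"clouddb-foo1234": "192.0.2.10"}  # documentation-range IP
    section = section_for_wiki("labtestwiki", dblists)
    print(section, primary_host(section, section_loads, hosts_by_name))
```

Each step fails loudly if one of the four pieces (dblist entry, section, pooled instance, hostsByName entry) is missing. Those are the same four things the steps above ask you to set up.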

Another option, if you'd rather not bother with all that, would be to just statically special-case that section in the configuration files -- I think that it would be fine to statically set $wgLBFactoryConf['sectionLoads']['labwikitestwiki'] = ['clouddb-foo1234'=>1]; in the config, after wmfEtcdApplyDBConfig() has been called.
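Why the static override has to come after wmfEtcdApplyDBConfig() can be shown with a small Python model. It assumes, hypothetically, that the etcd step replaces the whole sectionLoads sub-array. The stand-in function, the sample data and the host names are illustrative only. The section key 'labwikitestwiki' is copied as written in the comment above.

```python
"""Toy model (hypothetical): a static override only survives if applied after the etcd merge."""

import copy

ETCD_DATA = {"sectionLoads": {"s1": {"db-example1": 0}}}  # sample data, not real config
OVERRIDE = {"clouddb-foo1234": 1}


def apply_etcd_db_config(conf: dict, etcd: dict) -> None:
    """Stand-in for wmfEtcdApplyDBConfig(); assumed to replace each managed sub-array wholesale."""
    for key in ("groupLoadsBySection", "readOnlyBySection", "sectionLoads"):
        if key in etcd:
            conf[key] = copy.deepcopy(etcd[key])


def build_config(override_after_etcd: bool) -> dict:
    conf = {"sectionLoads": {}}
    if not override_after_etcd:
        conf["sectionLoads"]["labwikitestwiki"] = dict(OVERRIDE)  # clobbered by the merge below
    apply_etcd_db_config(conf, ETCD_DATA)
    if override_after_etcd:
        conf["sectionLoads"]["labwikitestwiki"] = dict(OVERRIDE)  # survives
    return conf


if __name__ == "__main__":
    print("override before etcd:", build_config(False)["sectionLoads"])
    print("override after etcd: ", build_config(True)["sectionLoads"])
```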

The big picture is: labtestwikitech needs a database. That database needs to be read/write.

labtestwikitech currently /has/ a database, in the m5 cluster, but it's not read/write from codfw and hence largely useless.

I don't care in the least where the database is or how it's managed -- I just need one. I expect to just make it myself on a server that's ignored by the DBAs. Pointing a given wiki to an arbitrary database used to be simple (and I've done it several times), but since I last visited the wmf-config code things have become very different and abstract, so I no longer know how to tell it "labtestwikitech uses database named 'foo' on server 'bar'." The answer to that last question is literally all I need here, although of course any other help is welcome.

I understand your needs, but what you are trying to do leaves lots of fundamental questions floating around, and I do think we need to address them before moving forward.
As I asked at T233236#5575780, there are many things that need an answer, or at least an acknowledgment that they could cause issues in the future, such as no DC redundancy.

By having a writable database in codfw, you are effectively having a split brain with the labtestwiki that is being written in eqiad, and you'd no longer be able to have cross dc replication (unless you plan to also have a new host replicating that database into an RO eqiad database for redundancy).
However, that would be confusing: two pieces of infrastructure with the same database, each written in a different datacenter and replicating cross-DC. But that is your decision.

I don't care in the least where the database is or how it's managed -- I just need one. I expect to just make it myself on a server that's ignored by the DBAs

I am completely fine if your team wants to own labtestwiki, but I believe that if that is the case, it should be owned completely, not just the writable codfw replica, which would make it a snowflake.
If you want to manage it I would suggest you also take labtestwiki out of m5 and set it up somewhere else, so you can manage both instances (eqiad/codfw) completely, from its setup, redundancy, backups, monitoring and schema changes (remember that mediawiki schema changes are also applied to labtestwiki).

By having a writable database in codfw, you are effectively having a split brain with the labtestwiki that is being written in eqiad, and you'd no longer be able to have cross dc replication (unless you plan to also have a new host replicating that database into an RO eqiad database for redundancy).

The only place that labtestwiki runs is in codfw. There is no MediaWiki deployment in eqiad that is connected to the labtestwiki database. The closest analog of labtestwiki for the main Wikimedia cluster is the wikifarm inside of the deployment-prep Cloud VPS project. Similarly labwiki only exists in eqiad with no counterpart in codfw. When a DC switch is performed, wikitech remains running from the labweb* hosts in eqiad.

I am completely fine if your team wants to own labtestwiki, but I believe that if that is the case, it should be owned completely, not just the writable codfw replica, which would make it a snowflake.

labtestwiki is currently and will always be a snowflake. The database and associated MediaWiki deployment exist as part of what is now called the 'codfw1dev' deployment of Cloud Services. This is the staging and testing environment that we have for our OpenStack cluster and associated services. labtestwiki is a needed part of this environment for testing how various OpenStack changes will affect wikitech before we actually break the live wikitech. If we had had the wiki functional during the testing of our recent OpenStack version upgrade, we may have been able to avoid the issues from T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API.

If you want to manage it I would suggest you also take labtestwiki out of m5 and set it up somewhere else, so you can manage both instances (eqiad/codfw) completely, from its setup, redundancy, backups, monitoring and schema changes (remember that mediawiki schema changes are also applied to labtestwiki).

That is exactly the intent of this ticket. The schema migrations will be a bit of a pain to manage, but in past tickets where we asked if there was any way to have a read-write database in codfw we felt that the answer was no.

For what it's worth... I would prefer for the DBAs to manage this database rather than managing it myself -- that was my intent when I moved it to m5 in the first place. But as I understand it, it's not currently possible to have a DBA-managed database that's writable in codfw.

By having a writable database in codfw, you are effectively having a split brain with the labtestwiki that is being written in eqiad, and you'd no longer be able to have cross dc replication (unless you plan to also have a new host replicating that database into an RO eqiad database for redundancy).

The only place that labtestwiki runs is in codfw. There is no MediaWiki deployment in eqiad that is connected to the labtestwiki database. The closest analog of labtestwiki for the main Wikimedia cluster is the wikifarm inside of the deployment-prep Cloud VPS project. Similarly labwiki only exists in eqiad with no counterpart in codfw. When a DC switch is performed, wikitech remains running from the labweb* hosts in eqiad.

We have labtestwiki and labswiki on m5 (which is writable), so I treat them like any other database in that regard.

I am completely fine if your team wants to own labtestwiki, but I believe that if that is the case, it should be owned completely, not just the writable codfw replica, which would make it a snowflake.

labtestwiki is currently and will always be a snowflake. The database and associated MediaWiki deployment exist as part of what is now called the 'codfw1dev' deployment of Cloud Services. This is the staging and testing environment that we have for our OpenStack cluster and associated services. labtestwiki is a needed part of this environment for testing how various OpenStack changes will affect wikitech before we actually break the live wikitech. If we had had the wiki functional during the testing of our recent OpenStack version upgrade, we may have been able to avoid the issues from T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API.

I do understand the requirements, I am just debating what's the best approach to make this work for everyone :-)

If you want to manage it I would suggest you also take labtestwiki out of m5 and set it up somewhere else, so you can manage both instances (eqiad/codfw) completely, from its setup, redundancy, backups, monitoring and schema changes (remember that mediawiki schema changes are also applied to labtestwiki).

That is exactly the intent of this ticket. The schema migrations will be a bit of a pain to manage, but in past tickets where we asked if there was any way to have a read-write database in codfw we felt that the answer was no.

The answer is "no" if handled by us, because that would be a total snowflake in our infra: at the moment we only have writable databases in eqiad that are replicated to codfw, regardless of whether they are MW databases or misc databases.
Having a writable database in codfw breaks our consistency because we'd have to manage 2 databases, with the same name, on different servers, being written with different data depending on the DC, and again (this hasn't been answered yet) I assume you want to have a copy of those databases on the opposite DC.

As I have said earlier, if you prefer taking labtestwiki out from m5 into your own set of servers and owning it, I am fine with that.

For what it's worth... I would prefer for the DBAs to manage this database rather than managing it myself -- that was my intent when I moved it to m5 in the first place. But as I understand it, it's not currently possible to have a DBA-managed database that's writable in codfw.

I wouldn't feel comfortable managing that, indeed. That is a snowflake; we manage lots of things, and such a small - but at the same time big - difference would most likely end up with us making a human error.
So if you prefer to have two copies of labtestwiki one per DC and both being writable I would prefer if you own the whole infra for that (and of course, I can help with the migration).

The answer is "no" if handled by us, because that would be a total snowflake in our infra: at the moment we only have writable databases in eqiad that are replicated to codfw, regardless of whether they are MW databases or misc databases.
Having a writable database in codfw breaks our consistency because we'd have to manage 2 databases, with the same name, on different servers, being written with different data depending on the DC, and again (this hasn't been answered yet) I assume you want to have a copy of those databases on the opposite DC.

We have no need at all for a copy of the labtestwiki database in eqiad, only codfw. As a testing-only wiki with no active community, we can get by with periodic dumps of the database for recovery.

For labswiki, some replica somewhere for disaster recovery of the primary in eqiad is necessary, but we have no plans to establish a MediaWiki deployment in codfw for wikitech. https://wikitech-static.wikimedia.org/wiki/Main_Page is the read-only "fallback" for wikitech in the event of a loss of ability to use the eqiad hosted wiki.

T161859: Make Wikitech an SUL wiki will change all of the database concerns for wikitech and eliminate the need for labtestwiki in codfw, but we do not currently have any estimate of when its major blocker, T196171: Developer account creation without OpenStackManager, will be resolved--neither the WMCS nor SRE teams currently have that work on their near-term roadmaps.

The answer is "no" if handled by us, because that would be a total snowflake in our infra: at the moment we only have writable databases in eqiad that are replicated to codfw, regardless of whether they are MW databases or misc databases.
Having a writable database in codfw breaks our consistency because we'd have to manage 2 databases, with the same name, on different servers, being written with different data depending on the DC, and again (this hasn't been answered yet) I assume you want to have a copy of those databases on the opposite DC.

We have no need at all for a copy of the labtestwiki database in eqiad, only codfw. As a testing-only wiki with no active community, we can get by with periodic dumps of the database for recovery.

As I said, I am fine with this, but I would prefer that WMCS own the whole thing, that is: labtestwiki set up in eqiad on a new WMCS-owned host plus its codfw sibling, rather than DBAs owning labtestwiki in eqiad just because it is part of m5 and WMCS owning labtestwiki in codfw just because an RW database is needed there.

For labswiki, some replica somewhere for disaster recovery of the primary in eqiad is necessary, but we have no plans to establish a MediaWiki deployment in codfw for wikitech. https://wikitech-static.wikimedia.org/wiki/Main_Page is the read-only "fallback" for wikitech in the event of a loss of ability to use the eqiad hosted wiki.

T161859: Make Wikitech an SUL wiki will change all of the database concerns for wikitech and eliminate the need for labtestwiki in codfw, but we do not currently have any estimate of when its major blocker, T196171: Developer account creation without OpenStackManager, will be resolved--neither the WMCS nor SRE teams currently have that work on their near-term roadmaps.

Yeah, labswiki is a lot more complex and I am leaving it aside from this equation.

As I said, I am fine with this, but I would prefer that WMCS own the whole thing, that is: labtestwiki set up in eqiad on a new WMCS-owned host plus its codfw sibling, rather than DBAs owning labtestwiki in eqiad just because it is part of m5 and WMCS owning labtestwiki in codfw just because an RW database is needed there.

Agreed. Once we have the new database setup on clouddb2001-dev and labtestwikitech pointed at it we will have no need for the copy of labtestwiki on m5 and it should be dropped to avoid confusing everyone.

Ok, and I understand that if we ever need that one on eqiad it will be equally owned by WMCS then?

Once we are ready to drop it from m5, please create a ticket for us so we can clean up the grants too.

Ok, and I understand that if we ever need that one on eqiad it will be equally owned by WMCS then?

Agreed.

Change 543664 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/mediawiki-config@master] labtestwiki: move to a wmcs-hosted database on clouddb2001-dev

https://gerrit.wikimedia.org/r/543664

The db is now working on clouddb2001-dev. I'm attaching a patch about grants -- I'll leave this to a DBA to explicitly remove grants and drop the database from m5.

Change 543955 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] m5 grants: remove grants for 'labtestwiki' database

https://gerrit.wikimedia.org/r/543955

The db is now working on clouddb2001-dev. I'm attaching a patch about grants -- I'll leave this to a DBA to explicitly remove grants and drop the database from m5.

Thanks!
I am going to first rename the tables on the database and leave them renamed for a few days before dropping the database and the grants.
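Renaming first keeps the drop reversible. If something still turns out to need the tables, renaming them back is a cheap metadata operation. Below is a minimal Python sketch that generates the forward and rollback RENAME TABLE statements for a given prefix. The helper names are hypothetical, and this is not the tooling actually used on this task. It assumes a MySQL/MariaDB server, where db-qualified RENAME TABLE and backtick-quoted identifiers are valid.

```python
"""Hypothetical helpers: build RENAME TABLE statements for a rename-then-drop workflow."""


def quote_ident(name: str) -> str:
    """Quote a MySQL/MariaDB identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def rename_statements(db: str, tables: list[str], prefix: str) -> list[str]:
    """Statements that add prefix to every table not already carrying it."""
    return [
        f"RENAME TABLE {quote_ident(db)}.{quote_ident(t)} "
        f"TO {quote_ident(db)}.{quote_ident(prefix + t)};"
        for t in tables
        if not t.startswith(prefix)  # skip already-renamed tables, so reruns are harmless
    ]


def rollback_statements(db: str, tables: list[str], prefix: str) -> list[str]:
    """Statements that strip prefix again, undoing rename_statements()."""
    return [
        f"RENAME TABLE {quote_ident(db)}.{quote_ident(t)} "
        f"TO {quote_ident(db)}.{quote_ident(t[len(prefix):])};"
        for t in tables
        if t.startswith(prefix)
    ]


if __name__ == "__main__":
    for stmt in rename_statements("labtestwiki", ["page", "revision"], "T233236_"):
        print(stmt)
```

Using the task ID as the prefix (as in the listing below) makes it obvious in any later "show tables" why the tables were renamed and which ticket owns the final drop.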

Mentioned in SAL (#wikimedia-operations) [2019-10-18T05:19:10Z] <marostegui> Rename m5 labtestwiki database - T233236

Tables renamed:

root@cumin1001:/home/marostegui# mysql.py -hdb1133 labtestwiki -e "show tables" -BN
T233236_abuse_filter
T233236_abuse_filter_action
T233236_abuse_filter_history
T233236_abuse_filter_log
T233236_accountaudit_login
T233236_actor
T233236_archive
T233236_babel
T233236_betafeatures_user_counts
T233236_category
T233236_categorylinks
T233236_change_tag
T233236_change_tag_def
T233236_comment
T233236_content
T233236_content_models
T233236_cu_changes
T233236_cu_log
T233236_echo_email_batch
T233236_echo_event
T233236_echo_notification
T233236_echo_target_page
T233236_externallinks
T233236_filearchive
T233236_geo_tags
T233236_global_block_whitelist
T233236_globalblocks
T233236_image
T233236_imagelinks
T233236_interwiki
T233236_ip_changes
T233236_ipblocks
T233236_ipblocks_restrictions
T233236_iwlinks
T233236_job
T233236_l10n_cache
T233236_langlinks
T233236_ldap_domains
T233236_linter
T233236_log_search
T233236_logging
T233236_mathoid
T233236_module_deps
T233236_oathauth_users
T233236_oauth_accepted_consumer
T233236_oauth_registered_consumer
T233236_objectcache
T233236_oldimage
T233236_openstack_notification_event
T233236_openstack_puppet_classes
T233236_openstack_puppet_groups
T233236_openstack_puppet_vars
T233236_openstack_tokens
T233236_page
T233236_page_props
T233236_page_restrictions
T233236_pagelinks
T233236_protected_titles
T233236_querycache
T233236_querycache_info
T233236_querycachetwo
T233236_recentchanges
T233236_redirect
T233236_revision
T233236_revision_actor_temp
T233236_revision_comment_temp
T233236_searchindex
T233236_securepoll_cookie_match
T233236_securepoll_elections
T233236_securepoll_entity
T233236_securepoll_lists
T233236_securepoll_msgs
T233236_securepoll_options
T233236_securepoll_properties
T233236_securepoll_questions
T233236_securepoll_strike
T233236_securepoll_voters
T233236_securepoll_votes
T233236_site_identifiers
T233236_site_stats
T233236_sites
T233236_slot_roles
T233236_slots
T233236_spoofuser
T233236_templatelinks
T233236_text
T233236_transcache
T233236_transcode
T233236_updatelog
T233236_updates
T233236_uploadstash
T233236_user
T233236_user_former_groups
T233236_user_groups
T233236_user_newtalk
T233236_user_properties
T233236_watchlist
T233236_wikilove_log

Change 544098 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] labtestwikitech.pp: Specify the new location for labtestwiki

https://gerrit.wikimedia.org/r/544098

Change 544098 merged by Marostegui:
[operations/puppet@production] labtestwikitech.pp: Specify the new location for labtestwiki

https://gerrit.wikimedia.org/r/544098

Change 543664 merged by jenkins-bot:
[operations/mediawiki-config@master] labtestwiki: move to a wmcs-hosted database on clouddb2001-dev

https://gerrit.wikimedia.org/r/543664

Change 543955 merged by Marostegui:
[operations/puppet@production] m5 grants: remove grants for 'labtestwiki' database

https://gerrit.wikimedia.org/r/543955

Andrew claimed this task.

Change 547596 had a related patch set uploaded (by CDanis; owner: Jforrester):
[operations/mediawiki-config@master] Split out DB-related concerns for real and test wikitechs into s10/s11

https://gerrit.wikimedia.org/r/547596

Change 550889 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] dbctl schemata changes for labswiki migration

https://gerrit.wikimedia.org/r/550889

Change 547597 had a related patch set uploaded (by Andrew Bogott; owner: Jforrester):
[operations/mediawiki-config@master] Follow-up 0f90f506: Leave labtestwiki in the wikitech dblist for config

https://gerrit.wikimedia.org/r/547597

Change 550889 merged by CDanis:
[operations/puppet@production] dbctl schemata changes for labswiki migration

https://gerrit.wikimedia.org/r/550889

Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:17:33Z] <cdanis@cumin2001> dbctl commit (dc=all): 'alias wikitech section to new s10 section T233236', diff saved to https://phabricator.wikimedia.org/P9638 and previous config saved to /var/cache/conftool/dbconfig/20191114-181732-cdanis.json

Change 547596 merged by jenkins-bot:
[operations/mediawiki-config@master] Split out DB-related concerns for real and test wikitechs into s10/s11

https://gerrit.wikimedia.org/r/547596

Change 547597 merged by jenkins-bot:
[operations/mediawiki-config@master] Follow-up 0f90f506: Leave labtestwiki in the wikitech dblist for config

https://gerrit.wikimedia.org/r/547597

Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:35:54Z] <catrope@deploy1001> Synchronized dblists/: Add s10/s11 dblists for wikitechs (T233236) (duration: 00m 52s)

Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:37:42Z] <catrope@deploy1001> Synchronized dblists/: Use s10/s11 dblists for wikitechs (T233236) (duration: 00m 51s)

Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:49:02Z] <catrope@deploy1001> Synchronized wmf-config/: Use s10/s11 dblists for wikitechs (for real this time) (T233236) (duration: 00m 52s)

Andrew reassigned this task from Andrew to CDanis.

Mentioned in SAL (#wikimedia-operations) [2019-11-14T20:06:50Z] <cdanis@cumin2001> dbctl commit (dc=all): 'remove now-defunct wikitech section T233236', diff saved to https://phabricator.wikimedia.org/P9639 and previous config saved to /var/cache/conftool/dbconfig/20191114-200649-cdanis.json

Change 550946 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] dbctl: remove now-obsolete 'wikitech' section

https://gerrit.wikimedia.org/r/550946

Change 550946 merged by CDanis:
[operations/puppet@production] dbctl: remove now-obsolete 'wikitech' section

https://gerrit.wikimedia.org/r/550946

Change 550958 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/software/conftool@master] dbctl: rename 'wikitech' to 's10' to match prod

https://gerrit.wikimedia.org/r/550958

Change 550958 merged by jenkins-bot:
[operations/software/conftool@master] dbctl: rename 'wikitech' to 's10' to match prod

https://gerrit.wikimedia.org/r/550958