I tried to move this onto a prod database a while ago, but there were various issues (including that the prod db is read-only in codfw most of the time). Let's move this back to a wmcs-managed host.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | aborrero | T217891 CloudVPS: rework codfw deployments
Resolved | | None | T229441 CloudVPS: codfw1dev: missing bits
Resolved | PRODUCTION ERROR | Andrew | T201082 labtestweb2001 is sending updates to a read-only db host: db2037
Resolved | | CDanis | T233236 Move labtestwikitech database to clouddb2001-dev
Resolved | | Marostegui | T236010 Drop labtestwikitech database from m5
Resolved | | Andrew | T236145 processEchoEmailBatch.php failing for labtestwiki
Resolved | | aborrero | T237971 Cron <www-data@cloudweb2001-dev> /usr/local/bin/mwscript extensions/TorBlock/maintenance/loadExitNodes.php --wiki=labswiki --force > /dev/null
Event Timeline
The system for assigning a particular wiki to a particular db host in mediawiki-config has changed a lot since I last touched this code. @Joe, if you could write me a sample patch of how to break out labtestwiki into its own group and direct it to a different db server, I should be able to take it from there.
Moving the database itself (or building a fresh one) is straightforward and something I can do myself.
Thanks!
Hi! I'm not sure what you need to do, but I assume you're trying to set up a new database instance for the section wikitech, correct?
If that is the case, that is done via a specialized conftool cli script called dbctl.
The documentation is here:
https://wikitech.wikimedia.org/wiki/Dbctl
and for you specifically of interest is probably:
https://wikitech.wikimedia.org/wiki/Dbctl#Add_a_new_host_(ie:_a_new_provisioned_host)_to_a_section
If you have more doubts, maybe @CDanis can help you as well.
Hey @Andrew can you specify a bit more what's your plan with the new database on the codfw instance?
Right now labtestwiki is located in eqiad and codfw (on an RO host, as you know). Are you planning to have a writable version in codfw? What's the plan then for the existing one in codfw? And moreover, how are you going to replicate those changes to the eqiad one?
Are those two DBs going to become independent DBs? If so, how are you planning to have DC redundancy?
The big picture is: labtestwikitech needs a database. That database needs to be read/write.
labtestwikitech currently /has/ a database, in the m5 cluster, but it's not read/write from codfw and hence largely useless.
I don't care in the least where the database is or how it's managed -- I just need one. I expect to just make it myself on a server that's ignored by the DBAs. Pointing a given wiki to an arbitrary database used to be simple (and I've done it several times), but since I last visited the wmf-config code things have become very different and abstract, so I no longer know how to tell it "labtestwikitech uses database named 'foo' on server 'bar'." The answer to that last question is literally all I need here, although of course any other help is welcome.
Hi Andrew!
I'm very sorry for your difficulties :( I've tried to keep things as simple as I knew how, to provide at least a minimal amount and quality of documentation, and to link to said documentation liberally in the config files. There's definitely more work to be done there, on all fronts, and I freely admit to not having done the cleanest job of all of this; please do let me know if you have any suggestions. (Maybe, as a start, the dbctl wikitech page itself needs some more background context...?)
To add some context: the following three arrays that used to live in db-eqiad.php/db-codfw.php now live in etcd: groupLoadsBySection, readOnlyBySection, and sectionLoads. MediaWiki's config code now reads all the pieces that are left in those files, then merges in those particular sub-arrays of $wgLBFactoryConf. The etcd data are only ever written by the dbctl tool, which runs on the cumin hosts and generates those values from underlying data about each database server.
The payoff for the added complexity is twofold: 1) operations like master failovers are much quicker (the most recent one had only 26 seconds of read-only time for users) and involve much less toil for DBAs, and 2) there are some sanity-checking safeguards implemented in dbctl, checked before prod MediaWiki can see a bad config.
Other parts of the wiki-to-DB mapping configuration, like dblists, haven't changed.
If you wanted to use the dbctl-ified infrastructure, I think you'd need to add the server to the hostsByName entry in db-codfw.php, then create a new section 'labtestwikitest' in dbctl, then create an instance object that is pooled for that section, and then finally, edit the dblist config files as you would have done previously to refer to that section for that wiki name. I can help with that if you'd like.
Another option, if you'd rather not bother with all that, would be to just statically special-case that section in the configuration files -- I think that it would be fine to statically set $wgLBFactoryConf['sectionLoads']['labwikitestwiki'] = ['clouddb-foo1234'=>1]; in the config, after wmfEtcdApplyDBConfig() has been called.
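To make the ordering concern concrete, here is a rough sketch in Python rather than the real PHP config. It assumes the etcd step replaces the three sub-arrays wholesale, which I have not verified against the actual wmfEtcdApplyDBConfig() code. The function apply_etcd_db_config, the host db2001, and the IP addresses are all made up for illustration.

```python
import copy

# Keys that, per the explanation above, now come from etcd instead of db-codfw.php.
ETCD_KEYS = ("groupLoadsBySection", "readOnlyBySection", "sectionLoads")


def apply_etcd_db_config(lb_factory_conf, etcd_data):
    """Hypothetical stand-in for wmfEtcdApplyDBConfig().

    Returns a copy of the static config in which each etcd-managed
    sub-array is replaced (assumption: replaced wholesale, not deep-merged).
    """
    merged = copy.deepcopy(lb_factory_conf)
    for key in ETCD_KEYS:
        if key in etcd_data:
            merged[key] = copy.deepcopy(etcd_data[key])
    return merged


# Made-up static config and etcd payload, for illustration only.
static_conf = {
    "hostsByName": {"db2001": "192.0.2.10", "clouddb-foo1234": "192.0.2.20"},
    "sectionLoads": {},
}
etcd_data = {
    "sectionLoads": {"s1": {"db2001": 0}},
    "readOnlyBySection": {},
    "groupLoadsBySection": {},
}

# Override set BEFORE the etcd step: lost when sectionLoads is replaced.
early = copy.deepcopy(static_conf)
early["sectionLoads"]["labwikitestwiki"] = {"clouddb-foo1234": 1}
early = apply_etcd_db_config(early, etcd_data)

# Override set AFTER the etcd step: survives alongside the etcd sections.
late = apply_etcd_db_config(static_conf, etcd_data)
late["sectionLoads"]["labwikitestwiki"] = {"clouddb-foo1234": 1}

print("labwikitestwiki" in early["sectionLoads"])  # False
print(late["sectionLoads"]["labwikitestwiki"])     # {'clouddb-foo1234': 1}
```

Under that assumption, the special case only sticks if it is applied after the etcd data has been merged in, which is why the placement of the static assignment matters.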
I understand your needs, but what you are trying to do leaves lots of fundamental questions floating around, and I do think we need to address them before moving forward.
As I asked at T233236#5575780 there are many things that would need an answer or some sort of acknowledgment that they can potentially cause issues in the future - ie: no DC redundancy.
By having a writable database in codfw, you are effectively creating a split brain with the labtestwiki that is being written in eqiad, and you'd no longer be able to have cross-DC replication (unless you plan to also have a new host replicating that database into an RO eqiad database for redundancy).
That would be confusing, though: two pieces of infrastructure holding the same database, each written in a different datacenter and replicating cross-DC. But that is your decision.
I don't care in the least where the database is or how it's managed -- I just need one. I expect to just make it myself on a server that's ignored by the DBAs
I am completely fine if your team wants to own labtestwiki, but I believe that if that is the case, it should be owned completely, not just the writable codfw replica, making it a snowflake.
If you want to manage it I would suggest you also take labtestwiki out of m5 and set it up somewhere else, so you can manage both instances (eqiad/codfw) completely, from its setup, redundancy, backups, monitoring and schema changes (remember that mediawiki schema changes are also applied to labtestwiki).
The only place that labtestwiki runs is in codfw. There is no MediaWiki deployment in eqiad that is connected to the labtestwiki database. The closest analog of labtestwiki for the main Wikimedia cluster is the wikifarm inside of the deployment-prep Cloud VPS project. Similarly labwiki only exists in eqiad with no counterpart in codfw. When a DC switch is performed, wikitech remains running from the labweb* hosts in eqiad.
I am completely fine if your team wants to own labtestwiki, but I believe that if that is the case, it should be owned completely, not just the writable codfw replica, making it a snowflake.
labtestwiki is currently and will always be a snowflake. The database and associated MediaWiki deployment exist as part of what is now called the 'codfw1dev' deployment of Cloud Services. This is the staging and testing environment that we have for our OpenStack cluster and associated services. labtestwiki is a needed part of this environment for testing how various OpenStack changes will affect wikitech before we actually break the live wikitech. If we had had the wiki functional during the testing of our recent OpenStack version upgrade, we might have been able to avoid the issues from T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API.
If you want to manage it I would suggest you also take labtestwiki out of m5 and set it up somewhere else, so you can manage both instances (eqiad/codfw) completely, from its setup, redundancy, backups, monitoring and schema changes (remember that mediawiki schema changes are also applied to labtestwiki).
That is exactly the intent of this ticket. The schema migrations will be a bit of a pain to manage, but in past tickets where we asked if there was any way to have a read-write database in codfw we felt that the answer was no.
For what it's worth, I would prefer for the DBAs to manage this database rather than managing it myself -- that was my intent when I moved it to m5 in the first place. But as I understand it, it's not currently possible to have a DBA-managed database that's writeable in codfw.
labtestwiki and labswiki are both on m5 (and it is writable), so I treat them like any other database in that regard.
I am completely fine if your team wants to own labtestwiki, but I believe that if that is the case, it should be owned completely, not just the writable codfw replica, making it a snowflake.
labtestwiki is currently and will always be a snowflake. The database and associated MediaWiki deployment exist as part of what is now called the 'codfw1dev' deployment of Cloud Services. This is the staging and testing environment that we have for our OpenStack cluster and associated services. labtestwiki is a needed part of this environment for testing how various OpenStack changes will affect wikitech before we actually break the live wikitech. If we had had the wiki functional during the testing of our recent OpenStack version upgrade, we might have been able to avoid the issues from T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API.
I do understand the requirements, I am just debating what's the best approach to make this work for everyone :-)
If you want to manage it I would suggest you also take labtestwiki out of m5 and set it up somewhere else, so you can manage both instances (eqiad/codfw) completely, from its setup, redundancy, backups, monitoring and schema changes (remember that mediawiki schema changes are also applied to labtestwiki).
That is exactly the intent of this ticket. The schema migrations will be a bit of a pain to manage, but in past tickets where we asked if there was any way to have a read-write database in codfw we felt that the answer was no.
The answer is "no" if handled by us, because that would be a total snowflake in our infra: at the moment we only have writable databases in eqiad that are replicated to codfw, regardless of whether they are MW databases or misc databases.
Having a writable database in codfw breaks our consistency because we'd have to manage 2 databases, with the same name, on different servers, being written with different data depending on the DC, and again (this hasn't been answered yet) I assume you want to have a copy of those databases on the opposite DC.
As I have said earlier, if you prefer taking labtestwiki out from m5 into your own set of servers and owning it, I am fine with that.
I wouldn't feel comfortable managing that. It is a snowflake; we manage lots of things, and such a small (but at the same time big) difference will most likely end up with us making a human error.
So if you prefer to have two copies of labtestwiki, one per DC and both writable, I would prefer that you own the whole infra for that (and of course, I can help with the migration).
We have no need at all for a copy of the labtestwiki database in eqiad, only codfw. As a testing only wiki with no active community we can get by with periodic dumps of the database for recovery.
For labswiki, some replica somewhere for disaster recovery of the primary in eqiad is necessary, but we have no plans to establish a MediaWiki deployment in codfw for wikitech. https://wikitech-static.wikimedia.org/wiki/Main_Page is the read-only "fallback" for wikitech in the event of a loss of ability to use the eqiad hosted wiki.
T161859: Make Wikitech an SUL wiki will change all of the database concerns for wikitech and eliminate the need for labtestwiki in codfw, but we do not currently have any estimate of when its major blocker, T196171: Developer account creation without OpenStackManager, will be resolved -- neither the WMCS nor SRE teams currently have that work on their near-term roadmaps.
As I said, I am fine with this, but I would prefer that WMCS own the whole thing, that is, labtestwiki set up in eqiad on a new WMCS-owned host plus its codfw sibling, rather than DBAs owning labtestwiki in eqiad just because it is part of m5 and WMCS owning labtestwiki in codfw just because a RW database is needed there.
For labswiki, some replica somewhere for disaster recovery of the primary in eqiad is necessary, but we have no plans to establish a MediaWiki deployment in codfw for wikitech. https://wikitech-static.wikimedia.org/wiki/Main_Page is the read-only "fallback" for wikitech in the event of a loss of ability to use the eqiad hosted wiki.
T161859: Make Wikitech an SUL wiki will change all of the database concerns for wikitech and eliminate the need for labtestwiki in codfw, but we do not currently have any estimate of when its major blocker, T196171: Developer account creation without OpenStackManager, will be resolved -- neither the WMCS nor SRE teams currently have that work on their near-term roadmaps.
Yeah, labswiki is a lot more complex and I am leaving it aside from this equation.
Agreed. Once we have the new database set up on clouddb2001-dev and labtestwikitech pointed at it, we will have no need for the copy of labtestwiki on m5, and it should be dropped to avoid confusing everyone.
Ok, and I understand that if we ever need that one on eqiad it will be equally owned by WMCS then?
Once we are ready to drop it from m5, please create a ticket with us so we can clean up the grants too.
Change 543664 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/mediawiki-config@master] labtestwiki: move to a wmcs-hosted database on clouddb2001-dev
The db is now working on clouddb2001-dev. I'm attaching a patch about grants -- I'll leave this to a DBA to explicitly remove grants and drop the database from m5.
Change 543955 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] m5 grants: remove grants for 'labtestwiki' database
Thanks!
I am going to first rename the tables on the database and leave them renamed for a few days before dropping the database and the grants.
Mentioned in SAL (#wikimedia-operations) [2019-10-18T05:19:10Z] <marostegui> Rename m5 labtestwiki database - T233236
Tables renamed:
root@cumin1001:/home/marostegui# mysql.py -hdb1133 labtestwiki -e "show tables" -BN
T233236_abuse_filter T233236_abuse_filter_action T233236_abuse_filter_history T233236_abuse_filter_log T233236_accountaudit_login T233236_actor T233236_archive T233236_babel T233236_betafeatures_user_counts T233236_category T233236_categorylinks T233236_change_tag T233236_change_tag_def T233236_comment T233236_content T233236_content_models T233236_cu_changes T233236_cu_log T233236_echo_email_batch T233236_echo_event T233236_echo_notification T233236_echo_target_page T233236_externallinks T233236_filearchive T233236_geo_tags T233236_global_block_whitelist T233236_globalblocks T233236_image T233236_imagelinks T233236_interwiki T233236_ip_changes T233236_ipblocks T233236_ipblocks_restrictions T233236_iwlinks T233236_job T233236_l10n_cache T233236_langlinks T233236_ldap_domains T233236_linter T233236_log_search T233236_logging T233236_mathoid T233236_module_deps T233236_oathauth_users T233236_oauth_accepted_consumer T233236_oauth_registered_consumer T233236_objectcache T233236_oldimage T233236_openstack_notification_event T233236_openstack_puppet_classes T233236_openstack_puppet_groups T233236_openstack_puppet_vars T233236_openstack_tokens T233236_page T233236_page_props T233236_page_restrictions T233236_pagelinks T233236_protected_titles T233236_querycache T233236_querycache_info T233236_querycachetwo T233236_recentchanges T233236_redirect T233236_revision T233236_revision_actor_temp T233236_revision_comment_temp T233236_searchindex T233236_securepoll_cookie_match T233236_securepoll_elections T233236_securepoll_entity T233236_securepoll_lists T233236_securepoll_msgs T233236_securepoll_options T233236_securepoll_properties T233236_securepoll_questions T233236_securepoll_strike T233236_securepoll_voters T233236_securepoll_votes T233236_site_identifiers T233236_site_stats T233236_sites T233236_slot_roles T233236_slots T233236_spoofuser T233236_templatelinks T233236_text T233236_transcache T233236_transcode T233236_updatelog T233236_updates T233236_uploadstash T233236_user T233236_user_former_groups T233236_user_groups T233236_user_newtalk T233236_user_properties T233236_watchlist T233236_wikilove_log
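For anyone repeating this rename-before-drop approach, a minimal sketch of generating the statements is below. It is not the command that was actually run here; the helper name rename_statements and the sample table list are made up for illustration, and the only thing taken from this task is the T233236_ prefix.

```python
def rename_statements(tables, prefix):
    """Build one MySQL RENAME TABLE statement per table.

    Tables that already carry the prefix are skipped, so the output
    is safe to regenerate after a partial run.
    """
    statements = []
    for name in tables:
        if name.startswith(prefix):
            continue  # already renamed
        statements.append(f"RENAME TABLE `{name}` TO `{prefix}{name}`;")
    return statements


# Sample input only; a real list would come from "show tables".
tables = ["abuse_filter", "actor", "T233236_archive"]
for stmt in rename_statements(tables, "T233236_"):
    print(stmt)
```

Renaming first means anything still reading or writing the old table names fails loudly during the waiting period, while the data stays recoverable until the final drop.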
Change 544098 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] labtestwikitech.pp: Specify the new location for labtestwiki
Change 544098 merged by Marostegui:
[operations/puppet@production] labtestwikitech.pp: Specify the new location for labtestwiki
Change 543664 merged by jenkins-bot:
[operations/mediawiki-config@master] labtestwiki: move to a wmcs-hosted database on clouddb2001-dev
Change 543955 merged by Marostegui:
[operations/puppet@production] m5 grants: remove grants for 'labtestwiki' database
Change 547596 had a related patch set uploaded (by CDanis; owner: Jforrester):
[operations/mediawiki-config@master] Split out DB-related concerns for real and test wikitechs into s10/s11
Change 550889 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] dbctl schemata changes for labswiki migration
Change 547597 had a related patch set uploaded (by Andrew Bogott; owner: Jforrester):
[operations/mediawiki-config@master] Follow-up 0f90f506: Leave labtestwiki in the wikitech dblist for config
Change 550889 merged by CDanis:
[operations/puppet@production] dbctl schemata changes for labswiki migration
Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:17:33Z] <cdanis@cumin2001> dbctl commit (dc=all): 'alias wikitech section to new s10 section T233236', diff saved to https://phabricator.wikimedia.org/P9638 and previous config saved to /var/cache/conftool/dbconfig/20191114-181732-cdanis.json
Change 547596 merged by jenkins-bot:
[operations/mediawiki-config@master] Split out DB-related concerns for real and test wikitechs into s10/s11
Change 547597 merged by jenkins-bot:
[operations/mediawiki-config@master] Follow-up 0f90f506: Leave labtestwiki in the wikitech dblist for config
Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:35:54Z] <catrope@deploy1001> Synchronized dblists/: Add s10/s11 dblists for wikitechs (T233236) (duration: 00m 52s)
Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:37:42Z] <catrope@deploy1001> Synchronized dblists/: Use s10/s11 dblists for wikitechs (T233236) (duration: 00m 51s)
Mentioned in SAL (#wikimedia-operations) [2019-11-14T18:49:02Z] <catrope@deploy1001> Synchronized wmf-config/: Use s10/s11 dblists for wikitechs (for real this time) (T233236) (duration: 00m 52s)
Mentioned in SAL (#wikimedia-operations) [2019-11-14T20:06:50Z] <cdanis@cumin2001> dbctl commit (dc=all): 'remove now-defunct wikitech section T233236', diff saved to https://phabricator.wikimedia.org/P9639 and previous config saved to /var/cache/conftool/dbconfig/20191114-200649-cdanis.json
Change 550946 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] dbctl: remove now-obsolete 'wikitech' section
Change 550946 merged by CDanis:
[operations/puppet@production] dbctl: remove now-obsolete 'wikitech' section
Change 550958 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/software/conftool@master] dbctl: rename 'wikitech' to 's10' to match prod
Change 550958 merged by jenkins-bot:
[operations/software/conftool@master] dbctl: rename 'wikitech' to 's10' to match prod