CloudVPS: run maintain-dbusers inside Toolforge
Closed, Declined (Public)

Assigned To

None

Authored By

	aborrero
	Feb 17 2019, 8:22 PM

Description

We had the idea of running the maintain-dbusers logic from inside Toolforge (or potentially any other VM under our control).
Currently, this logic runs in labstore servers (those hosting NFS data).

This would resolve an awkward situation in which ToolsDB now runs on VMs, but we still need to contact:

LDAP
labstores (for toolforge NFS data)
and ToolsDB itself

However, that requires some specific considerations, like:

how we manage secrets
whether this creates additional load for NFS (i.e., reads/writes over the network instead of locally)
whether our need today is part of a larger need in the future, or a side effect of only having moved some things out of the production realm

Details

	Subject	Repo	Branch	Lines +/-
	maintain_dbusers: Reverting to the old location to save git history	operations/puppet	production	+5 -649
	wmcs services: introduce profile for maintain_dbusers in services nodes	operations/puppet	production	+769 -0


Related Objects

Status	Assigned	Task
		Restricted Task
Resolved	None	T207536 Move various support services for Cloud VPS currently in prod into their own instances
		Unknown Object (Task)
Resolved	• chasemp	T172538 rack/setup/install labvirt10(19|20).eqiad.wmnet
Resolved	• Bstorm	T216208 ToolsDB overload and cleanup
Declined	None	T216173 labsdb1005/6 - Upgrade to Stretch
Resolved	• Bstorm	T193264 Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020
Declined	None	T216373 CloudVPS: run maintain-dbusers inside Toolforge

Event Timeline

aborrero triaged this task as Medium priority. Feb 17 2019, 8:22 PM

aborrero created this task.

Change 491064 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: introduce profile for maintain_dbusers in services nodes

https://gerrit.wikimedia.org/r/491064

gerritbot added a project: Patch-For-Review. Feb 17 2019, 8:24 PM

Secrets probably need to be handled in standalone puppetmasters, as is done in tools now. I think it will not create significant additional load unless we use the tool to make a large-scale change to users' my.cnf files.

aborrero updated the task description. Feb 17 2019, 8:34 PM

And that's the problem of course :) We can't necessarily trust that.

In addition to ToolsDB, the maintain-dbusers script currently also manages user accounts on the Wiki Replica servers (labsdb10{09,10,11}). To maintain the current functionality we need to do one of:

allow code running on {cloud,lab}storeXXXX to communicate with a mysql server inside Cloud VPS address space
expose secrets to a service running inside Cloud VPS address space which can maintain authn/z credentials on the labsdb10{09,10,11} cluster
rethink the replica.my.cnf management process to split responsibility for wiki replicas (production realm) and toolsdb (labs realm)

Blast. Yeah. That (again) makes it high priority to get the labstore able to communicate with Cloud VPS, via something like T216353.

Change 491064 merged by Andrew Bogott:
[operations/puppet@production] wmcs services: introduce profile for maintain_dbusers in services nodes

https://gerrit.wikimedia.org/r/491064

Change 491189 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] maintain_dbusers: Reverting to the old location to save git history

https://gerrit.wikimedia.org/r/491189

Krenair subscribed. Feb 18 2019, 1:56 AM

I started working on some refactoring and cleanup of the maintain-dbusers script and ran into a blocker for running the script from an NFS client rather than from the NFS server. That blocker is the script's use of chattr +i. The NFS protocol does not have support for manipulating file attributes via chattr or any other method. This use of the "immutable" bit is a protection against unintentional (and intentional) data corruption caused by users renaming, deleting, or otherwise modifying their $HOME/replica.my.cnf files.
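To make the blocker concrete, here is a minimal Python sketch of the write-then-protect pattern described above. It is not the actual maintain-dbusers code: the function names, the file layout, and the 0400 mode are assumptions for illustration. Setting the immutable bit needs root (CAP_LINUX_IMMUTABLE) on a local filesystem that supports it, so the command runner is injectable and can be swapped out when trying the sketch without privileges.

```python
import os
import subprocess
from typing import Callable, List


def render_replica_cnf(user: str, password: str) -> str:
    """Render a minimal MySQL client config (layout assumed for illustration)."""
    return f"[client]\nuser = {user}\npassword = {password}\n"


def chattr_command(path: str, immutable: bool = True) -> List[str]:
    """Build the chattr invocation that sets (+i) or clears (-i) the immutable bit."""
    flag = "+i" if immutable else "-i"
    return ["chattr", flag, path]


def write_protected_cnf(
    path: str,
    user: str,
    password: str,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> None:
    """Write the credentials file, make it read-only, then mark it immutable.

    `run` executes the chattr command; by default it shells out for real,
    which only works as root on the machine that hosts the filesystem.
    """
    with open(path, "w") as handle:
        handle.write(render_replica_cnf(user, password))
    os.chmod(path, 0o400)
    run(chattr_command(path))
```

When this runs on an NFS client instead of the NFS server, the final `chattr` step is the one that cannot work, which is the blocker described in this comment.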

After some discussion on IRC, @Bstorm and I have a list of possible workarounds for this issue:

(1) Introduce a webservice on the NFS master that can be told to do the file reads/writes including the chattr step
(2) Stop using chattr
(3) Poke open the hole from production realm to 3306 in the cloud vps project so we can keep running the script on the NFS master
(4) Introduce a webservice on the database servers that can be instructed to manage the accounts
(5) Move the NFS master into the Cloud VPS network

As we are trying to make these changes relatively quickly, the options needing new webservices to be written are out of scope. Moving the NFS master into the Cloud VPS network is a long term goal, but also not something that we should rush. This leaves dropping the use of chattr (2) or opening port 3306 on the new ToolsDB server to the NFS master (3) as the viable options. Of these two options, I would rather see the port opened. I believe this would be part of T216353: toolsdb: firewalling changes for new setup (temporal mysql replication).
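Before relying on option (3), reachability of the ToolsDB port from the NFS master could be checked with something like the following Python sketch. The hostname in the usage comment is a placeholder, not a real server name. A successful TCP connect only shows that the firewall hole is open; it says nothing about whether MySQL authentication works.

```python
import socket


def port_reachable(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example (placeholder hostname, replace with the real ToolsDB address):
# port_reachable("toolsdb.example.invalid", 3306)
```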

zhuyifei1999 subscribed. Feb 18 2019, 5:25 AM

bd808 mentioned this in T216353: toolsdb: firewalling changes for new setup (temporal mysql replication). Feb 18 2019, 6:18 AM

In T216373#4960664, @bd808 wrote:

Poke open the hole from production realm to 3306 in the cloud vps project so we can keep running the script on the NFS master

This should be now possible.

Since we aren't doing this right now I'm marking the task as such to avoid confusion.

I'm dropping priority too. We may be better served by working on T216422: Virtualize NFS servers used exclusively by Cloud VPS tenants instead if our main concern is network isolation, but we can talk that through when we get things working well enough to stop and have an incident retrospective.

Change 491189 abandoned by Bstorm:
maintain_dbusers: Reverting to the old location to save git history

Reason:
Totally unneeded now.

https://gerrit.wikimedia.org/r/491189

I think we agree that this is essentially not a thing we are going to do until we implement Ceph or cloud-internal NFS (which kind of depends on Ceph to work well and not be terrifying anyway), right?

The only other way is if we suddenly decide to move to ironic or similar things for addressing the old NFS systems, and that seems like time that would be better spent on Ceph.

GTirloni unsubscribed. Mar 21 2019, 9:11 PM

I'm going to close this up, since we are not going to proceed with this any time soon.
