
CloudVPS: run maintain-dbusers inside Toolforge
Open, Stalled, Low, Public

Description

We had the idea of running the maintain-dbusers logic from inside Toolforge (or potentially any other VM under our control).
Currently, this logic runs on the labstore servers (those hosting NFS data).

This would resolve an awkward situation: ToolsDB now runs on VMs, but we still need to contact:

  • LDAP
  • labstores (for toolforge NFS data)
  • and ToolsDB itself

However, that requires some specific considerations, like:

  • how we manage secrets
  • whether this creates additional load for NFS (i.e., reads and writes go over the network instead of happening locally)
  • whether our need today is part of a larger need in the future, or only a side effect of having moved some things out of the production realm

Event Timeline

aborrero created this task. Feb 17 2019, 8:22 PM
aborrero triaged this task as Normal priority.

Change 491064 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: introduce profile for maintain_dbusers in services nodes

https://gerrit.wikimedia.org/r/491064

Secrets probably need to be handled in standalone puppetmasters, as they are in tools now. I think it will not create significant additional load unless we use the tool to make a large-scale change to users' my.cnf files.

aborrero updated the task description. Feb 17 2019, 8:34 PM

And that's the problem of course :) We can't necessarily trust that.

bd808 added a comment. Feb 17 2019, 8:38 PM

In addition to ToolsDB, the maintain-dbusers script currently also manages user accounts on the Wiki Replica servers (labsdb10{09,10,11}). To maintain the current functionality we need to do one of:

  • allow code running on {cloud,lab}storeXXXX to communicate with a mysql server inside Cloud VPS address space
  • expose secrets to a service running inside Cloud VPS address space which can maintain authn/z credentials on the labsdb10{09,10,11} cluster
  • rethink the replica.my.cnf management process to split responsibility for wiki replicas (production realm) and toolsdb (labs realm)

Blast. Yeah. That (again) makes it a high priority to get the labstore able to communicate with Cloud VPS, via something like T216353.

Change 491064 merged by Andrew Bogott:
[operations/puppet@production] wmcs services: introduce profile for maintain_dbusers in services nodes

https://gerrit.wikimedia.org/r/491064

Change 491189 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] maintain_dbusers: Reverting to the old location to save git history

https://gerrit.wikimedia.org/r/491189

bd808 added a comment. Feb 18 2019, 5:18 AM

I started working on some refactoring and cleanup of the maintain-dbusers script and ran into a blocker for running the script from an NFS client rather than from the NFS server. That blocker is the script's use of chattr +i. The NFS protocol does not have support for manipulating file attributes via chattr or any other method. This use of the "immutable" bit is a protection against unintentional (and intentional) data corruption caused by users renaming, deleting, or otherwise modifying their $HOME/replica.my.cnf files.
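To make the blocker concrete, here is a minimal Python sketch of a write-then-protect flow of the kind described above. It is not the actual maintain-dbusers code: the function name `write_protected_file`, the `run` parameter, and the 0400 file mode are assumptions made for illustration. The `chattr +i` step is the part that only works as root on the machine that owns the local filesystem, which is why it cannot be done from an NFS client.

```python
import os
import subprocess
import tempfile


def write_protected_file(path, content, run=subprocess.check_call):
    """Write `content` to `path`, then mark the file immutable.

    Hypothetical helper, not the real script. `run` executes a command
    given as an argument list; it is injectable so the flow can be
    exercised without root or a real chattr binary.
    """
    if os.path.exists(path):
        # An immutable file cannot be replaced, so clear the flag first.
        run(["chattr", "-i", path])
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o400)   # assumed mode: read-only for the owner
    os.replace(tmp_path, path)  # atomic swap within one filesystem
    # Needs root on a local filesystem that supports the immutable flag.
    run(["chattr", "+i", path])
```

Passing a recording function as `run` (for example `calls.append`) shows the order of the `chattr` invocations without touching real file attributes.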

After some discussion on IRC, @Bstorm and I have a list of possible workarounds for this issue:

  1. Introduce a webservice on the NFS master that can be told to do the file reads/writes including the chattr step
  2. Stop using chattr
  3. Poke open the hole from production realm to 3306 in the cloud vps project so we can keep running the script on the NFS master
  4. Introduce a webservice on the database servers that can be instructed to manage the accounts
  5. Move the NFS master into the Cloud VPS network

As we are trying to make these changes relatively quickly, the options needing new webservices to be written are out of scope. Moving the NFS master into the Cloud VPS network is a long term goal, but also not something that we should rush. This leaves dropping the use of chattr (2) or opening port 3306 on the new ToolsDB server to the NFS master (3) as the viable options. Of these two options, I would rather see the port opened. I believe this would be part of T216353: toolsdb: firewalling changes for new setup (temporal mysql replication).

  3. Poke open the hole from production realm to 3306 in the cloud vps project so we can keep running the script on the NFS master

This should now be possible.
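One way to confirm from the NFS master that the port is actually open is a plain TCP connection test. The sketch below uses only the Python standard library; the function name `can_connect` is an assumption for illustration, and the real ToolsDB hostname is not given in this task, so it is left for the caller to supply.

```python
import socket


def can_connect(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A True result only shows that the firewall lets TCP through to 3306. It says nothing about whether MySQL accepts the credentials the script uses.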

aborrero changed the task status from Open to Stalled. Feb 18 2019, 5:56 PM

Since we aren't doing this right now I'm marking the task as such to avoid confusion.

bd808 lowered the priority of this task from Normal to Low. Feb 18 2019, 6:26 PM

I'm dropping priority too. We may be better served by working on T216422: Virtualize NFS servers used exclusively by Cloud VPS tenants instead if our main concern is network isolation, but we can talk that through when we get things working well enough to stop and have an incident retrospective.

Change 491189 abandoned by Bstorm:
maintain_dbusers: Reverting to the old location to save git history

Reason:
Totally unneeded now.

https://gerrit.wikimedia.org/r/491189

I think we agree that this is essentially not a thing we are going to do until we implement Ceph or cloud-internal NFS (which kind of depends on Ceph to work well and not be terrifying anyway), right?

The only other way is if we suddenly decide to move to Ironic or something similar to address the old NFS systems, and that time seems better spent on Ceph.

GTirloni removed a subscriber: GTirloni. Mar 21 2019, 9:11 PM