
REST api service to manage toolforge replica.my.cnf
Closed, ResolvedPublic

Description

As part of refactoring maintain-dbusers.py, we need a service to run on an NFS server that creates replica.my.cnf in response to an API call.

This server should only need to support two endpoints:

  • PUT (or POST?): takes a filepath (or shell name?), username, and password and writes the file
  • GET: takes a filepath (or shell name?) and returns the username and password (see the sketch below)
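
For illustration, a rough client-side sketch of those two calls (using python-requests; the host, URL layout, and field names here are placeholders and assumptions, not a final design):

import requests

API = 'https://nfs-test.example.wmcloud.org'  # placeholder host, not a real server

# write the credentials file (PUT or POST)
resp = requests.post(
    API + '/replica_cnf',
    json={
        'file_path': '/home/exampletool/replica.my.cnf',  # or a shell name, per the open question above
        'username': 'u12345',
        'password': 'notarealpassword',
    },
    timeout=10,
)
resp.raise_for_status()

# read the credentials back (GET)
resp = requests.get(
    API + '/replica_cnf',
    params={'file_path': '/home/exampletool/replica.my.cnf'},
    timeout=10,
)
creds = resp.json()  # expected to contain the username and password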

Proposed dev steps:

  1. Get the basic code working with local development and testing by hand with curl
  2. Deploy the tool (how?) on a test nfs server, run as a wsgi service with nginx
  3. Add TLS so that passwords aren't being transferred in clear text
  4. Consider auth. The quick and dirty approach would be to just lock this down via firewall rules.

Once all that is set, we can look at refactoring the rest of maintain-dbusers.py so that it talks to this new API.

Sample code to read/write the files can be found in maintain-dbusers.py in the write_replica_cnf() and read_replica_cnf() functions.
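
For orientation, replica.my.cnf is a MySQL option file with a [client] section holding the user and password, so the core of those helpers boils down to something like the sketch below (simplified and assumed; the real code in maintain-dbusers.py also deals with things like file ownership and permissions):

import configparser
from typing import Tuple


def write_replica_cnf(file_path: str, mysql_username: str, pwd: str) -> None:
    # simplified: write a minimal MySQL option file with the credentials
    config = configparser.ConfigParser()
    config['client'] = {'user': mysql_username, 'password': pwd}
    with open(file_path, 'w') as f:
        config.write(f)


def read_replica_cnf(file_path: str) -> Tuple[str, str]:
    # parse the option file back into a (username, password) tuple
    config = configparser.ConfigParser()
    config.read(file_path)
    return config['client']['user'], config['client']['password']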

I've set up a test server for step 2. It is dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud. @Slst2020 and @Raymond_Ndibe if you respond here with your developer account names I can get you login access to that host.

Event Timeline

Andrew renamed this task from REST api service to manage toolsdb replica.my.cnf to REST api service to manage toolforge replica.my.cnf. Mar 17 2022, 4:30 AM
Andrew created this task.

I've set up a test server for step 2. It is dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud. @Slst2020 and @Raymond_Ndibe if you respond here with your developer account names I can get you login access to that host.

Mine is Slavina Stefanova

So the very basic implementation would be something like this?

import json

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class UserInfo(BaseModel):
    uid: int
    mysql_username: str
    pwd: str


@app.get('/')
def index():
    return {'message' : 'Hello world!'}

# in practice the updated read_replica_cnf() function would call the api get endpoint,
# receive the user info as json, then parse it into a Tuple[str, str]
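# note: a plain str path parameter can't contain '/', so a real absolute file
# path would likely need FastAPI's {file_path:path} converter (or the path could
# be sent in the request body instead); the same caveat applies to the POST route below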
@app.get('/replica_cnf/{file_path}')
def get_user_info(file_path: str):
    with open(file_path, mode='r') as f:
        return [json.loads(line) for line in f.readlines()]


@app.post('/replica_cnf/{file_path}')
def send_user_info(file_path: str, content: UserInfo):
    with open(file_path, mode='w') as f:
        # mock example using pure json; in practice, this is where the logic of
        # write_replica_cnf() would go
        return f.write(content.json()+'\n')

By the way, FastAPI uses ASGI. If WSGI is a hard requirement, there are ways to set this up, or we could just go with Flask.

I would like to double-check this statement before we continue: "we need a service to run on an NFS server that creates replica.my.cnf in response to an API call". Because of https://wikitech.wikimedia.org/wiki/Cross-Realm_traffic_guidelines we should clarify:

  • which particular NFS server will be running the REST API server.
  • which particular client will be communicating with the REST API server.
  • which IP address the server will listen on, and which client IP addresses it will see.
  • which IP address the client will use, and which IP it will connect to.

@aborrero please see the parent task for the problem statement.

My expectation is that the primary service will run on a cloudcontrol host. That will be the existing 'maintain-dbusers' service, hereafter called 'the client'.

The client will need to talk to two NFS servers in order to inject credentials: one in the 'tools' project (which will contain home and project mounts for Toolforge) and one in the 'paws' project.

The rest api (hereafter 'the server') will listen on a public IP on each NFS server.

This is the best design I've landed on so far but I welcome alternatives.

@aborrero please see the parent task for the problem statement.

My expectation is that the primary service will run on a cloudcontrol host. That will be the existing 'maintain-dbusers' service, hereafter called 'the client'.

The client will need to talk to two NFS servers in order to inject credentials: one in the 'tools' project (which will contain home and project mounts for Toolforge) and one in the 'paws' project.

The rest api (hereafter 'the server') will listen on a public IP on each NFS server.

This is the best design I've landed on so far but I welcome alternatives.

OK! Thanks for the clarification, sounds good. This passes all the cross-realm checks.

For documentation purposes, let me write down an approach which could be more elegant in the sense of not having to involve unrelated servers (like cloudcontrols):

  • we run maintain_dbusers inside kubernetes @ tools (and another copy inside kubernetes @ paws) -- similar to how we run maintain_kubeusers.
  • it already has access to NFS (because all kubernetes nodes have)
  • it already has access to toolsdb (a virtual machine @ cloud vps)
  • the wikireplicas are in the wikiproduction realm, but we have a convenient proxy layer to access them as if they were inside the cloud (see DNS at https://openstack-browser.toolforge.org/project/clouddb-services)
  • with this approach, the only thing we need to sync between realms is the SQL privileged user/password/grant required to maintain unprivileged user accounts. <-- I don't know offhand how to do this bit elegantly. Perhaps a spicerack cookbook?

OK! Thanks for the clarification, sounds good. This passes all the cross-realm checks.

Follow-up/clarification: as long as the REST API is behind a public floating IP (which can be the nova proxy).

I've set up a test server for step 2. It is dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud. @Slst2020 and @Raymond_Ndibe if you respond here with your developer account names I can get you login access to that host.

@Andrew my developer account username is Raymond Ndibe

So the very basic implementation would be something like this?

Yes, pretty much! The specifics of what and how to read/write should be obvious in the existing maintain-dbusers code.

Also, I just now confirmed that flask is 'easy' to install on the server I want to run this on, and fastapi is less easy. So that makes flask the easy choice.

@Slst2020 and @Raymond_Ndibe you should both be able to 'ssh dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud' now (and get root access there). If you need to mess with ssh config, these docs should help: https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#ProxyJump_(recommended)

So the very basic implementation would be something like this?

Yes, pretty much! The specifics of what and how to read/write should be obvious in the existing maintain-dbusers code.

Also, I just now confirmed that flask is 'easy' to install on the server I want to run this on, and fastapi is less easy. So that makes flask the easy choice.

We will go with Flask then, no problem. @Raymond_Ndibe and I have set up a quick meeting tomorrow to get started with the task. Let us know if there are any other requirements beyond the ones already mentioned.
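
For reference, a minimal Flask sketch of the same two endpoints (routes, payload fields, and the path handling are carried over from the FastAPI mock above as assumptions, not a final design):

import json

from flask import Flask, jsonify, request

app = Flask(__name__)


# the <path:...> converter allows slashes in the parameter; Flask strips the
# leading '/', so it is re-added before opening the file
@app.route('/replica_cnf/<path:file_path>', methods=['GET'])
def get_user_info(file_path):
    with open('/' + file_path, mode='r') as f:
        return jsonify([json.loads(line) for line in f])


@app.route('/replica_cnf/<path:file_path>', methods=['POST'])
def send_user_info(file_path):
    content = request.get_json()  # expected keys: uid, mysql_username, pwd
    # mock example using pure json, mirroring the FastAPI sketch; in practice
    # this is where the logic of write_replica_cnf() would go
    with open('/' + file_path, mode='w') as f:
        written = f.write(json.dumps(content) + '\n')
    return jsonify({'bytes_written': written})

A Flask app like this can be served as a wsgi service (e.g. via uwsgi or gunicorn) behind nginx, which matches step 2 of the proposed dev steps.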

Is there anything I can do to help with this? I'm assuming that once you have a local proof of concept that I'll do the puppet/deployment bits unless one of you is excited about learning that.

Is there anything I can do to help with this? I'm assuming that once you have a local proof of concept that I'll do the puppet/deployment bits unless one of you is excited about learning that.

Hello @Andrew, I attempted to ssh into the server to upload the proof-of-concept code I've written but couldn't, even though I've done all the necessary ssh setup on my end. I can see that my role on the project members page is "user" while that of others includes "projectadmin"; perhaps this could be the reason?
How do we fix this?

Is there anything I can do to help with this? I'm assuming that once you have a local proof of concept that I'll do the puppet/deployment bits unless one of you is excited about learning that.

Hello @Andrew, I attempted to ssh into the server to upload the proof-of-concept code I've written but couldn't, even though I've done all the necessary ssh setup on my end. I can see that my role on the project members page is "user" while that of others includes "projectadmin"; perhaps this could be the reason?
How do we fix this?

Hey, "user" should still let you log in to any instances. Could you add -vvv to your SSH command and post the output (and the contents of your .ssh/config) here?

debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolving "dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud" port 22
debug2: ssh_connect_direct
debug1: Connecting to dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud [172.16.5.60] port 22.
debug1: connect to address 172.16.5.60 port 22: Connection refused
ssh: connect to host dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud port 22: Connection refused

This is the output when -vvv is added to the ssh command. Unless you meant the contents of "/etc/ssh/ssh_config", I don't have any other config file in my ".ssh" folder.

Ah - in that case you need to create a .ssh/config file in your home directory as described here: https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#Accessing_Cloud_VPS_instances

In T304040#7826163, @Majavah wrote:

Ah - in that case you need to create a .ssh/config file in your home directory as described here: https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#Accessing_Cloud_VPS_instances

Ooh yes, I just did and it worked. Not sure why I overlooked that in the first place. Thanks @Majavah!

Hello @Andrew, I uploaded the first iteration of the service last week. Other than confirming that all the Flask parts work, we still need to test the functionality proper, and right now we don't know exactly how to go about verifying that it does what it is supposed to do. I also discussed this with @Slst2020 and we agreed that we need a repo for code sharing and review, so we'd need your help with that; I (and Slavina, I think) don't have the rights to create repos yet.

It's moderately easier for me if the new code lands in the puppet repo where the other half (the client from your point of view) lives.

You can check out that repo with 'git clone "https://gerrit.wikimedia.org/r/operations/puppet"' and put the new files under modules/profile/files/wmcs/nfs/

I'll do my best to be responsive with merges there as needed; we can rethink this scheme if the project gets too complicated to squeeze in there.

Change 777037 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] Create REST api service to manage toolforge replica.my.cnf

https://gerrit.wikimedia.org/r/777037

Is there anything I can do to help with this? I'm assuming that once you have a local proof of concept that I'll do the puppet/deployment bits unless one of you is excited about learning that.

@Raymond has been doing all the work on the code while I was on sick leave. I would love to learn the puppet/deployment bits, though.

FYI @Slst2020 and @Raymond_Ndibe, I'm mostly out this week and next week so won't be especially responsive. I will see if/when I can get you unblocked regarding deployment.

For puppet deployment: I've hooked up dbusers-nfs-1.testlabs.eqiad1.wikimedia.cloud to the puppetmaster abogott-puppetmaster.testlabs.eqiad.wmflabs. That means that anyone can apply local puppet hacks to /var/lib/git/operations/puppet on abogott-puppetmaster and see the consequences on dbusers-nfs-1.

That's clearly not sufficient info to start hacking but it's at least a place for said hacking to happen :)

Change 809921 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] Modify maintain-dbusers.py to call the rest-api service

https://gerrit.wikimedia.org/r/809921

Change 777037 merged by David Caro:

[operations/puppet@production] Create REST api service to manage toolforge replica.my.cnf

https://gerrit.wikimedia.org/r/777037

Change 810965 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] wmcs: changes to api service to manage toolforge replica.my.cnf

https://gerrit.wikimedia.org/r/810965

Change 842454 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] wmcs: format and refactor maintain-dbusers.py

https://gerrit.wikimedia.org/r/842454

Change 842454 merged by Andrew Bogott:

[operations/puppet@production] wmcs: format and refactor maintain-dbusers.py

https://gerrit.wikimedia.org/r/842454

Change 849166 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] wmcs: format and refactor maintain-dbusers.py

https://gerrit.wikimedia.org/r/849166

Change 849166 merged by Andrew Bogott:

[operations/puppet@production] wmcs: format and refactor maintain-dbusers.py

https://gerrit.wikimedia.org/r/849166

Change 849173 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] wmcs: format and refactor maintain-dbusers.py

https://gerrit.wikimedia.org/r/849173

Change 849173 merged by Andrew Bogott:

[operations/puppet@production] wmcs: format and refactor maintain-dbusers.py

https://gerrit.wikimedia.org/r/849173

Change 867566 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] replica_cnf_web: add functional tests

https://gerrit.wikimedia.org/r/867566

Change 810965 merged by David Caro:

[operations/puppet@production] wmcs: changes to api service to manage toolforge replica.my.cnf

https://gerrit.wikimedia.org/r/810965

Change 867566 merged by David Caro:

[operations/puppet@production] replica_cnf_web: add functional tests

https://gerrit.wikimedia.org/r/867566

Change 887872 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] puppet: adapt replica_cnf_api to python3.5

https://gerrit.wikimedia.org/r/887872

Change 887872 merged by Andrew Bogott:

[operations/puppet@production] puppet: adapt replica_cnf_api to python3.5

https://gerrit.wikimedia.org/r/887872

Change 888112 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] puppet: adapt replica_cnf_api to python3.5

https://gerrit.wikimedia.org/r/888112

Change 888112 merged by Andrew Bogott:

[operations/puppet@production] puppet: adapt replica_cnf_api to python3.5

https://gerrit.wikimedia.org/r/888112

Change 893826 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] maintain_dbusers: seperate config from code changes

https://gerrit.wikimedia.org/r/893826

Change 893826 merged by Andrew Bogott:

[operations/puppet@production] maintain_dbusers: seperate config from code changes

https://gerrit.wikimedia.org/r/893826

Change 894025 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] wmcs: nfs: primary: introduce missing hiera keys for maintain_dbusers

https://gerrit.wikimedia.org/r/894025

Change 894025 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] wmcs: nfs: primary: introduce missing hiera keys for maintain_dbusers

https://gerrit.wikimedia.org/r/894025

Change 809921 merged by David Caro:

[operations/puppet@production] Modify maintain-dbusers.py to call the rest-api service

https://gerrit.wikimedia.org/r/809921

Change 898784 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] Improvements to maintain-dbusers and the rest-api

https://gerrit.wikimedia.org/r/898784

Change 898784 merged by David Caro:

[operations/puppet@production] Improvements to maintain-dbusers and the rest-api

https://gerrit.wikimedia.org/r/898784

This is largely done; reassigning to @dcaro to close.