Page MenuHomePhabricator

Add code to api_db to connect to database and to expose it through API endpoints
Closed, ResolvedPublic

Description

To add the functionality of being able to retrieve the backups metadata and to expose it through the API, we need to develop functionality/code to connect to the database.

We plan to do this using SQLAlchemy, an ORM (object-relation mapper). As this is really abstracted, we should try and understand what exactly is happening in the background (whether it keeping an open connection, whether it is sending queries in an optimised manner, I need to do some reading on this from my side)

We also need a way to pass this database configs through the environment variable as a config (eg, db host, name, password, etc) -- as part of this ticket this needs implementing as well

We will build on top of work done in T284399

Tasks

  • Create Data models for the backups table
  • Develop code to connect our api_db program to a MariaDB database using SQLAlchemy or other suitable tool
  • Create API endpoint to show recent backups for all sections (s1-s8)

Acceptance

  • Successfully able to connect to MariaDB database
  • Successfully able to display "some" database data through API
  • Successfully able to see recent data for sections through API call

Event Timeline

Change 702781 had a related patch set uploaded (by H.krishna123; author: H.krishna123):

[operations/software/bernard@master] [WIP] api_db: Add code to enable database connection Added code to connect to an SQL database, added skeleton for unit tests, cleaned up main.py file and added a singleton class to keep database configuration the same throughout the program. Added DB query functionality for the readiness probe

https://gerrit.wikimedia.org/r/702781

The above patch lacks unit tests with SQLite, need to sort that one out still. Have a look at the SQL (I have updated it with the recommended Here document style as you have suggested)
The files with the SQL are in crud.py python file
Let me know what you think of the patch when possible :-)

Investigating into how we use the databases library to write queries in a way that we avoid SQL injection, I followed the advice provided here as my original query used string interpolation and was susceptible to SQL injection attacks.
https://realpython.com/prevent-python-sql-injection/

Reading into the documentations for databases library (async library for lots of databases, including sqlalchemy, and mysql) -- I've found a method here that looks like is a safe way to do it. To prove that this is safe, I would have to test putting SQL injection commands myself to see if we get it on the DB log outputs. will give it a go (see raw queries)
https://www.encode.io/databases/database_queries/

So I've tried using the method suggested by the documentation
This is what a normal query looks like in my MariaDB logs when executed

210711 12:53:10 33 Query SELECT * FROM backups b WHERE b.section = 'm5'

And this is how a poisoned query would be substituted
(malicious query: ; SELECT * FROM backups WHERE 1; --)

210711 13:24:52 35 Query SELECT * FROM backups b WHERE b.section = '; SELECT * FROM backups WHERE 1; --'

I've realised it uses quotes on the query. So I tried putting some quotes on our malicious injection query (''''''; SELECT * FROM backups WHERE 1; --) to see if it leads to code injection and this is the result I got

210711 13:37:40 37 Query SELECT * FROM backups b WHERE b.section = '\'\'\'\'\'\'; SELECT * FROM backups WHERE 1; --'

So it looks like it didn't lead to any code injection and I assume this is the right way to use the library (as suggested by the RealPython link that you had shared, using parameters in the .execute command)

I will make existing database methods safer using the method specified in the documentation here and will push a patch for review soon(https://www.encode.io/databases/database_queries/ -- see raw queries)

Python code

       safe_query = "SELECT * FROM backups b WHERE b.section = :section"
# malicious value 
       values = {'section': "''''''; SELECT * FROM backups WHERE 1; --"}
# normal value example
       values = {'section': "m5"}
       print(safe_query)
       res = await DatabaseConnection.instance().database.fetch_one(query=safe_query, values=values)

Change 702781 merged by Jcrespo:

[operations/software/bernard@master] api_db: Add code to enable database connection, add API to obtain recent backups data and freshness, load config from alerting yaml file, add tests

https://gerrit.wikimedia.org/r/702781