Page MenuHomePhabricator

Debmonitor: request for misc DB allocation
Closed, ResolvedPublic

Description

As part of the DebMonitor Goal (T191298) I'd like to request some resources in one of our Misc MySQL clusters to be used as the backend for DebMonitor.

The expected usage will be as follows:

Space

From the current structure and data I have in the test instance in labs I have made a ballpark estimation of ~2GB of space on disk needed, but let's consider 5~10GB to be on the safe side. For the details of the tables:

  • 1 bridge table will have ~1M rows (# hosts ~1300 multiplied by # installed packages ~770), but is made of 6 int, 2 datetime, 1 varchar that will hold very short strings (0 or 8 char for now with majority of 0).
  • 4 tables to keep track of packages and source packages and their respective versions, for which I expect them to be surely smaller than 100k rows each, at least for the foreseeable future.
  • all the other tables are negligible.

Queries

Given that the query activity will be heavily dependent on the changes in the installed/upgraded/upgradable packages across the fleet, it's hard to make good estimations. In general all queries will be done in small bursts when the script on each host will connect to the DebMonitor server sending it's updates. This is what I was able to predict:

  1. Normal operation after the initial ramp-up:
    • a baseline of ~2,5 SELECT/s; ~0,75 UPDATE/s; 0,015 DELETE/s more or less evenly distributed across the day generated by an apt-get update hook (run before any puppet run) and a crontab that will run the debmonitor CLI once a day per host.
    • additional SELECT/INSERT/UPDATE based on how the packages change on the hosts. Let's take a couple of scenarios:
      • a package it's added to the base package list and will be installed in all hosts in the next puppet run. As a result in the next 30 minutes there will be an additional ~2,9 SELECT/s and ~0,72 INSERT/s.
      • apt-get update detects that an update is available for a package installed on all hosts. As a result in the next 30 minutes there will be an additional ~1,4 SELECT/s and ~0,72 UPDATE/s.
    • queries generated by the webUI should be negligible, given that low number of potential users (SRE only).
  2. Initial ramp-up. We can decide how to spread this in time in order to have it not harm the cluster. Here a couple of possible scenarios:
    • just enabling it with the final daily crontab will generate for the first 24h ~46 SELECT/s and ~12 INSERT/s.
    • we can spread that across a longer time-period, let's say a week, in this case they will become ~6.7 SELECT/s and ~1.7 INSERT/s.
    • If you have any preference I'll adapt the puppet patches accordingly.

Let me know if you need any additional data or have any question/comment

Event Timeline

Volans triaged this task as Normal priority.Apr 24 2018, 9:11 AM
Volans created this task.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 24 2018, 9:11 AM

I think m2 would be the right place to set it up, more so after the database there is upgraded. How are you going to handle dc monitoring, will you store things on a single datacenter?

How would you handle a DC fail? What would happen if the active master dies? Will things pile up from your side? What if you need to read from codfw?

Sorry, I didn't mention the multi-DC setup :)

The idea for now is to install DebMonitor server on two hosts, one per main DC in an active/passive setup.
The webUI will be served through varnish, hence will be managed via the usual varnish director in puppet and can be switched to the other DC.
The internal endpoint to which the hosts will send the updates will instead be managed with DNS discovery to allow to easily switch it between DC when needed.
If the DB master to which it will connect fails and will be switched to the other DC master, we can just flip DebMonitor to the other DC. Regarding the eventual missing data that got lost during the outage, the daily crontab will take care of it, reconciling any missing change for each host within 24h.

I can also consider to make DebMonitor be able to read locally and write remotely with TLS, if that's something that we think is needed.

From where will you be querying this DB? (just to see which (new) grants you might need?

From where will you be querying this DB? (just to see which (new) grants you might need?

@Marostegui only from debmonitor servers, that are not yet there, they should be two ganeti VMs, probably debmonitor[12]001.

I am fine placing this on m2 as Jaime originally suggested.

@Volans you can speed up the process by setting some password on the private repository (and some non-private equivalent in the labs/private one), and suggesting a charset/collation for the database (utf8mb4?). Name debmonitor?

Change 429818 had a related patch set uploaded (by Volans; owner: Volans):
[labs/private@master] Debmonitor: add dummy MySQL password to hiera

https://gerrit.wikimedia.org/r/429818

@Volans you can speed up the process by setting some password on the private repository (and some non-private equivalent in the labs/private one), and suggesting a charset/collation for the database (utf8mb4?). Name debmonitor?

@jcrespo:

DB namedebmonitor
DB userdebmonitor
DB passwordhieradata/role/common/debmonitor/server.yaml in the private puppet repo
DB charsetutf8mb4
GRANT fromdebmonitor[12]001 + eventually deployment hosts (tin/naos/deploy[12]001)

Thanks a lot!

Change 429818 merged by Volans:
[labs/private@master] Debmonitor: add dummy MySQL password to hiera

https://gerrit.wikimedia.org/r/429818

jcrespo claimed this task.May 3 2018, 8:02 AM
jcrespo moved this task from Triage to Blocked external/Not db team on the DBA board.
root@db1051[debmonitor]> SELECT * FROM test;
+------+
| c    |
+------+
| 💩     |
+------+
1 row in set (0.00 sec)

root@db1051[debmonitor]> SHOW CREATE DATABASE debmonitor;
+------------+-------------------------------------------------------------------------------------------------------+
| Database   | Create Database                                                                                       |
+------------+-------------------------------------------------------------------------------------------------------+
| debmonitor | CREATE DATABASE `debmonitor` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci */ |
+------------+-------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@Volans You can test it now, but remember this is a full production host- no load testing or other silly things, as this shares service with otrs and gerrit.

Some more notes below:

jynus@tin:~$ mysql -h m2-master.eqiad.wmnet -udebmonitor -p$PASS debmonitor 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 65273618
Server version: 5.5.5-10.1.31-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:debmonitor@m2-master.eqiad.wmnet [debmonitor]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| debmonitor         |
| information_schema |
+--------------------+
2 rows in set (0.00 sec)

mysql:debmonitor@m2-master.eqiad.wmnet [debmonitor]> SHOW TABLES;
Empty set (0.00 sec)

mysql:debmonitor@m2-master.eqiad.wmnet [debmonitor]> SHOW GRANTS;
+-----------------------------------------------------------------------+
| Grants for debmonitor@************                                    |
+-----------------------------------------------------------------------+
*************************************************************************
| GRANT ALL PRIVILEGES ON `debmonitor`.* TO 'debmonitor'@'************' |
+-----------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql:debmonitor@m2-master.eqiad.wmnet [debmonitor]>

Please don't close the ticket yet, some things are pending:

  • Add debmonitor to backups- you can opt-out, but you have to give a good reason not to
  • Control the accounts somewhere
  • deployment2001 has not been added because it doesn't exist and we need an ip to track it on firewall/grants
  • There will be soon a failover of this section for hardware upgrade, you may want to wait to use it in production if you want to avoid disruption
  • TLS is available, but due to the proxy, it fails domain verification. This is an issue common to all misc hosts, needs a fix (double cert or something else)
  • Currently there is not a good replica available, but there are plans to provide one if you can do read-write split

Backups added on codfw (there are not, at the time, backups for m2 on eqiad) for dbmonitor.

jcrespo reassigned this task from jcrespo to Volans.May 3 2018, 9:17 AM
jcrespo moved this task from Blocked external/Not db team to Done on the DBA board.

Assigning to you for verification and read the caveats at T192875#4177782.

Volans added a comment.May 3 2018, 9:24 AM

Great! Thanks a lot.

@Volans You can test it now, but remember this is a full production host- no load testing or other silly things, as this shares service with otrs and gerrit.

Sure!

Please don't close the ticket yet, some things are pending:

  • Add debmonitor to backups- you can opt-out, but you have to give a good reason not to

Ack, happy to have backups.

  • Control the accounts somewhere
  • deployment2001 has not been added because it doesn't exist and we need an ip to track it on firewall/grants
  • There will be soon a failover of this section for hardware upgrade, you may want to wait to use it in production if you want to avoid disruption

I don't think a disruption will be a problem. In the worse case scenario after 24h all the data will be up-to-date anyway.

  • TLS is available, but due to the proxy, it fails domain verification. This is an issue common to all misc hosts, needs a fix (double cert or something else)

Ack, I'm ok connecting in plain text on the local DB, debmonitor will be deployed as active/passive, so no remote connections are expected. And even in the case of remote connection the only 'PII' I can think of are the session IDs of the eventual users (SRE-only) using the frontend at that moment.

  • Currently there is not a good replica available, but there are plans to provide one if you can do read-write split

Debmonitor doesn't have read/write splitting capabilities as of now. Is something I've thought to add, but most of the DB traffic will be done by the updates from the hosts, and reads that could be done on a slave are not that many. I'll look into it later on, after analysing how many queries could go to a slave.

@jcrespo naos has been reimaged to deploy2001.codfw.wmnet, so I guess it can now be added to the grants. Mentioning it here just for not forgetting, there is absolutely no hurry do to do it.

@jcrespo FYI I was deploying debmonitor today and the replication broke on db1065 and db1117 because of missing debmonitor database.

I've took the liberty to create it there too, in the same way you did on the m2 master:

CREATE DATABASE debmonitor COLLATE='utf8mb4_unicode_520_ci';

Given that I was at it I've also added the GRANT for deploy2001 mentioned in the previous message.
I hope that it was ok, let me know otherwise.

@Volans I believe this to be https://bugs.launchpad.net/mydumper/+bug/1558164 of myloader, used to clone the database. This is a nasty bug, but probably not as bad as it could be. myloader ended with success and no errors- and the database exists on the dump.

I will have to workaround this by checking for empty databases after load.

@jynus got it, thanks for the info. FYI if you want to test your workaround solution, there is another DB missing: frimpressions. I didn't re-create it though, as I have no context on it. I would have told you tomorrow ;)

@jcrespo, You asked to not resolve it for now, anything left here on your side? Anything I can help with?

jcrespo added a comment.EditedJun 28 2018, 4:21 PM

We should create a ticket to document the bug and have some kind of workaround before closing it.

The resolution doesn't have to be done, just the ticket with it being open.

Vvjjkkii renamed this task from Debmonitor: request for misc DB allocation to odeaaaaaaa.Jul 1 2018, 1:14 AM
Vvjjkkii removed Volans as the assignee of this task.
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Marostegui renamed this task from odeaaaaaaa to Debmonitor: request for misc DB allocation .Jul 1 2018, 5:10 AM
Marostegui assigned this task to Volans.
Marostegui lowered the priority of this task from High to Normal.
Marostegui updated the task description. (Show Details)
Marostegui added a subscriber: GerritToPhabBot.

Resolving as debmonitor is in production and the restore issue is tracked in T200035. Thanks a lot for the help!

Volans closed this task as Resolved.Jul 19 2018, 6:00 PM