
Database for XHGui profiles
Closed, Resolved (Public)

Description

The XHGui profiling tool currently stores its data in an old version of MongoDB. Due to licensing changes, it is no longer feasible for us to use MongoDB.

Per T180761, I've updated XHGui to use MariaDB instead, but we need somewhere to host this database. It needs to be reachable from MediaWiki (possibly just the mwdebugXXXX hosts?) as well as the host serving the XHGui front-end (webperf1002).

Data will be written to this database only when the profile option is present in the X-Wikimedia-Debug header (i.e. only when requested by someone debugging a problem), one row per request. Row size varies based on the page being profiled.

Cross-DC replication would be nice to have, but is not a hard requirement. In a cross-DC scenario, MediaWiki would write to the master instance, MariaDB would be responsible for replicating those writes, and each XHGui front-end would read from the database in its local datacenter.

I think this probably belongs in one of the "misc" shards, similar to other tools such as Etherpad or Phabricator.

Event Timeline

@Krinkle wrote at T199853:

In these 14 months (2016-2018), we've created 14 GB worth of data.

July 2018
krinkle@tungsten:~$ df -h
/dev/mapper/tank-data  1.6T   14G        /srv

Currently:

June 2020
/dev/mapper/tank-data  1.6T   18G  1.6T   2% /srv

Note that the MongoDB data was accidentally wiped by me in late 2018, so I'd say we also created ~14-18 GB of profiles in the past 14 months (2018-2020).

Task description by @dpifke:

It does not need to be retained very long - perhaps a few weeks at most? Likewise, we probably do not need to back it up.

We don't expire these currently, and I think they're quite useful for bookmarking and revisiting later, even after a month or a year. I'd rather not commit to proactive expiration of these at this point.

Task description by @dpifke:

In a cross-DC scenario, MediaWiki would write to the master instance, MariaDB would be responsible for replicating those writes, and each XHGui front-end would read from the database in its local datacenter.

Sounds good to me. Given how small the data is and the very low insert/select rate (far less than once a second), we wouldn't want our own cluster, which means we'll have to make do with the lowest common denominator in terms of replication.

@Marostegui XHGui does not depend on auto-increment, and can be treated as append-only, which means it would be fine also with a dc-local writable master and bi-di replication. However, assuming no misc-DB server supports that yet, don't worry about it. It's totally fine for these debug writes to happen cross-dc.

However, from a data security point of view, I suppose there may be concerns with having MediaWiki be able to write and read data from a misc DB. So I'll wait to hear from you whether any of the current misc DBs could accommodate this. Maybe db-level permission grants would suffice. Note that MediaWiki will have no code to read from this DB; it will only be written to, and only from mwdebug servers when the X-Wikimedia-Debug profile option is set.

Thanks for the information.
I think we might want to place this on m2, where otrs, recommendationapi and gerrit live, so we can group things that come from "outside".

The connection should go via the proxies we have (the CNAME is m2-master.eqiad.wmnet).
We need to know which grants are needed for this database, and those grants will only be available via proxy, so mwdebug hosts should point to that proxy.

We don't have standalone hosts for this, and all our misc hosts have replication enabled, meaning it will replicate to codfw (which is read-only).
What happens if we need to serve from codfw (i.e. a datacenter switchover)? We normally don't swap misc services; they always serve from eqiad. What would happen in that case? Is the connection encrypted?
If we do a DC switchover, I guess the codfw mwdebug hosts would be the ones writing to this DB, which would stay in eqiad.

Even though it looks like this DB won't grow much, do we have any sort of purge job to clean up rows?

For testing, I've created a new MariaDB host in the beta cluster: deployment-mdb01.deployment-prep.eqiad.wmflabs. On there, I manually created the xhgui user and database, and verified I can connect to it from deployment-mediawiki-07. (This took several attempts, after I first selected an instance type with not enough disk space, then accidentally selected buster instead of stretch and couldn't figure out why things weren't working... third time's the charm.)
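
For reference, the manual setup on deployment-mdb01 was roughly along these lines (a sketch only; the host pattern and password are placeholders, and the privileges mirror what was later granted in production):

-- Sketch of the manual beta setup; host pattern and password are hypothetical.
CREATE DATABASE xhgui;
CREATE USER 'xhgui'@'172.16.%' IDENTIFIED BY '<redacted>';
GRANT SELECT, INSERT, DELETE, CREATE ON `xhgui`.* TO 'xhgui'@'172.16.%';
FLUSH PRIVILEGES;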

I will finish setting this up tomorrow: applying the mediawiki-config patch from T180761 to the beta deployment, and creating a beta instance of the latest XHGui. (Then I am going to be out until mid next week.)

Excellent - let's resume this once you are back

Change 607567 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] xhgui: add new fake password for new mysql db

https://gerrit.wikimedia.org/r/607567

Change 607567 merged by Dzahn:
[labs/private@master] xhgui: add new fake password for new mysql db

https://gerrit.wikimedia.org/r/607567

@dpifke @Marostegui Hi, I did the following:

  • added new random password in private puppet in hieradata: puppetmaster1001:/srv/private/hieradata/role/common/webperf/xhgui.yaml profile::webperf::xhgui::mysql_password: '<redacted>'
  • amended the change, see diff between PS 12 and PS 13: https://gerrit.wikimedia.org/r/c/operations/puppet/+/603550/12..13
    • added host (m2-master), db (xhgui) and user (xhgui) in public hieradata/role/common/webperf/xhgui.yaml
    • let profile::webperf::xhgui do a Hiera lookup for host, db, user and password, and stop using the passwords class altogether. This is a newer method of handling secrets; the code is a bit cleaner and the Hiera lookups should "just work", also in cloud.

So if this seems fine to you, @Marostegui you could just take these values when creating the new database.

Thanks @Dzahn
There is still a question from T254795#6204661 that needs answering:

  • Which grants do we need this user to have?

Also, to confirm, connections will come from:
mwdebug1001, mwdebug1002
mwdebug2001, mwdebug2002

@dpifke Doesn't it have to connect from xhgui1001/xhgui2001 (but would that be in addition to mwdebug* and webperf*?)

User needs to be able to CREATE TABLE, INSERT, and SELECT. Possibly DELETE if we someday want to limit retention.

Connections from mwdebug* and xhgui*. (No need for the webperf* hosts.)
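
If retention is ever limited, the DELETE grant would be used for a simple periodic purge along these lines (a sketch only; the table and column names are hypothetical, since the actual schema comes from the XHGui MariaDB port in T180761):

-- Hypothetical purge job; table and column names are placeholders.
DELETE FROM results
  WHERE request_ts < NOW() - INTERVAL 90 DAY
  LIMIT 10000;  -- batch the deletes to keep replication lag small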

Change 607930 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] production-m2.sql.erb: Add grants for xhgui

https://gerrit.wikimedia.org/r/607930

I have created the user and allowed it from dbproxies. Reminder: the application needs to connect to m2-master.eqiad.wmnet which points to our dbproxies.
These are the grants (they are created for the two eqiad proxies and the one codfw proxy, just pasting one for simplicity); the password is the one defined by Daniel at puppetmaster1001:/srv/private/hieradata/role/common/webperf/xhgui.yaml

+----------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for xhgui@10.64.16.19                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'xhgui'@'10.64.16.19' IDENTIFIED BY PASSWORD 'xx' WITH MAX_USER_CONNECTIONS 100 |
| GRANT SELECT, INSERT, DELETE, CREATE ON `xhgui`.* TO 'xhgui'@'10.64.16.19'                                                                   |
+----------------------------------------------------------------------------------------------------------------------------------------------+

The database is created and empty:

root@db1132.eqiad.wmnet[xhgui]> show tables;
Empty set (0.001 sec)

Testing:

# mysql  -hm2-master -uxhgui -p xhgui
Enter password:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 53620627
Server version: 10.1.44-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

xhgui@m2-master[xhgui]>
xhgui@m2-master[xhgui]> show grants;
+----------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for xhgui@10.64.16.19                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'xhgui'@'10.64.16.19' IDENTIFIED BY PASSWORD '*xx' WITH MAX_USER_CONNECTIONS 100 |
| GRANT SELECT, INSERT, DELETE, CREATE ON `xhgui`.* TO 'xhgui'@'10.64.16.19'                                                                   |
+----------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.001 sec)
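
The session above only verifies connectivity and the grants listing; a quick smoke test of the actual privileges could look like this (the table name is hypothetical, and since the user has no DROP grant, an admin would need to remove the test table afterwards):

-- Hypothetical smoke test for the SELECT/INSERT/DELETE/CREATE grants.
CREATE TABLE IF NOT EXISTS grant_check (id INT);
INSERT INTO grant_check (id) VALUES (1);
SELECT COUNT(*) FROM grant_check;
DELETE FROM grant_check WHERE id = 1;
-- No DROP privilege was granted, so the (now empty) table stays behind.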

Change 607930 merged by Marostegui:
[operations/puppet@production] production-m2.sql.erb: Add grants for xhgui

https://gerrit.wikimedia.org/r/607930

Marostegui triaged this task as Medium priority. Jun 29 2020, 4:55 AM

Can you or @Dzahn add the password to PrivateSettings.php on deploy1001? Or drop it in my home directory there so I can update it? (I don't have access to the copy in puppet-private.)

I'll verify the DB is working (and hopefully close this task) as soon as that's available.

Can you or @Dzahn add the password to PrivateSettings.php on deploy1001? Or drop it in my home directory there so I can update it? (I don't have access to the copy in puppet-private.)

Is this really just a one-off for this one user for this one time, or does it make sense to do a more general "users with shell access to perf-team servers should also have credentials to get on the xhgui db"? If the latter then we can do it the nice way by letting puppet write a .my.cnf into each home.

@dpifke The password is now in a file in your home dir on deploy1001, separate from that question above.

Change 608456 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] xhgui: let perf-team admins have access to xhgui DB (WIP)

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608456

Mentioned in SAL (#wikimedia-operations) [2020-07-01T00:14:28Z] <krinkle@deploy1001> Synchronized private/PrivateSettings.php: T254795 - Set $wmgXhguiDBuser and $wmgXhguiDBpasswor (duration: 01m 06s)

Also shared the password with Krinkle who added credentials to PrivateSettings.php.

I think this ticket is done now.

Change 608456 abandoned by Dzahn:
xhgui: let perf-team admins have access to xhgui DB (WIP)

Reason:
The password will be visible to all root users on the hosts anyway, and perf-team members already have full root.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608456

I think this ticket is done now.

Were database backups requested/needed?

The task description says:

Likewise, we probably do not need to back it up.

I think there was some minor debate on that point. We'll resolve it in our team meeting today and I'll come back with a definitive answer.

The consensus is that this database should be backed up if possible.

We've been able to track down performance regressions by comparing the profiler output of a page to profiles taken months (or even years) ago; we'd lose that capability if this data was lost. That said, we don't currently back up the MongoDB instance, so this is not totally critical.

Given the current total size of ~18 GB and write rate of ~1 GB/mo, I'm thinking this shouldn't add a huge cost.

Given the current total size of ~18 GB and write rate of ~1 GB/mo, I'm thinking this shouldn't add a huge cost.

I agree, which is why I asked: it takes little effort to add, but it is not added automatically. Working on it.

Please edit the description of the ticket, which used to say:

Likewise, we probably do not need to back it up.

Working on it.

Backups of the new db were set up; please allow 1 week to confirm they are running normally before closing the ticket.

A backup was successfully produced; however, it is empty (has no data), like the database.

Thanks Jaime!
@dpifke anything else left on your side, or are we good to close this?

All good on this end. Thanks!