Page MenuHomePhabricator

Create a utility that lists all databases of a user
Closed, ResolvedPublic

Description

I would like a utility available for Labs users (and admins) that lists all database objects that a (Unix) user may have on tools-db or (one of) the replica servers.

This is useful for admins when deleting a tool to see where data needs to be deleted.

Event Timeline

scfc raised the priority of this task from to Lowest.
scfc updated the task description. (Show Details)
scfc added a project: Toolforge.
scfc moved this task to Ready to be worked on on the Toolforge board.
scfc added subscribers: scfc, yuvipanda, coren, Springle.

First stab:

#!/usr/bin/perl -w

use strict;
use warnings;

use Data::Dumper;
use DBI;

my @servers = ('10.64.37.4', '10.64.37.5', '10.64.37.9', '10.64.4.11');
my %users = ('s12345' => 'password');

foreach my $user (keys (%users)) {
    foreach my $server (@servers) {
        print Dumper $user, $server;
        my $db = DBI->connect ("DBI:mysql:database=mysql;host=$server;mysql_read_default_group=client", $user, $users {$user}) or die "error: $DBI::errstr";
        my $s = $db->prepare ('SHOW GRANTS;') or die ($db->errstr ());
        $s->execute () or die ($db->errstr ());
        my @databases;
        while (my $r = $s->fetchrow_arrayref ()) {
            my $GrantStatement = $r->[0];
            if ($GrantStatement eq 'GRANT SELECT, SHOW VIEW ON `%_p`.* TO \'' . $user . '\'@\'%\'') {
                next;
            } elsif ($GrantStatement =~ /^GRANT SHOW VIEW ON *.* TO '\Q$user\E'\@'%' IDENTIFIED BY PASSWORD '\*[0-9A-F]{40}'$/) {
                next;
            } elsif ($GrantStatement =~ /^GRANT ALL PRIVILEGES ON `([^`]+)`.* TO '\Q$user\E'\@'%' WITH GRANT OPTION$/) {
                push (@databases, $1);
            } else {
                die ("Can't parse $GrantStatement");
            }
        }
        # print Dumper \@databases;
        $s = $db->prepare ('SHOW DATABASES LIKE ?;') or die ($db->errstr ());
        foreach my $pattern (@databases) {
            $s->execute ($pattern) or die ($db->errstr ());
            while (my $r = $s->fetchrow_arrayref ()) {
                print Dumper $r;
            }
        }
        # die;
    }
}

Worked AFAICS.

Let's not use perl please. I'd rather not have any more perl in our ops/puppet repo than we already do (And I think @mark agrees).

Let's not use perl please. I'd rather not have any more perl in our ops/puppet repo than we already do (And I think @mark agrees).

If I could upvote this comment I would.

Let's not use perl please. I'd rather not have any more perl in our ops/puppet repo than we already do (And I think @mark agrees).

That's not meant to be added to that repository. I just wanted to get T91740 resolved, and as no Pythonists stood up to the challenge in the past fortnight, I had to do that myself :-). It's more of a proof-of-concept, and because I don't know MySQL by heart, I'm interested if besides the SHOW GRANTS approach there are other ways to find out if a user has created databases/tables. I'm totally impartial to the way the SQL commands are fed to the database servers.

:) cool! I'll see if I can port this to python and put it in the package at some point - think this will be useful in the future as well.

Eh, the translation isn't that complicated so that I couldn't do that myself :-). But that only makes sense after the proper way to solve the problem has been determined.

IDK how unix users map to db users, but user databases are all prefixed by the db-user name (which is enforced by that third grant statement you're parsing) so I expect your SHOW DATABASES statement will always end up as:

show databases like '${user}\_%';

Optionally use information_schema to avoid logging in as the user:

select schema_name from information_schema.schemata where schema_name regexp '^${user}_');

Users on replicas also have GRANT OPTION which I expect means someone will inevitably have done something custom that will confuse your script and hit the die() block. I havn't checked the situation on tools-db. Ultimately you may not get anything more accurate or reliable than these simple queries.

IDK how unix users map to db users [...]

FWIW, the mapping is straightforward: non-service-group users are mapped to uX and service group users are mapped to sX where X is the numerical user ID.

That said, there remains a handful of databases that date from the much older period when the naming scheme was different because user IDs weren't guaranteed to be unique accross project and those are pXgY where X is the /project's/ group ID and Y is the service group ID. All of those are reasonably mappable to unambiguous regular expressions.

The legacy databases are the ones I was concerned about and caused me to not take any short cuts; the "something custom" bit reinforced that now :-). My initial motivation was to find out if tool X has not stored anything on a database server and can be deleted without remorse; so false positives are much less of a problem than false negatives.

I'll mold the first part (list the database objects) into a Python script and submit that; after that I'll take a look at the dumping part (probably just need to spawn mysqldumps).

Change 234934 had a related patch set uploaded (by Tim Landscheidt):
Add list-user-databases command

https://gerrit.wikimedia.org/r/234934

scfc renamed this task from Create a utility that dumps all databases of a user to Create a utility that lists all databases of a user.Feb 1 2016, 4:11 AM
scfc updated the task description. (Show Details)
scfc set Security to None.

Narrowed the scope down to listing all databases of a user; this is my real-life use case.

Change 234934 merged by Tim Landscheidt:
Add list-user-databases command

https://gerrit.wikimedia.org/r/234934

scfc claimed this task.
scfc removed a project: Patch-For-Review.