Page MenuHomePhabricator

Create a way to sample wikireplicas usage data
Closed, ResolvedPublic

Description

After establishing that we definitely should not just turn on the general query log in mariadb (because of the infinitely climbing disk space that would consume), we decided we need a random sampling of queries to get a good idea of the usage patterns.

We'll need to create a service to log queries here and there, ideally only the queries we want (user queries vs. system stuff) and apply some metadata to them. Something that turns the general log on would need to run locally over the instances and then move the bulk logs somewhere, which seems very inefficient.

Using any user with PROCESS privs (possibly a new one since right now that's root and wmf-pt-kill), it would be possible to schedule regular queries of SHOW FULL PROCESSLIST; in order to get the full text of all queries running at that moment and then parse it from there.

It really should run across all three primary replicas on any given run, preserve the date, time and host information, as well as what type of user is running things (tool or regular user). We could also tally unique users on each run.

If we let that fill a database table, it can be analyzed later in a variety of ways. Really, this should be a VM with a large enough disk to house the amount of data we'd like to have.

Event Timeline

Bstorm triaged this task as Medium priority.Jan 22 2021, 5:48 PM
Bstorm created this task.
Bstorm added a subscriber: Marostegui.

@Marostegui do you have any concerns about the proposed service above if I create a new user with PROCESS privs to run it inside the clouddb-services project so it isn't open to the world which users are running what? Or for that matter, does anything I propose in the description sound like a terrible idea :)

@Bstorm it might be better to run: select * from information_schema.processlist;" as this is not a blocking action.
You'd need a user with a SELECT on information_schema.processlist GRANT.

Let me know if this is ok and if you need help setting this up.

Sounds like a good idea to me! I'll give that a try.

Ok, to get SELECT on information_schema.processlist, I basically needed to give the user GRANT PROCESS ON *.* TO 'querysampler'@'%';, because of the peculiar properties of information_schema. That is the only grant I gave the user, and that allows me to SELECT from information_schema.processlist and get what I believe to be the info I'm after. I created the user and grant only on labsdb10[09-12] because that's my scope at the moment.

I'm going to run the service/script on clouddb-wikireplicas-query-1.clouddb-services.eqiad1.wikimedia.cloud so its credentials aren't mingled with Toolforge in any way. It's just too "global" a priv to leave in that space, in my opinion.

I'll add the grant with comment to modules/role/templates/mariadb/grants/wiki-replicas.sql. Let me know if there's anything else I should do or change about that @Marostegui

@Bstorm that looks good to me, the only thing I would tackle would be the '%' and try to narrow that a little bit to a specific set of addresses if that is possible. If the grant will only run from clouddb-wikireplicas-query-1.clouddb-services.eqiad1.wikimedia.cloud maybe limit it to that IP. Let me know if you'd need help doing it. Happy to change it myself.

Thank you

Well, I'm actually accessing via the dbproxy servers via the full wikireplicas frontend, (just so it won't work to tie it based on that since the servers only see dbproxy IPs).

There's no public route.

Change 660960 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: deploy a cloud-based query sampler for the replicas

https://gerrit.wikimedia.org/r/660960

Change 660960 merged by Bstorm:
[operations/puppet@production] wikireplicas: deploy a cloud-based query sampler for the replicas

https://gerrit.wikimedia.org/r/660960

Change 661464 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: add a role for consistency on the querysampler service

https://gerrit.wikimedia.org/r/661464

Change 661464 merged by Bstorm:
[operations/puppet@production] wikireplicas: add a role for consistency on the querysampler service

https://gerrit.wikimedia.org/r/661464

The service is running. As long as it doesn't fill the disk too quickly, we should get a fair amount of data. I will keep this open for toying with and adjusting the puppetization and the service.

NOTE: This only samples the legacy replicas. It will need a small amount of love to read from the new ones if they see significant activity.