Create a way to sample wikireplicas usage data
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	• Bstorm
	Jan 22 2021, 5:48 PM

Description

After establishing that we definitely should not just turn on the general query log in mariadb (because of the infinitely climbing disk space that would consume), we decided we need a random sampling of queries to get a good idea of the usage patterns.

We'll need to create a service to log queries here and there, ideally only the queries we want (user queries vs. system stuff) and apply some metadata to them. Something that turns the general log on would need to run locally over the instances and then move the bulk logs somewhere, which seems very inefficient.

Using any user with PROCESS privs (possibly a new one since right now that's root and wmf-pt-kill), it would be possible to schedule regular queries of SHOW FULL PROCESSLIST; in order to get the full text of all queries running at that moment and then parse it from there.

It really should run across all three primary replicas on any given run, preserve the date, time and host information, as well as what type of user is running things (tool or regular user). We could also tally unique users on each run.

If we let that fill a database table, it can be analyzed later in a variety of ways. Really, this should be a VM with a large enough disk to house the amount of data we'd like to have.

Details

	Subject	Repo	Branch	Lines +/-
	wikireplicas: add a role for consistency on the querysampler service	operations/puppet	production	+7 -0
	wikireplicas: deploy a cloud-based query sampler for the replicas	operations/puppet	production	+260 -0

Customize query in gerrit

Related Objects
Search...

Status	Assigned	Task
Resolved	• Marostegui	T233766 labsdb1011 mariadb crashed
		Restricted Task
		Restricted Task
		Unknown Object (Task)
Resolved	RobH	T260441 (Need By: ASAP) rack/setup/install clouddb10[13-20]
Open	None	T215858 Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema
Open	None	T280152 Mitigate breaking changes from the new Wiki Replicas architecture
Resolved	• Bstorm	T260389 Redesign and rebuild the wikireplicas service using a multi-instance architecture
Resolved	• Bstorm	T264254 Prepare Quarry for multiinstance wiki replicas
Declined	• Bstorm	T267989 Do some checks of how many Quarry queries will break in a multiinstance environment
Resolved	• Bstorm	T272723 Create a way to sample wikireplicas usage data

Event Timeline

• Bstorm triaged this task as Medium priority.Jan 22 2021, 5:48 PM

• Bstorm created this task.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 22 2021, 5:48 PM

@Marostegui do you have any concerns about the proposed service above if I create a new user with PROCESS privs to run it inside the clouddb-services project so it isn't open to the world which users are running what? Or for that matter, does anything I propose in the description sound like a terrible idea :)

@Bstorm it might be better to run: select * from information_schema.processlist;" as this is not a blocking action.
You'd need a user with a SELECT on information_schema.processlist GRANT.

Let me know if this is ok and if you need help setting this up.

Sounds like a good idea to me! I'll give that a try.

Ok, to get SELECT on information_schema.processlist, I basically needed to give the user GRANT PROCESS ON *.* TO 'querysampler'@'%';, because of the peculiar properties of information_schema. That is the only grant I gave the user, and that allows me to SELECT from information_schema.processlist and get what I believe to be the info I'm after. I created the user and grant only on labsdb10[09-12] because that's my scope at the moment.

I'm going to run the service/script on clouddb-wikireplicas-query-1.clouddb-services.eqiad1.wikimedia.cloud so its credentials aren't mingled with Toolforge in any way. It's just too "global" a priv to leave in that space, in my opinion.

I'll add the grant with comment to modules/role/templates/mariadb/grants/wiki-replicas.sql. Let me know if there's anything else I should do or change about that @Marostegui

@Bstorm that looks good to me, the only thing I would tackle would be the '%' and try to narrow that a little bit to a specific set of addresses if that is possible. If the grant will only run from clouddb-wikireplicas-query-1.clouddb-services.eqiad1.wikimedia.cloud maybe limit it to that IP. Let me know if you'd need help doing it. Happy to change it myself.

Thank you

Well, I'm actually accessing via the dbproxy servers via the full wikireplicas frontend, (just so it won't work to tie it based on that since the servers only see dbproxy IPs).

There's no public route.

Ah, excellent then!

Change 660960 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: deploy a cloud-based query sampler for the replicas

https://gerrit.wikimedia.org/r/660960

gerritbot added a project: Patch-For-Review.Feb 2 2021, 12:06 AM

Change 660960 merged by Bstorm:
[operations/puppet@production] wikireplicas: deploy a cloud-based query sampler for the replicas

https://gerrit.wikimedia.org/r/660960

Change 661464 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: add a role for consistency on the querysampler service

https://gerrit.wikimedia.org/r/661464

Change 661464 merged by Bstorm:
[operations/puppet@production] wikireplicas: add a role for consistency on the querysampler service

https://gerrit.wikimedia.org/r/661464

Maintenance_bot removed a project: Patch-For-Review.Feb 3 2021, 9:10 PM

The service is running. As long as it doesn't fill the disk too quickly, we should get a fair amount of data. I will keep this open for toying with and adjusting the puppetization and the service.

NOTE: This only samples the legacy replicas. It will need a small amount of love to read from the new ones if they see significant activity.

• Bstorm added parent tasks: T267989: Do some checks of how many Quarry queries will break in a multiinstance environment, T260389: Redesign and rebuild the wikireplicas service using a multi-instance architecture.Feb 12 2021, 12:06 AM

RhinosF1 subscribed.Mar 19 2021, 6:55 PM

JJMC89 moved this task from Backlog to Wiki replicas on the Data-Services board.Jul 23 2021, 4:20 AM

• nskaggs closed this task as Resolved.Jan 25 2022, 4:10 PM

Andrew mentioned this in T359810: Are clouddb-wikireplicas-query-1 and the cloudb-services project still useful?.Mar 21 2024, 7:47 PM