Investigate slow query logging/digest for Beta Cluster
Open, Medium, Public

Assigned To

None

Authored By

	• dduvall
	Oct 27 2015, 6:05 PM

Description

HHVM's SlowTimer already logs grossly slow queries to logstash, but we might catch more pre-deploy performance regressions with a slow query digest for Beta Cluster. The raw log would also be useful for analyzing BC outages post-mortem. (See T116447: [postmortem] Beta Cluster outage: deployment-db2 disk filled up, locked db replication)

If such a digest proves valuable, we should consider making its review a formal part of the MW train deployment process.

Related Objects

Status	Assigned	Task
Resolved	jcrespo	T99485 implement performance_schema for mysql monitoring
Open	None	T116793 Investigate slow query logging/digest for Beta Cluster
Resolved	jcrespo	T119461 Evaluate security concerns of logging beta cluster db queries on tendril

Event Timeline

• dduvall created this task. Oct 27 2015, 6:05 PM

• dduvall set the priority of this task to Needs Triage.

• dduvall updated the task description.

• dduvall added a project: Beta-Cluster-Infrastructure.

• dduvall added subscribers: dduvall, mmodell, hashar.

Restricted Application added a subscriber: Aklapper. Oct 27 2015, 6:05 PM

hashar added a project: DBA. Oct 28 2015, 10:31 AM

@jcrespo this is a follow-up task to the Beta Cluster outage (T116447).

Dan mentioned that the Beta Cluster databases do not log slow queries. We thought about enabling slow query logs on Beta Cluster and having them summarized somewhere, so one can investigate potentially slow queries before they hit production.

HHVM does report some slow queries via SlowTimer, but Zend does not. Additionally, if a query is killed, HHVM is not going to report it.

So the whole purpose of this task is to set up a slow query analyzer on the Beta Cluster database and take its results into account when doing the deployment train.
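To illustrate what "summarized" could mean here, below is a minimal Python sketch that groups slow query log entries by a normalized statement and totals their time. It assumes the usual `# Query_time:` header line of the MySQL/MariaDB slow query log and, as a simplification, single-line statements. The names `summarize_slow_log` and `normalize` are hypothetical and not part of any existing tool.

```python
import re
from collections import defaultdict

# Header line written before each statement in the slow query log.
QUERY_TIME = re.compile(r"^# Query_time: (\d+(?:\.\d+)?)")


def normalize(sql):
    """Replace literals with '?' so similar queries share one digest key."""
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)   # bare integers
    return re.sub(r"\s+", " ", sql).strip()


def summarize_slow_log(lines):
    """Return (query, calls, total_seconds) tuples, slowest total first.

    Simplification (assumption): each statement fits on one line.
    """
    totals = defaultdict(lambda: [0, 0.0])
    query_time = None
    for line in lines:
        match = QUERY_TIME.match(line)
        if match:
            query_time = float(match.group(1))
        elif line.startswith("#") or line.lower().startswith(("set timestamp", "use ")):
            continue  # other header lines and session bookkeeping
        elif query_time is not None and line.strip():
            entry = totals[normalize(line)]
            entry[0] += 1
            entry[1] += query_time
            query_time = None
    report = [(query, calls, total) for query, (calls, total) in totals.items()]
    return sorted(report, key=lambda item: item[2], reverse=True)
```

Calling `summarize_slow_log(open(path))` on a log file would yield one row per query shape, which is roughly what a digest report would show.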

Are you committing time to this?

We (RelEng) probably won't be able to commit any time to it right now --greg

I am asking because I can do what you tell me, or I can set up a better solution (the same one we are deploying to production, T99485).

Will poke @jcrespo about it; since we are in the same timezone, it is more convenient.

hashar moved this task from To Triage to Next: Maintenance on the Beta-Cluster-Infrastructure board. Nov 2 2015, 8:13 PM

Poked Jaime about it by email.

Clarified with @jcrespo. We can simply enable performance_schema, as is being done for production (T99485). The information will then be available in the Beta Cluster database instances (db1 / db2).

The instances are still Precise and hence come with MariaDB 5.5. performance_schema starts being useful with 5.6.
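As a rough illustration of how a digest could be derived from performance_schema, the Python sketch below ranks rows from `performance_schema.events_statements_summary_by_digest`. It assumes a server version that exposes that table and that rows arrive as dictionaries keyed by column name. The function `slow_digest`, the thresholds, and the example query are hypothetical. Timer columns in performance_schema are expressed in picoseconds, hence the conversion.

```python
PICOSECONDS_PER_SECOND = 10**12

# Example query one might run to fetch the rows (an assumption, not taken
# from this task); any DB client returning dict-like rows would do.
DIGEST_QUERY = """
SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
"""


def slow_digest(rows, min_avg_seconds=1.0, limit=10):
    """Return the query digests whose average latency exceeds a threshold.

    rows: iterable of mappings with DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT.
    """
    report = []
    for row in rows:
        calls = row["COUNT_STAR"]
        if calls == 0:
            continue  # nothing executed for this digest yet
        total_s = row["SUM_TIMER_WAIT"] / PICOSECONDS_PER_SECOND
        avg_s = total_s / calls
        if avg_s >= min_avg_seconds:
            report.append(
                {"query": row["DIGEST_TEXT"], "calls": calls,
                 "total_s": total_s, "avg_s": avg_s}
            )
    # Slowest overall first, since total time is what hurts the server most.
    report.sort(key=lambda item: item["total_s"], reverse=True)
    return report[:limit]
```

Sorting by total time rather than average is a design choice: a moderately slow query run thousands of times can matter more than a single very slow one.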

We will need a way to collect and send metrics to some central place. Production is apparently going to send metrics to Graphite so we can generate dashboards with Grafana.
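For the collection step, a minimal Python sketch of pushing values to Graphite over its plaintext protocol (one `path value timestamp` line per metric, conventionally on TCP port 2003) might look like this. The metric names, the prefix, and the helper names `graphite_lines` and `send_to_graphite` are assumptions for illustration; how production actually ships its metrics is not described in this task.

```python
import socket
import time


def graphite_lines(metrics, prefix, timestamp=None):
    """Format a dict of metrics as Graphite plaintext protocol lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(
        f"{prefix}.{name} {value} {ts}\n"
        for name, value in sorted(metrics.items())
    )


def send_to_graphite(payload, host, port=2003):
    """Send preformatted lines to a carbon receiver (hypothetical host)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

Keeping formatting separate from sending makes the output easy to inspect before anything goes over the network.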

jcrespo closed subtask T119461: Evaluate security concerns of logging beta cluster db queries on tendril as Resolved. Nov 24 2015, 6:08 PM

After reading T119461 and checking the number of warnings on production (https://gerrit.wikimedia.org/r/#/c/198661/), we should go with performance_schema, which will also solve the logging of warnings for T119371 at the same time.

jcrespo added a parent task: T99485: implement performance_schema for mysql monitoring. Nov 26 2015, 10:36 AM

greg moved this task from Next: Maintenance to Backlog on the Beta-Cluster-Infrastructure board. Aug 5 2016, 8:56 PM

Marostegui moved this task from Triage to Backlog on the DBA board. Dec 12 2018, 9:39 AM

Marostegui removed a project: DBA. Jul 17 2020, 5:31 AM
