
Heavy 19-hour queries on labsdb1005 (tools-db) by s51203 at s51203__baglama2_p
Closed, Resolved · Public

Description

There has been a relatively recent spike in CPU and IO consumption on labsdb1005: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=22&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=labsdb1005&from=1497348118878&to=1497952918879

The main reason for it seems to be very long-running queries taking up a lot of resources.

I think these are not intentional; adding some indexes would most likely reduce the resource consumption and speed up such queries by 1000x.

May I ask you to have a look at these:

labsdb1005	105156673	s51203	1	s51203__baglama2_p	19h
SELECT SUM(views) AS total_views FROM views,sites WHERE views.id IN (SELECT DISTINCT view_id FROM group2view WHERE group_status_id=21534) AND sites.id=site AND done=1 AND grok_code!='' AND namespace_id=0
labsdb1005	105627339	s51203	1	s51203__baglama2_p	12h
SELECT SUM(views) AS total_views FROM views,sites WHERE views.id IN (SELECT DISTINCT view_id FROM group2view WHERE group_status_id=20516) AND sites.id=site AND done=1 AND grok_code!='' AND namespace_id=0
labsdb1005	105415985	s51203	1	s51203__baglama2_p	5h
SELECT DISTINCT views.id AS id,title,namespace_id,grok_code,server,namespace_id,done FROM views,group2view,sites WHERE group_status_id=21552 AND done=0 AND view_id=views.id AND sites.id=site
labsdb1005	104939038	s51203	1	s51203__baglama2_p	4h
SELECT DISTINCT views.id AS id,title,namespace_id,grok_code,server,namespace_id,done FROM views,group2view,sites WHERE group_status_id=21535 AND done=0 AND view_id=views.id AND sites.id=site
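For queries of this shape, indexes along these lines would likely help. This is only a sketch: the column names are taken from the queries above, the index names are my own, and I am assuming no equivalent indexes already exist on these tables:

```
-- Hypothetical indexes; names and column choices are assumptions.
-- Lets the subquery resolve group_status_id without scanning all of group2view,
-- and makes the (group_status_id, view_id) lookup a covering index read:
CREATE INDEX idx_g2v_status_view ON group2view (group_status_id, view_id);

-- Supports the "done" filter combined with the id lookup on views:
CREATE INDEX idx_views_done_id ON views (done, id);
```

Whether the second index pays off depends on the selectivity of `done`; `EXPLAIN` on the actual queries would show whether the optimizer picks them up.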

There is currently no resource limitation on toolsdb because people have been very responsible in the past, so artificial limits were not needed. These queries are affecting other users by making their tools slower, but I think a few structure or query changes would make them faster, so all of us win: your queries run faster, other users have more resources available, and I do not have to police anyone :-) Alternatively, it could be as easy as avoiding the query (by avoiding its use by one of your users) or introducing a timeout for edge cases.

Thank you,

Event Timeline

I have rewritten the query; it should work better now, I hope.
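The ticket does not say what the rewrite was, but one common fix for this pattern on older MySQL/MariaDB is to replace the `IN (SELECT DISTINCT ...)` subquery with an explicit join, which the optimizer handles much better. A sketch, not necessarily what was actually deployed:

```
-- Hypothetical rewrite of the 19h query: join instead of IN-subquery.
-- Assumes (group_status_id, view_id) pairs in group2view are unique;
-- otherwise the join would double-count and a semi-join must be kept.
-- Which table owns grok_code and namespace_id is inferred from context.
SELECT SUM(views) AS total_views
FROM group2view
JOIN views ON views.id = group2view.view_id
JOIN sites ON sites.id = views.site
WHERE group2view.group_status_id = 21534
  AND views.done = 1
  AND grok_code != ''
  AND namespace_id = 0;
```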

jcrespo assigned this task to Magnus.

Indeed, the last long-running query detected happened at 2017-06-21 08:51:16 UTC, and resource usage has since dropped sharply. Thank you very much for the quick response and fix!