Page MenuHomePhabricator

Performance problem on database server s5 using commonswiki
Closed, ResolvedPublic

Description

Some queries are currently not runnable on s5.labsdb because of performance problems. Probably this is caused by federated tables.

Either these queries have not finished after hours and i kill them or i am getting errors caused by out of memory (lost connection, Table './mysql/proc' is marked as crashed)

I have many many complex queries in my scripts using commonswiki.
Here is one example query my bot runs every four hours:
It returns a list of files requested for deletions on commons with a link to deletion talk page section which are used on dewiki presentation namespace and the deletion request page was modified within the last two weeks by a human.

(the "used on dewiki" is not 100% correct because of performance reason i am using the big globalimagelinks in a second query, but that shouldn't matter here - only if sb. wants to copy the query for his own tools)

on WMDE toolserver

SELECT 'commonswiki', cp.page_id,

 CONCAT('[[:Datei:',REPLACE(cp.page_title,'_',' '),']]<small> [[:commons:Commons:Deletion requests/',REPLACE(SUBSTRING(cl_sortkey_prefix,LOCATE('#',cl_sortkey_prefix)+1),'_',' '),'|CDR]]</small>')
FROM commonswiki_p.templatelinks
 INNER JOIN commonswiki_p.page cp ON tl_from = cp.page_id
 INNER JOIN commonswiki_p.recentchanges ON rc_cur_id = cp.page_id AND rc_bot=0
 INNER JOIN dewiki_p.imagelinks ON il_to = cp.page_title
 INNER JOIN dewiki_p.page dp ON il_from = dp.page_id
 INNER JOIN commonswiki_p.categorylinks ON cp.page_id=cl_from AND cl_to LIKE 'Deletion_requests_%'
WHERE tl_namespace=10 AND tl_title='Delete' AND cp.page_namespace=6
 AND DATEDIFF(NOW(), rc_timestamp) < 14
 AND dp.page_namespace IN (0,6,10,14,100) AND cl_sortkey_prefix LIKE '%#%'
GROUP BY cp.page_id
ORDER BY NULL;

Result ON TS: 85 rows in set (0.25 sec)

Query for wmf Labs:

SELECT 'commonswiki', cp.page_id,

 CONCAT('[[:Datei:',REPLACE(cp.page_title,'_',' '),']]<small> [[:commons:Commons:Deletion requests/',REPLACE(SUBSTRING(cl_sortkey_prefix,LOCATE('#',cl_sortkey_prefix)+1),'_',' '),'|CDR]]</small>')
FROM commonswiki_f_p.templatelinks
 INNER JOIN commonswiki_f_p.page cp ON tl_from = cp.page_id
 INNER JOIN commonswiki_f_p.recentchanges ON rc_cur_id = cp.page_id AND rc_bot=0
 INNER JOIN dewiki_p.imagelinks ON il_to = cp.page_title
 INNER JOIN dewiki_p.page dp ON il_from = dp.page_id
 INNER JOIN commonswiki_f_p.categorylinks ON cp.page_id=cl_from AND cl_to LIKE 'Deletion_requests_%'
WHERE tl_namespace=10 AND tl_title='Delete' AND cp.page_namespace=6
 AND DATEDIFF(NOW(), rc_timestamp) < 14
 AND dp.page_namespace IN (0,6,10,14,100) AND cl_sortkey_prefix LIKE '%#%'
GROUP BY cp.page_id
ORDER BY NULL;

aborted by me after 3 hours.

Please optimize database performance for queries joining with commonswiki.


Version: unspecified
Severity: normal

Details

Reference
bz67602

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:39 AM
bzimport set Reference to bz67602.

That query is unlikely to be usable as-in with federation, at least not in its current form (using a federated table requires that the rows selected during a join are strictly constrained on indexed columns). In particular, the way the joins are structured will require several full table scans because the relations are not visible to federation, only the presence or absence of indices.

Arguably, doing this at the application level is the only correct solution as the load on the database would be orders of magnitude smaller and the queries much more manageable.

I will try to coordinate resources to help you adapt those bots as quickly as possible.

  • 1. row ******* id: 1 select_type: SIMPLE table: templatelinks type: ref

possible_keys: tl_from,tl_namespace,tl_backlinks_namespace

    key: tl_namespace
key_len: 261
    ref: const,const
   rows: 1
  Extra: Using where; Using index; Using temporary
  • 2. row ******* id: 1 select_type: SIMPLE table: categorylinks type: ref

possible_keys: cl_from,cl_timestamp,cl_sortkey

    key: cl_from
key_len: 4
    ref: dewiki.templatelinks.tl_from
   rows: 1
  Extra: Using where
  • 3. row ******* id: 1 select_type: SIMPLE table: cp type: eq_ref

possible_keys: PRIMARY,name_title

    key: PRIMARY
key_len: 4
    ref: dewiki.templatelinks.tl_from
   rows: 1
  Extra: Using where
  • 4. row ******* id: 1 select_type: SIMPLE table: recentchanges type: ref

possible_keys: rc_cur_id

    key: rc_cur_id
key_len: 4
    ref: dewiki.templatelinks.tl_from
   rows: 1
  Extra: Using where
  • 5. row ******* id: 1 select_type: SIMPLE table: imagelinks type: ref

possible_keys: il_from,il_to

    key: il_to
key_len: 257
    ref: dewiki.cp.page_title
   rows: 5
  Extra: Using index
  • 6. row ******* id: 1 select_type: SIMPLE table: dp type: eq_ref

possible_keys: PRIMARY,name_title

    key: PRIMARY
key_len: 4
    ref: dewiki.imagelinks.il_from
   rows: 1
  Extra: Using where

6 rows in set (0.01 sec)

I just divided another complex query into multiple queries for a performance test. This searches for included not existing images of a single article (there is only one image (Icon tools.svg) embedded in this article 8188489, so DF_temp has only one row):

CREATE TEMPORARY TABLE IF NOT EXISTS DF_temp (it VARCHAR(255) PRIMARY KEY);
DELETE FROM DF_temp;
INSERT IGNORE INTO DF_temp (it)
SELECT STRAIGHT_JOIN il_to

FROM dewiki_p.page p1
 INNER JOIN dewiki_p.imagelinks ON p1.page_id=il_from
 WHERE p1.page_id = 8188489;

DELETE DF_temp
FROM DF_temp

INNER JOIN dewiki_p.image ON it=img_name;

DELETE DF_temp
FROM DF_temp

INNER JOIN commonswiki_f_p.image ON it=img_name;

The last query fails after the timeout of 10 minutes it set on labs. img_name is primary key, so also your condition "rows selected during a join are strictly constrained on indexed columns" is true here.

There are also two additional queries checking for redirects to images. With these additional queries runtime on TS is about 0.08 seconds.

Ah, there's clearly something stupid in the way the query is planned, because:

select * FROM commonswiki_f_p.image where img_name in('Icon_tools.svg', 'Qsicon_Fokus2.svg');

instead of the third statement returns:

1 row in set (0.00 sec)

which is as expected, and the planner explains that when doing your third query it is specifically _not_ using a key(!)

Likewise:

select * FROM commonswiki_f_p.image where img_name in(select it from DF_temp);

times out (explain shows no index used). Where the problem lies might be in that the query planner somehow things there are no rows in the federated table. For that matter, even forcing the index fails to use it:

select img_bits FROM DF_temp inner JOIN commonswiki_f_p.image force index (primary) ON it=img_name;

also times out.

This is clearly incorrect behaviour; I'll be looking into it.

No further efforts trying to fix federation will be spent as we are upgrading to MariaDB 10 which will provide commons locally and do away with federation entirely.

The first database being transitioned is s5 (currently in progress), solving this for the problem at hand.

@Merl; how did the transition to MariaDB 10 go?

After the change to mariadb10 nearly all of my script always failed with one of two different errors:
ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

I solved this by prepending all sql scripts with
SET SESSION tx_isolation='READ-UNCOMMITTED';

Sean also raised the lock timeout in between. I haven't checked if this makes my session based transaction isolation level change unnecessary.

All scripts/queries using joins with commonswiki are running successful now.
But i have not tested all scripts today. I'll make a complete report within the next two days.
There seems at least a problem with two scripts querying wikidata only (wikidatawiki_p.wb_terms). I am still investigating in this problem. These had been running on s5.labsdb with mariadb5 without problems.

Any news on this that isn't otherwise tracked with Sean regarding performance issues with the new database?

Some scripts are still getting a lock timeouts on first run, but nothing in compare with the initial reported problem here.