
deletion queries joined with tokudb replication tables are really slow
Closed, Resolved, Public

Description

DELETE queries that join against the replication databases for row selection are really slow on MariaDB 10.

Example test query:

CREATE TEMPORARY TABLE IF NOT EXISTS CCats (pid INT, cat VARCHAR(255), PRIMARY KEY (pid, cat), INDEX(cat)) ENGINE=Aria PAGE_CHECKSUM=0 TRANSACTIONAL=0;

INSERT IGNORE INTO CCats (pid, cat)
SELECT ip.page_id, cl_to
FROM commonswiki_p.page fp
 INNER JOIN commonswiki_p.templatelinks ON fp.page_id=tl_from
 INNER JOIN commonswiki_p.image ON TRIM(LEADING 'Featured_picture_candidates/File:' FROM tl_title) = img_name
 INNER JOIN commonswiki_p.page ip ON img_name = ip.page_title AND ip.page_namespace=6
 INNER JOIN commonswiki_p.categorylinks ON ip.page_id=cl_from
WHERE fp.page_namespace=4 AND fp.page_title = 'Featured_picture_candidates'
 AND tl_namespace=4 AND tl_title LIKE 'Featured_picture_candidates/File:%';

DELETE cc
FROM CCats cc
INNER JOIN commonswiki_p.page ON page_namespace=14 AND page_title = cat
INNER JOIN commonswiki_p.categorylinks ON cl_from = page_id
WHERE cl_to='Hidden_categories' AND page_namespace=14;

Execution time of this small query is about 30 seconds on Labs, versus 0.32 seconds on the old Toolserver.

I already talked with Sean about this problem on IRC. A patch might be available in September.
A more general TokuDB bug is reported at https://github.com/Tokutek/tokudb-engine/issues/75 . Unlike that bug, in my case TokuDB is only used for selection and no deletion is performed on these tables.


Version: unspecified
Severity: major
URL: https://github.com/Tokutek/tokudb-engine/issues/75

Details

Reference
bz68918


Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:39 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz68918.

I'm fairly sure this is a manifestation of the same bug in the GitHub report, which relates to TokuDB fractal indexes not (yet) using a bulk prefetching optimization in a number of situations, including DELETE; this is by design.

To prove this, convert the example DELETE to an equivalent SELECT and see the runtime drop to <1s (a sketch of such a SELECT appears after the workaround below). That points to a possible workaround using an additional temp table and a subquery:

CREATE TEMPORARY TABLE CCats2 AS SELECT * FROM CCats;

DELETE cc FROM CCats cc JOIN (
  SELECT cc2.cat FROM CCats2 cc2
  JOIN commonswiki_p.page ON page_title = cc2.cat
  JOIN commonswiki_p.categorylinks ON cl_from = page_id
  WHERE cl_to = 'Hidden_categories' AND page_namespace = 14
) t ON cc.cat = t.cat;
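
For reference, a minimal sketch of the "equivalent SELECT" test mentioned above. It is simply the DELETE from the task description rewritten to read the matching rows instead of removing them, not a query that was posted here:

-- Same joins and filters as the slow DELETE above, but only reading rows.
SELECT cc.pid, cc.cat
FROM CCats cc
INNER JOIN commonswiki_p.page ON page_namespace=14 AND page_title = cat
INNER JOIN commonswiki_p.categorylinks ON cl_from = page_id
WHERE cl_to='Hidden_categories' AND page_namespace=14;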

Time allowing, I'll test TokuDB 7.1.8 before the planned September GA.

Hey Sean,

what's the status on this? For Merl's migration from the Toolserver this is really important. Without working delete queries, he would have an enormous amount of extra work modifying his queries.

Hoping for good news! Best, Silke

MariaDB 10.0.14 was released late last week, and the change log says the TokuDB fix for slow DELETE is included. Doing a test build today...

MariaDB 10.0.14 has TokuDB 7.5.0 [1] which now uses a bulk-fetch approach for delete [2][3]. The labs DBs have been upgraded.

Please test and report back?

[1] https://github.com/Tokutek/tokudb-engine/wiki/Release-Notes-for-TokuDB-7.5.0
[2] https://tokutek.atlassian.net/browse/DB-311
[3] https://tokutek.atlassian.net/browse/DB-500
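
A quick way to confirm the upgraded server and see which TokuDB plugin variables are exposed before re-timing the DELETE from the description (assumes the TokuDB plugin publishes its variables in the usual way):

-- Check the server version and any TokuDB plugin variables that are exposed.
SELECT @@version;
SHOW GLOBAL VARIABLES LIKE 'tokudb%';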

OK, cool, Sean! Thanks! Merl, have you tried this yet?

Today I tried to run a script (missing articles) which has a lot of those deletion queries affecting many rows. It was not possible to run this script on Labs before. After your update it is now possible to run these queries, so thanks to Sean.

I spent a lot of the last hours rewriting these queries because they were still really slow (although now possible). The problem is that there is no index on page_props.pp_value. I have now worked around that.

It is really interesting that with the workaround I implemented today, the query runs on Labs about as fast as the old script version did on Toolserver, but running the new version on Toolserver is three times slower than using the unindexed page_props.pp_value directly. The old version running on Labs is about ten times slower.
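
For illustration only, a hypothetical sketch of the kind of rewrite meant here, assuming the slow pattern was a lookup on the unindexed page_props.pp_value. The table and property names below are made up; the idea is to collect the needed (value, page) pairs into an indexed temp table first so later joins hit that index:

-- Hypothetical sketch: avoid repeated scans of the unindexed pp_value column
-- by materialising (value, page) pairs into an indexed Aria temp table.
CREATE TEMPORARY TABLE IF NOT EXISTS PropVals (val VARBINARY(255), pid INT, PRIMARY KEY (val, pid))
ENGINE=Aria PAGE_CHECKSUM=0 TRANSACTIONAL=0;

INSERT IGNORE INTO PropVals (val, pid)
SELECT LEFT(pp_value, 255), pp_page
FROM commonswiki_p.page_props
WHERE pp_propname = 'some_property';   -- placeholder property name

-- Later queries then join against PropVals.val instead of page_props.pp_value.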

So we have index differences too? Is the Toolserver schema dumped anywhere I can see it?

Closing since the issue at hand is solved. Open another ticket if there is definitely another issue, as per the last couple of comments.