API query for template usage involving multiple titles and namespace filter times out
Closed, ResolvedPublic

Description

Author: carnildo

Description:
An API query on multiple titles for template usage with a namespace restriction gets an HTTP timeout.

Steps to reproduce:

Visit the URL http://en.wikipedia.org/w/api.php?action=query&format=xml&prop=templates&tlnamespace=10&titles=User:Carnildo/sandbox3|User:Carnildo/sandbox6

Removing either the "tlnamespace" term or either of the titles eliminates the error.

Expected result:

<?xml version="1.0" encoding="utf-8"?><api><query><pages><page pageid="9945938" ns="2" title="User:Carnildo/sandbox3" /><page pageid="16027416" ns="2" title="User:Carnildo/sandbox6"><templates><tl ns="10" title="Template:Administrators&#039; noticeboard navbox" /><tl ns="10" title="Template:Archiveline" /><tl ns="10" title="Template:Editabuselinks" /><tl ns="10" title="Template:Nowrap" /><tl ns="10" title="Template:Nowrap begin" /><tl ns="10" title="Template:Nowrap end" /><tl ns="10" title="Template:Purge" /><tl ns="10" title="Template:Shortcut" /><tl ns="10" title="Template:Tnavbar" /><tl ns="10" title="Template:•w" /><tl ns="10" title="Template:•wrap" /></templates></page></pages></query></api>

Actual result:

504 Gateway Timeout


Version: unspecified
Severity: normal

bzimport added a project: MediaWiki-API.Via ConduitNov 21 2014, 10:08 PM
bzimport set Reference to bz14102.
bzimport created this task.Via LegacyMay 13 2008, 2:42 AM
aaron added a comment.Via ConduitMay 13 2008, 4:00 AM

Looks like this should be fast, not sure why it is slow.

aaron added a comment.Via ConduitMay 13 2008, 6:10 AM

Maybe this is another insane index choice, it may be scanning the (namespace, title, from) index and filesorting...

aaron added a comment.Via ConduitMay 13 2008, 6:18 AM

Not having this doing the query on TS

Catrope added a comment.Via ConduitMay 13 2008, 10:41 AM

It's using the correct index, it just doesn't like it when you do:

SELECT stuff FROM templatelinks WHERE tl_title IN ('98', '107) AND tl_namespace=0 ORDER BY tl_from, tl_namespace, tl_title

It doesn't like the fact that you're sorting by tl_namespace while it's a constant in the WHERE clause. Dropping tl_namespace from the ORDER BY (it's pointless anyway) fixes that. The same happens for tl_title when there's only one title.

Fixed in r34720

aaron added a comment.Via ConduitMay 13 2008, 1:13 PM

(In reply to comment #4)

It's using the correct index, it just doesn't like it when you do:

SELECT stuff FROM templatelinks WHERE tl_title IN ('98', '107) AND
tl_namespace=0 ORDER BY tl_from, tl_namespace, tl_title

It doesn't like the fact that you're sorting by tl_namespace while it's a
constant in the WHERE clause. Dropping tl_namespace from the ORDER BY (it's
pointless anyway) fixes that. The same happens for tl_title when there's only
one title.

Fixed in r34720

If you say X = 'a' and sort by X,Y that shouldn't go that slow, unless there is a bug. Also, I did the two part query on TS myself, and it was fast (only like 9 results). If it was using the right index, it should be fast. The query took like an hour according to profiling, which strongly suggests wrong index usage. I hope the ORDER BY tweak avoids that perhaps.

aaron added a comment.Via ConduitMay 13 2008, 1:15 PM

"If you say X = 'a' and sort by X,Y that shouldn't go that slow". Of course, I meant to also say "in comparison to sorting by Y only".

Catrope added a comment.Via ConduitMay 13 2008, 1:30 PM

(In reply to comment #5)

If you say X = 'a' and sort by X,Y that shouldn't go that slow, unless there is
a bug. Also, I did the two part query on TS myself, and it was fast (only like
9 results). If it was using the right index, it should be fast. The query took
like an hour according to profiling, which strongly suggests wrong index usage.
I hope the ORDER BY tweak avoids that perhaps.

I know that. Anyway, MySQL behaves very stupidly and insists on filesorting when it encounters pl_namespace twice. According to my EXPLAINs, WHERE pl_namespace=0 ORDER BY pl_from, pl_namespace, pl_title filesorts, but omitting pl_namespace from the ORDER BY doesn't. The right index (pl_from, pl_namespace, pl_title) is automatically picked. If this query still has issues after the server sync, we probably need to force the index.

aaron added a comment.Via ConduitMay 13 2008, 1:37 PM

Even if all it was doing was using the pl_from part of the right index, doing a WHERE and a filesort, it should still be fast since it was only for those two pages, and they didn't seem to have any huge link lists. I'm still inclined that it was doing the wrong index in this case, as nothing else seems to explain the slowness.

Catrope added a comment.Via ConduitMay 13 2008, 1:39 PM

I'll add a FORCE INDEX just to be sure.

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.