Page MenuHomePhabricator

Allow subqueries in SQL JOINs with abstraction layer
Closed, ResolvedPublic

Description

It would be handy to be able to perform an INNER JOIN with a subquery rather than sending multiple queries. I asked about this on IRC and got some helpful workarounds which will work for my case, but it was suggested to open a report, if only as a request for comments.

My use-case was this: I was trying to find the greatest revision ID less than another revision ID for all pages, fetching both from the archive, page, and revision tables with a UNION. To do this I originally used the following SQL code (just the archive table part):

SELECT a.ar_id, a.ar_page_id AS page_id, a.ar_namespace AS page_namespace, a.ar_rev_id AS rev_id
FROM archive AS a
INNER JOIN (
    SELECT ar_page_id AS page_id, MAX(ar_rev_id) AS rev_id
    FROM archive
    WHERE ar_rev_id <= 24
    GROUP BY ar_page_id
) b ON a.ar_page_id = b.page_id AND a.ar_rev_id = b.rev_id
WHERE ar_namespace = 0

Essentially, this is the only way to get the value for the ar_id field along with the others without including it in the GROUP BY, which wouldn't make any sense. However, since I didn't actually need the ar_id field, except to know that the row was from the archive table, I can use this simplified version that bawolff offered on IRC:

SELECT 'true' AS is_archive,
       ar_page_id AS page_id,
       ar_namespace AS page_namespace,
       MAX(ar_rev_id) AS rev_id
FROM archive
WHERE ar_rev_id <= 24
      AND ar_namespace = 0
GROUP BY 'true',
         ar_page_id,
         ar_namespace

(Note that restrictions on ar_namespace and ar_rev_id are merely examples.)

Needless to say, this is a much simpler solution, and it can be created with the abstraction layer. However, supposing one wanted to get the value of any field that might be unique for each row? In this case, it is impossible without a subquery.

Skizzerz suggested using the subquery as a table with a table alias, though as noted it's a bit of hack and rather ugly.

Event Timeline

This has been implemented in Feb 2018 (released in MediaWiki 1.31) with d395dfb039f / https://gerrit.wikimedia.org/r/410632) .

See the IDatabase::buildSelectSubquery() method (and the internal Subquery class it uses), which allows one to give it a name and also join against it.

Krinkle assigned this task to aaron.
Krinkle triaged this task as Medium priority.
Krinkle added a project: Performance-Team.
Krinkle moved this task from Schema changes to Rdbms library on the MediaWiki-libs-Rdbms board.