Page MenuHomePhabricator

Decide how JOIN tables should be mentioned in IDatabase::select()
Open, LowPublic

Description

From the comments on I9c42c87c55: Currently, when using a JOIN in IDatabase::select(), the joined tables must also be mentioned in the first argument ($table). If a table is missing, the corresponding join is silently dropped.

$ phps
Psy Shell v0.10.4 (PHP 7.4.3 — cli) by Justin Hileman
>>> $dbr = wfGetDB( DB_REPLICA );
=> Wikimedia\Rdbms\MaintainableDBConnRef {#2823}
>>> $dbr->selectSQLText( [ 'a', 'b' ], [ 'x', 'y' ], [], __FUNCTION__, [], [ 'b' => [ 'JOIN', 'x = y' ] ] )
=> "SELECT  x,y  FROM `a` JOIN `b` ON ((x = y))    "
>>> $dbr->selectSQLText( [ 'a'      ], [ 'x', 'y' ], [], __FUNCTION__, [], [ 'b' => [ 'JOIN', 'x = y' ] ] )
=> "SELECT  x,y  FROM `a`     "

This seems like a great source of bugs and exploding database servers. In my opinion, it should behave in one of two ways:

  1. The joined table should only be mentioned in the $join_conds, removing the redundant mention in the $table. As @daniel put it in his comment, “supplying multiple tables in the first argument should be deprecated.”
  2. Any $join_conds table should be required in the $table as well, and the IDatabase should throw an error and refuse to operate if it’s missing. (Maybe it should first be a soft or hard deprecation warning for a certain period.)

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

I left this task a bit open-ended, because I can see the appeal of both sides: on the one hand, removing the redundant mention and only selecting one main table name makes sense; on the other hand, I also see the appeal of having all the tables which the query uses mentioned in one block right at the top of the function call (since the $join_conds are the last argument).

(I updated the task description – it turns out the first argument is named $table, not $tables. Maybe that’s a hint that it really should be, well, one table, not multiple tables :) )

eprodromou added a subscriber: eprodromou.

This seems like an issue we should discuss in our tech planning meeting.

It's simpler to require the table to be in $table. The table name in $join_conds is really just the alias, i.e. the key in the $table array. There's no other place to put the actual table expression besides $table. If everyone just used SelectQueryBuilder, there would be no problem: it doesn't have this ambiguity.

Change 620303 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/core@master] rdbms: Throw if options reference unselected tables

https://gerrit.wikimedia.org/r/620303

Change 620320 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Translate@master] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620320

This seems like a great source of bugs and exploding database servers.

For instance, if I’m not mistaken, Translate accidentally loads all revisions of a page when it wants to load the latest revision (fixed in I3b20e36002). Apparently this has not in fact caused the database servers to explode yet – possibly because the method in question (from a quick look through the source code) mainly seems to be used to load message pages, which I guess don’t have too many revisions most of the time.

Change 620320 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620320

Change 620669 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Translate@REL1_34] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620669

Change 620670 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Translate@REL1_35] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620670

Change 620669 merged by jenkins-bot:
[mediawiki/extensions/Translate@REL1_34] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620669

Change 620670 merged by jenkins-bot:
[mediawiki/extensions/Translate@REL1_35] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620670

Change 631171 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/core@master] LogPager+WikiPage: Only include used tables in $joinConds

https://gerrit.wikimedia.org/r/631171

The new SelectQueryBuilder (T243051) seems to have solved this problem as well: in that class, a joined table is only expected to be listed in the join conditions and not in the main table(s) list, if I understand correctly. In light of that, are we still interested in improving IDatabase::select(), or should we just close this task as declined?