Page MenuHomePhabricator

Decide how JOIN tables should be mentioned in IDatabase::select()
Closed, DeclinedPublic

Description

From the comments on I9c42c87c55: Currently, when using a JOIN in IDatabase::select(), the joined tables must also be mentioned in the first argument ($table). If a table is missing, the corresponding join is silently dropped.

$ phps
Psy Shell v0.10.4 (PHP 7.4.3 — cli) by Justin Hileman
>>> $dbr = wfGetDB( DB_REPLICA );
=> Wikimedia\Rdbms\MaintainableDBConnRef {#2823}
>>> $dbr->selectSQLText( [ 'a', 'b' ], [ 'x', 'y' ], [], __FUNCTION__, [], [ 'b' => [ 'JOIN', 'x = y' ] ] )
=> "SELECT  x,y  FROM `a` JOIN `b` ON ((x = y))    "
>>> $dbr->selectSQLText( [ 'a'      ], [ 'x', 'y' ], [], __FUNCTION__, [], [ 'b' => [ 'JOIN', 'x = y' ] ] )
=> "SELECT  x,y  FROM `a`     "

This seems like a great source of bugs and exploding database servers. In my opinion, it should behave in one of two ways:

  1. The joined table should only be mentioned in the $join_conds, removing the redundant mention in the $table. As @daniel put it in his comment, “supplying multiple tables in the first argument should be deprecated.”
  2. Any $join_conds table should be required in the $table as well, and the IDatabase should throw an error and refuse to operate if it’s missing. (Maybe it should first be a soft or hard deprecation warning for a certain period.)

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

I left this task a bit open-ended, because I can see the appeal of both sides: on the one hand, removing the redundant mention and only selecting one main table name makes sense; on the other hand, I also see the appeal of having all the tables which the query uses mentioned in one block right at the top of the function call (since the $join_conds are the last argument).

(I updated the task description – it turns out the first argument is named $table, not $tables. Maybe that’s a hint that it really should be, well, one table, not multiple tables :) )

eprodromou subscribed.

This seems like an issue we should discuss in our tech planning meeting.

It's simpler to require the table to be in $table. The table name in $join_conds is really just the alias, i.e. the key in the $table array. There's no other place to put the actual table expression besides $table. If everyone just used SelectQueryBuilder, there would be no problem: it doesn't have this ambiguity.

Change 620303 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/core@master] rdbms: Throw if options reference unselected tables

https://gerrit.wikimedia.org/r/620303

Change 620320 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Translate@master] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620320

This seems like a great source of bugs and exploding database servers.

For instance, if I’m not mistaken, Translate accidentally loads all revisions of a page when it wants to load the latest revision (fixed in I3b20e36002). Apparently this has not in fact caused the database servers to explode yet – possibly because the method in question (from a quick look through the source code) mainly seems to be used to load message pages, which I guess don’t have too many revisions most of the time.

Change 620320 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620320

Change 620669 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Translate@REL1_34] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620669

Change 620670 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Translate@REL1_35] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620670

Change 620669 merged by jenkins-bot:
[mediawiki/extensions/Translate@REL1_34] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620669

Change 620670 merged by jenkins-bot:
[mediawiki/extensions/Translate@REL1_35] Fix IDatabase::select() JOIN condition

https://gerrit.wikimedia.org/r/620670

Change 631171 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/core@master] LogPager+WikiPage: Only include used tables in $joinConds

https://gerrit.wikimedia.org/r/631171

The new SelectQueryBuilder (T243051) seems to have solved this problem as well: in that class, a joined table is only expected to be listed in the join conditions and not in the main table(s) list, if I understand correctly. In light of that, are we still interested in improving IDatabase::select(), or should we just close this task as declined?

The new SelectQueryBuilder (T243051) seems to have solved this problem as well: in that class, a joined table is only expected to be listed in the join conditions and not in the main table(s) list, if I understand correctly. In light of that, are we still interested in improving IDatabase::select(), or should we just close this task as declined?

I suggest declining this ticket. We have more than enough issues with the current system and needs to be migrated and eventually deprecated. Improving a code that eventually be removed seems counter-intuitive.

I would feel more comfortable with that suggestion if there was any indication that this migration to SelectQueryBuilder will actually happen, though. Right now the relevant documentation doesn’t mention the new class, and the IDatabase::select() phpdoc also doesn’t have any indication that you might want to use something else.

That is a valid point. Let me just fix the documentation.

Change 810856 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Soft-deprecate IDatabase::select()

https://gerrit.wikimedia.org/r/810856

Alright, thanks; I’ve also added something to Manual:Database access. (It could still be expanded – for example, the specific query builder classes like PageSelectQueryBuilder and UserSelectQueryBuilder are probably worth mentioning, but I’m not very familiar with them.)

Change 811936 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Add a note encouraging use of SelectQueryBuilder in IDatabase::select

https://gerrit.wikimedia.org/r/811936

Change 811936 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Add a note encouraging use of SelectQueryBuilder in IDatabase::select

https://gerrit.wikimedia.org/r/811936

I think that’s good enough to decline this task now.

Change 620303 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/core@master] rdbms: Throw if options reference unselected tables

Reason:

task declined, IDatabase::select() will be deprecated in favor of SelectQueryBuilder (where join() and leftJoin() have a better signature)

https://gerrit.wikimedia.org/r/620303

Change 631171 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/core@master] LogPager+WikiPage: Only include used tables in $joinConds

Reason:

https://gerrit.wikimedia.org/r/631171

Change 810856 abandoned by Ladsgroup:

[mediawiki/core@master] rdbms: Soft-deprecate IDatabase::select() and similar methods

Reason:

These methods are now @internal making this patch moot

https://gerrit.wikimedia.org/r/810856