Page MenuHomePhabricator

Add IDatabase::updateJoin()
Closed, DeclinedPublic

Description

Sometimes you want to update a table based on the contents of another table. It's probably best done by creating a IDatabase::updateJoin() method similar to the existing IDatabase::deleteJoin().

-- Generic
UPDATE foo SET foo_field1 = 1, foo_field2 = (SELECT bar_field2 FROM bar WHERE foo_id = bar_id AND bar_field1 = 'value') WHERE foo_id IN (SELECT bar_id FROM bar WHERE bar_field1 = 'value');
UPDATE foo SET foo_field1 = 1, foo_field2 = (SELECT bar_field2 FROM bar WHERE foo_id = bar_id AND bar_field1 = 'value') WHERE EXISTS (SELECT 1 FROM bar WHERE foo_id = bar_id AND bar_field1 = 'value');
-- Or do multiple queries, like
SELECT foo_id, 1 AS foo_field1, bar_field2 AS foo_field2 FROM foo JOIN bar ON(foo_id = bar_id) WHERE bar_field1 = 'value';
UPDATE foo SET foo_field1 = ?, foo_field2 = ? WHERE foo_id = ?; -- for each row from the SELECT

-- MySQL: https://dev.mysql.com/doc/refman/5.6/en/update.html
UPDATE foo JOIN bar ON (foo_id = bar_id) SET foo_field1 = 1, foo_field2 = bar_field2 WHERE bar_field1 = 'value';

-- PostgreSQL: https://www.postgresql.org/docs/current/sql-update.html
-- Note that PostgreSQL can only do an [INNER] JOIN between the table being updated and the rest of the tables. That's probably an ok limitation on the method.
UPDATE foo SET (foo_field1, foo_field2) = (SELECT 1, bar_field2 FROM bar WHERE foo_id = bar_id AND bar_field1 = 'value';
UPDATE foo SET foo_field1 = 1, foo_field2 = bar_field2 FROM bar WHERE foo_id = bar_id AND bar_field1 = 'value';

-- MSSQL: https://docs.microsoft.com/en-us/sql/t-sql/queries/update-transact-sql?view=sql-server-2017
UPDATE foo SET foo_field1 = 1, foo_field2 = bar_field2 FROM foo JOIN bar ON (foo_id = bar_id) WHERE bar_field1 = 'value';

Note that, if the join produces multiple rows for a foo row,

  • The MySQL, MSSQL, and the second PostgreSQL examples will pick one arbitrary row to base the update on.
  • The first PostgreSQL example will raise an error.
  • The first two generic examples will probably raise an error as written (complaining that the SET subselect returned multiple rows), but without the subselect in the SET clause they'd pick one arbitrary row.
  • The last generic example will probably update the foo row multiple times, although that would be easy enough to avoid.

Event Timeline

Krinkle triaged this task as Medium priority.Jul 23 2019, 5:46 PM
Krinkle lowered the priority of this task from Medium to Low.Jun 30 2021, 6:17 PM
Krinkle subscribed.

I'm declining this as there hasn't been interest of late, and it appears to not mention an immediate use case. From a quick glance, I believe this would not be an enabling feature:

  • Such queries can generally already be done today as separate queries instead. The main reason against this I believe would be atomicity, however we already provide (and do automatically for you) transaction management, and also offer atomicity helpers.
  • By being in separate queries, I think we generally encourage better query planning, with less chances of locks or overly complicated queries that might not scale as well.
  • Such queries seem likely to violate Database best practices around write queries being deterministic. E.g. we prefer first selecting by primary ID and then performing an explicit update.