There is currently a fairly common use case for which we rely on raw SQL, despite having a pretty good abstraction layer in place.
$res = $db->select( # table(s) 'revision', # select field(s) [ 'rev_id', 'rev_user', ], # condition(s) [ 'rev_page' => $pageId, 'rev_timestamp > ' . $db->addQuotes( $db->timestamp( $since ) ) ], # descriptive name of caller for logging and debugging __METHOD__, [ 'ORDER BY' => 'rev_timestamp ASC', 'LIMIT' => 50 ] );
- Able to achieve the above without using raw SQL.
- As consumer, able to guarantee that my use of IDatabase cannot accidentally cause raw SQL to be used (e.g. I want an
"opt-in" or "opt-out" way to say that I promise to never use this insecure method and would expect failure instead of insecurity if something it is accidentally used in my call, no matter of hooks or indirect supply of parts of the conds array.)
Prior art and considerations
We currently have two pieces of a query builder. We have the outer structure of the query created via methods such as $db->select() and $db->insert(). For the inner segments, mainly for the conditions, we also have helper methods such as $db->makeList(), $db->buildConcat(), and $db->buildSelectSubquery().
The idea is to bring these together in an interface for query building, that different backends can implement based on the syntax for that particular RDBMS backend.
The Wikimedia\Rdbms\Subquery class and other encasing classes also resemble the direction of typed value objects in favour of strings, which this would build upon.
The proposed QueryInfo class (see https://gerrit.wikimedia.org/r/459242) also seems relevant here. It's orthogonal in so far that it could co-exist alongside the query builder, but may want to avoid having two very similar concepts of a "query" value object.
Proposal 1: Query builder
I think this may be a good opportunity to separate the query builder from the Database class, and move it to its own class. I envision it may look something like this:
$query = $db->newSelect(); $query-> /* fields ... from ..., where ... $query->buildGreaterThan( 'rev_timestamp', $sinceTimestamp ) etc */; $res = $db->query( $query );
Proposal 2: Alternate
A much simpler approach (in terms of implementation) would be to build on what we have. This would involve the following three changes, instead:
- Add the missing utility methods to IDatabase (buildGreatherThan, buildLessThan; most others we have already).
- Add some kind of flag to IDatabase::select()and others that would disable support for raw condition strings. For example, an optional first parameter like $db->select( $db::COND_SAFE, 'revision', [ .. ], [ .. ] );
- Change all utility methods to return encased value objects, similar to Wikimedia\Rdbms\LikeMatch, and in the selectSqlText method, throw if any of the condition are not formattable arrays and not encased objects.
Below is what that might look like:
$res = $db->select( $db::COND_SAFE, # name TBD (strict mode, safe mode, non-raw, ..) ..., [ $db->buildGreaterThan( 'rev_timestamp', $sinceTimestamp ) ], .. );
The downside is that this doesn't make for a good migration target.
The parameter will either always remain optional and rarely used, or if we do decide to deprecate non-safe mode, we'd have to first add it everywhere, and then either keep the pointless option forever, or remove it everywhere again.
One way to avoid that could be to use a different method name, something that doesn't look optional but is just different for no particular reason, e.g. $db->selectQuery() that wouldn't look weird in the long-term, but would however make for two confusingly similar methods until the migration is complete.
One way to avoid that, could be to have the parameters be encapsulated. E.g. we'd keep $db->select() and make the deprecation based on signature. E.g. the legacy signature is the current position arguments, and the new signature could be something like: $db->select( $db->makeSelectQuery( ... ) );. That is basically the query builder idea.