There is currently a fairly common use case for which we rely on raw SQL, despite having a pretty good abstraction layer in place.
For example:
$res = $db->newSelectQueryBuilder() ->select( [ 'rev_id', 'rev_user', ] ) ->from( 'revision' ) ->where( [ 'rev_page' => $pageId, 'rev_timestamp > ' . $db->addQuotes( $db->timestamp( $since ) ) ] ), ->fetchResultSet();
Goals
- Able to achieve the above without using raw SQL.
- As consumer, able to guarantee that my use of IDatabase cannot accidentally cause raw SQL to be used (e.g. I want an "opt-in" or "opt-out" way to say that I promise to never use this insecure method and would expect failure instead of insecurity if something it is accidentally used in my call, no matter of hooks or indirect supply of parts of the conds array.)
Prior art and considerations
We currently have two pieces of a query builder. We have the outer structure of the query created via methods such as $db->select() and $db->insert(). For the inner segments, mainly for the conditions, we also have helper methods such as $db->makeList(), $db->buildConcat(), and $db->buildSelectSubquery().
The idea is to bring these together in an interface for query building, that different backends can implement based on the syntax for that particular RDBMS backend.
The Wikimedia\Rdbms\Subquery class and other encasing classes also resemble the direction of typed value objects in favour of strings, which this would build upon.
The proposed QueryInfo class (see https://gerrit.wikimedia.org/r/459242) also seems relevant here. It's orthogonal in so far that it could co-exist alongside the query builder, but may want to avoid having two very similar concepts of a "query" value object.
Proposal 1: Expression builder
We currently have builder methods like makeList and buildConcat directly on IDatabase. We could continue that model and add the missing ones for expresions where we still use raw SQL today (e.g.buildGreatherThan, buildLessThan, etc).
We coud also move them to the SelectQueryBuilder class or some new Expression class.
Either way, this could look something like this:
$db->newSelectQueryBuilder() /* -> … */ ->where( [ 'rev_page' => $pageId, $expr->buildGreaterThan( 'rev_timestamp', $sinceTimestamp ) ] )
Proposal 1B: Enforcement
The issue proposal 1 leaves unsolved is enforcement and confidence that nothing breaks out of the model.
One way to do this, is to have the expression building methods return an internal object instead of a string, and then methods like IDatabase::select and SelectQueryBuilder::where could deprecate use of raw SQL strings.
Possibly, we'd opt-in through some means like $db->select( $db::SAFE_MODE, 'revision', [ .. ], [ .. ] );
This would make it so that conditions have to be an array, or expressin object built by the above methods like Wikimedia\Rdbms\LikeMatch and other encasings.
Below is what that might look like:
$db->newSelectQueryBuilder( $db::SAFE_MODE ) /* -> … */ ->where( [ 'rev_page' => $pageId, $expr->buildGreaterThan( 'rev_timestamp', $sinceTimestamp ) ] )
The downside is that this doesn't make for a good migration target.
The parameter will either always remain optional and rarely used, or if we do decide to deprecate non-safe mode, we'd have to first add it everywhere, and then either keep the pointless option forever, or remove it everywhere again.
One way to avoid that could be to use a different method name, something that doesn't look optional but is just different for no particular reason, e.g. $db->selectQuery() that wouldn't look weird in the long-term, but would however make for two confusingly similar methods until the migration is complete.
One way to avoid that, could be to have the parameters be encapsulated. E.g. we'd keep $db->select() and make the deprecation based on signature. E.g. the legacy signature is the current position arguments, and the new signature could be something like: $db->select( $db->makeSelectQuery( ... ) );. That is basically the query builder idea.