Deprecate raw SQL conditions for IDatabase methods (select, insert, etc.)
Open, Medium, Public

Assigned To

Authored By

	Krinkle
	Nov 22 2018, 9:07 PM

Description

There is currently a fairly common use case for which we rely on raw SQL, despite having a pretty good abstraction layer in place.

For example:

$res = $db->newSelectQueryBuilder()
    ->select( [
        'rev_id',
        'rev_user',
    ] )
    ->from( 'revision' )
    ->where( [
        'rev_page' => $pageId,
        'rev_timestamp > ' . $db->addQuotes( $db->timestamp( $since ) )
    ] )
    ->fetchResultSet();

Goals

Able to achieve the above without using raw SQL.
As a consumer, able to guarantee that my use of IDatabase cannot accidentally cause raw SQL to be used. For example, I want an "opt-in" or "opt-out" way to promise that I never use this insecure method, and I would expect failure instead of insecurity if raw SQL is accidentally used in my call, regardless of hooks or indirect supply of parts of the conds array.

Prior art and considerations

We currently have two pieces of a query builder. We have the outer structure of the query created via methods such as $db->select() and $db->insert(). For the inner segments, mainly for the conditions, we also have helper methods such as $db->makeList(), $db->buildConcat(), and $db->buildSelectSubquery().

The idea is to bring these together in an interface for query building, that different backends can implement based on the syntax for that particular RDBMS backend.

The Wikimedia\Rdbms\Subquery class and other encasing classes also resemble the direction of typed value objects in favour of strings, which this would build upon.

The proposed QueryInfo class (see https://gerrit.wikimedia.org/r/459242) also seems relevant here. It's orthogonal in so far as it could co-exist alongside the query builder, but we may want to avoid having two very similar concepts of a "query" value object.

Proposal 1: Expression builder

We currently have builder methods like makeList and buildConcat directly on IDatabase. We could continue that model and add the missing ones for expressions where we still use raw SQL today (e.g. buildGreaterThan, buildLessThan, etc.).

We could also move them to the SelectQueryBuilder class or some new Expression class.

Either way, this could look something like this:

$db->newSelectQueryBuilder()
    /* -> … */ 
    ->where( [
        'rev_page' => $pageId,
        $expr->buildGreaterThan( 'rev_timestamp', $sinceTimestamp )
    ] )

Proposal 1B: Enforcement

The issue proposal 1 leaves unsolved is enforcement and confidence that nothing breaks out of the model.

One way to do this, is to have the expression building methods return an internal object instead of a string, and then methods like IDatabase::select and SelectQueryBuilder::where could deprecate use of raw SQL strings.

Possibly, we'd opt-in through some means like $db->select( $db::SAFE_MODE, 'revision', [ .. ], [ .. ] );

This would make it so that conditions have to be an array or an expression object built by the above methods, like Wikimedia\Rdbms\LikeMatch and other encasings.

Below is what that might look like:

$db->newSelectQueryBuilder( $db::SAFE_MODE )
    /* -> … */ 
    ->where( [
        'rev_page' => $pageId,
        $expr->buildGreaterThan( 'rev_timestamp', $sinceTimestamp )
    ] )

The downside is that this doesn't make for a good migration target.

The parameter will either always remain optional and rarely used, or if we do decide to deprecate non-safe mode, we'd have to first add it everywhere, and then either keep the pointless option forever, or remove it everywhere again.

One way to avoid that could be to use a different method name, e.g. $db->selectQuery(): something that doesn't look optional but is simply different. It wouldn't look weird in the long term, but it would make for two confusingly similar methods until the migration is complete.

Another way to avoid that could be to encapsulate the parameters. We'd keep $db->select() and base the deprecation on the signature: the legacy signature is the current positional arguments, and the new signature could be something like $db->select( $db->makeSelectQuery( ... ) );. That is basically the query builder idea.
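The enforcement idea in Proposal 1B could be sketched roughly as follows. This is only an illustration under stated assumptions: SafeExpression, quoteValue() and buildWhere() are hypothetical names, not part of the rdbms library, and the quoting is a simplified stand-in for $db->addQuotes().

```php
<?php
// Hypothetical sketch: none of these names are the actual rdbms API.

/** Typed value object returned by expression-building methods. */
final class SafeExpression {
	private string $sql;

	public function __construct( string $sql ) {
		$this->sql = $sql;
	}

	public function toSql(): string {
		return $this->sql;
	}
}

/** Simplified stand-in for $db->addQuotes(); real backends escape differently. */
function quoteValue( $value ): string {
	if ( is_int( $value ) ) {
		return (string)$value;
	}
	return "'" . str_replace( "'", "''", (string)$value ) . "'";
}

function buildGreaterThan( string $field, $value ): SafeExpression {
	return new SafeExpression( $field . ' > ' . quoteValue( $value ) );
}

/**
 * String keys are field => value pairs. Integer keys must hold a
 * SafeExpression, so a raw SQL string fails loudly instead of being used.
 */
function buildWhere( array $conds ): string {
	$parts = [];
	foreach ( $conds as $key => $value ) {
		if ( is_string( $key ) ) {
			$parts[] = $key . ' = ' . quoteValue( $value );
		} elseif ( $value instanceof SafeExpression ) {
			$parts[] = $value->toSql();
		} else {
			throw new InvalidArgumentException( 'Raw SQL conditions are not allowed' );
		}
	}
	return implode( ' AND ', $parts );
}

echo buildWhere( [
	'rev_page' => 42,
	buildGreaterThan( 'rev_timestamp', '20181122000000' ),
] ), "\n";
// prints: rev_page = 42 AND rev_timestamp > '20181122000000'
```

Passing a plain string such as 'rev_timestamp > 1' in the array would throw here, which is the "failure instead of insecurity" behaviour the goals ask for.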

Details

Subject	Repo	Branch	Lines +/-
rdbms: Create RawSQLValue for SET clauses in update/upsert	mediawiki/core	master	+136 -43
rdbms: Introduce RawSQLExpression for edge cases	mediawiki/core	master	+51 -1
maintenance: Migrate to expression builders	mediawiki/core	master	+79 -71
rdbms: Add support for NOT LIKE in expression builder	mediawiki/core	master	+115 -35
rdbms: Add a strict regex on $field on expression builder	mediawiki/core	master	+8 -1
api: Migrate away from buildLike to expression builder	mediawiki/core	master	+88 -28
Migrate all non-API code to use expression builder instead of buildLike	mediawiki/core	master	+86 -23
maintenance: Migrate $db->buildLike() to expression builder	mediawiki/core	master	+88 -42
rdbms: Add support for LIKE in expression builder	mediawiki/core	master	+145 -68
Migrate away from $db->makeList in favor of expression builder	mediawiki/core	master	+46 -63
Migrate another batch to use $db->expr instead of raw SQL	mediawiki/core	master	+69 -64
Mass migrate simple cases to use expression builder	mediawiki/core	master	+99 -76
rdbms: Introduce expression builder	mediawiki/core	master	+477 -1
Introduce expression builder to avoid raw SQL	mediawiki/core	master	+242 -11

Related Objects

Status	Subtype	Assigned	Task
Resolved		Ladsgroup	T343098 [epic] Data Persistence Hypothesis WE 3.2.1
Open		Ladsgroup	T210206 Deprecate raw SQL conditions for IDatabase methods (select, insert, etc.)
Declined	Feature	None	T29646 Implement Database util method for BETWEEN operator
Resolved		aaron	T318845 Disallow passing raw subqueries to IDatabase::tableName
			Restricted Task
Resolved		Lucas_Werkmeister_WMDE	T332941 Warning: SQLPlatform::isWriteQuery fallback to regex (from Wikibase EntityUsageTable)
Resolved		Ladsgroup	T334661 rdbms: Find a way to use IDatabase::unionConditionPermutations without raw SQL
Resolved		Lucas_Werkmeister_WMDE	T333690 rdbms: Add way to issue UNION queries without raw SQL
Open		None	T369135 Deprecate and replace public methods that return raw SQL fragments as strings

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 970792 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Migrate away from $db->makeList in favor of expression builder

https://gerrit.wikimedia.org/r/970792

gerritbot added a project: Patch-For-Review.Nov 1 2023, 2:15 PM

Change 970868 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Add support for LIKE in expression builder

https://gerrit.wikimedia.org/r/970868

Change 970792 merged by jenkins-bot:

[mediawiki/core@master] Migrate away from $db->makeList in favor of expression builder

https://gerrit.wikimedia.org/r/970792

ReleaseTaggerBot edited projects, added MW-1.42-notes (1.42.0-wmf.4; 2023-11-07); removed MW-1.42-notes (1.42.0-wmf.3; 2023-10-31).Nov 2 2023, 6:00 PM

Change 970868 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Add support for LIKE in expression builder

https://gerrit.wikimedia.org/r/970868

Maintenance_bot removed a project: Patch-For-Review.Nov 3 2023, 2:10 PM

Change 971951 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] maintenance: Migrate $db->buildLike() to expression builder

https://gerrit.wikimedia.org/r/971951

gerritbot added a project: Patch-For-Review.Nov 6 2023, 1:27 PM

Change 971951 merged by jenkins-bot:

[mediawiki/core@master] maintenance: Migrate $db->buildLike() to expression builder

https://gerrit.wikimedia.org/r/971951

Maintenance_bot removed a project: Patch-For-Review.Nov 6 2023, 3:11 PM

@daniel @matmarex @Krinkle the LIKE expression implementation gave me an idea for cases of ipb_range_end = ipb_range_start in WHERE conditions and possibly join conditions. We could introduce a RawValue class and turn ipb_range_end = ipb_range_start into $dbr->expr('ipb_range_end', '=', RawValue( 'ipb_range_start' ) ) instead. It's not pretty but it's clearly not a common usecase. Thoughts?

Basically flipping the default of "raw SQL unless specified" to "quoted unless explicitly specified"
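To make the "quoted unless explicitly specified" default concrete, here is a minimal sketch. It assumes a free-standing expr() function and a RawValue wrapper, both written only for this illustration, with simplified quoting in place of the real backend escaping.

```php
<?php
// Hypothetical sketch; RawValue and expr() are illustrative stand-ins.

/** Marks a string as SQL that must NOT be quoted (e.g. another column). */
final class RawValue {
	private string $sql;

	public function __construct( string $sql ) {
		$this->sql = $sql;
	}

	public function toSql(): string {
		return $this->sql;
	}
}

function expr( string $field, string $op, $value ): string {
	$allowedOps = [ '=', '!=', '<', '<=', '>', '>=' ];
	if ( !in_array( $op, $allowedOps, true ) ) {
		throw new InvalidArgumentException( "Unsupported operator: $op" );
	}
	if ( $value instanceof RawValue ) {
		// Explicit opt-out: used verbatim, so callers are easy to grep for.
		$rhs = $value->toSql();
	} elseif ( is_int( $value ) ) {
		$rhs = (string)$value;
	} else {
		// Default: every other value is treated as a literal and quoted.
		$rhs = "'" . str_replace( "'", "''", (string)$value ) . "'";
	}
	return "$field $op $rhs";
}

echo expr( 'ipb_range_end', '=', new RawValue( 'ipb_range_start' ) ), "\n";
// prints: ipb_range_end = ipb_range_start
echo expr( 'ipb_address', '=', "x' OR 1=1 --" ), "\n";
// prints: ipb_address = 'x'' OR 1=1 --'
```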

Change 972431 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Migrate all non-API code to use expression builder instead of buildLike

https://gerrit.wikimedia.org/r/972431

gerritbot added a project: Patch-For-Review.Nov 7 2023, 5:31 PM

In T210206#9312960, @Ladsgroup wrote:

@daniel @matmarex @Krinkle the LIKE expression implementation gave me an idea for cases of ipb_range_end = ipb_range_start in WHERE conditions and possibly join conditions. We could introduce a RawValue class and turn ipb_range_end = ipb_range_start into $dbr->expr('ipb_range_end', '=', RawValue( 'ipb_range_start' ) ) instead. It's not pretty but it's clearly not a common usecase. Thoughts?

Basically flipping the default of "raw SQL unless specified" to "quoted unless explicitly specified"

I like the idea, except for the name. For the use case in this example, I'd suggest something like new TableName( 'ipb_range_start' ) - we would specifically declare the object to represent a table name. It should probably trigger identifier quoting.

We could also support new RawCondition( "ipb_range_end = ipb_range_start" ).
Or, perhaps better: new JoinCondition( "ipb_range_end", "ipb_range_start" ).

Change 972431 merged by jenkins-bot:

[mediawiki/core@master] Migrate all non-API code to use expression builder instead of buildLike

https://gerrit.wikimedia.org/r/972431

ReleaseTaggerBot edited projects, added MW-1.42-notes (1.42.0-wmf.5; 2023-11-14); removed MW-1.42-notes (1.42.0-wmf.4; 2023-11-07).Nov 7 2023, 11:00 PM

Maintenance_bot removed a project: Patch-For-Review.Nov 7 2023, 11:10 PM

Change 972850 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] api: Migrate away from buildLike to expression builder

https://gerrit.wikimedia.org/r/972850

gerritbot added a project: Patch-For-Review.Nov 8 2023, 3:38 PM

@Ladsgroup @daniel I'd be fine with any of those syntaxes. I'm not sure if it's worth doing for join conditions, though? They're almost always just 'a = b', maybe we can keep them as strings.

@Krinkle has also proposed something similar earlier:

In T210206#9226646, @Krinkle wrote:

->and( $db->expr( 'rev_timestamp', '<=', $db->expField( 'page_touched' ) ) )

I think the question is: do we want to allow more types for the third parameter to $db->expr(), or do we want to introduce a new IExpression subclass for this kind of condition? Given how you've implemented the LIKE expressions, I guess you prefer the former. That seems alright to me, as long as we don't do anything that is impossible for Phan to validate.

In T210206#9312960, @Ladsgroup wrote:

@daniel @matmarex @Krinkle the LIKE expression implementation gave me an idea for cases of ipb_range_end = ipb_range_start in WHERE conditions and possibly join conditions. We could introduce a RawValue class and turn ipb_range_end = ipb_range_start into $dbr->expr('ipb_range_end', '=', RawValue( 'ipb_range_start' ) ) instead. It's not pretty but it's clearly not a common usecase. Thoughts?

Basically flipping the default of "raw SQL unless specified" to "quoted unless explicitly specified"

Maybe we can just add a Column class, supported by Expression/$db->expr(), with identifier escaping/quoting used automatically when needed. The constructor would support:

new Column("column")
new Column( "alias.column" ) [not supporting a period in either of the two components]
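A rough sketch of such a Column class is below. The validation pattern and the ANSI double-quote identifier quoting are assumptions made for illustration; a real implementation would delegate quoting to the backend (MySQL, for example, uses backticks).

```php
<?php
// Hypothetical sketch of the proposed Column value object.

final class Column {
	/** @var string[] One or two validated identifier components */
	private array $parts;

	public function __construct( string $name ) {
		$parts = explode( '.', $name );
		if ( count( $parts ) > 2 ) {
			throw new InvalidArgumentException( 'Expected "column" or "alias.column"' );
		}
		foreach ( $parts as $part ) {
			// Assumed identifier rule: letters, digits, underscore; no period.
			if ( !preg_match( '/^[A-Za-z_][A-Za-z0-9_]*$/D', $part ) ) {
				throw new InvalidArgumentException( "Invalid identifier: $part" );
			}
		}
		$this->parts = $parts;
	}

	public function toSql(): string {
		$quoted = array_map(
			static function ( string $part ): string {
				return '"' . $part . '"';
			},
			$this->parts
		);
		return implode( '.', $quoted );
	}
}

echo ( new Column( 'column' ) )->toSql(), "\n";       // prints: "column"
echo ( new Column( 'alias.column' ) )->toSql(), "\n"; // prints: "alias"."column"
```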

It would be a little nice to also have an ExcludedColumn class for automatically dealing with "excluded.<column>" "ON CONFLICT"/upsert syntax in postgres/sqlite (which can't be done now due to raw sql).

I'd be careful with the wording of column vs. field, though columns are also fields. Note that the rdbms Field interface already exists for other purposes, so that name cannot be used as a class name without some renaming.

An interesting case is in SQLBagOStuff::buildMultiUpsertSetForOverwrite(), which compares SUBSTR(column,x,y) with a literal. Maybe there could be a ComputedValue class, supported by Expression/$db->expr(). The constructor would take an IComputedValue constant (e.g. IComputedValue::SUBSTR) plus the variadic int/string/Column/ComputedValue arguments. Maybe there could be a $db->computedValue() method, e.g.:

$db->computedValue( IComputedValue::SUBSTR, new Column( 'modtoken' ), 1, 13 );

I suppose ComputedValue could be shortened to Computation (though I like the clarity of LikeValue/ComputedValue in Expression construction).

aaron moved this task from Incoming to Radar on the API Platform board.Nov 9 2023, 4:32 PM

Maintenance_bot mentioned this in T350954: Use expression builder instead of raw SQL in GrowthExperiments.Nov 10 2023, 4:17 PM

Maintenance_bot mentioned this in T350955: Use expression builder instead of raw SQL in ReadingLists.

Maintenance_bot mentioned this in T350956: Use expression builder instead of raw SQL in Wikistories.

Maintenance_bot mentioned this in T350957: Use expression builder instead of raw SQL in IPInfo.

Maintenance_bot mentioned this in T350958: Use expression builder instead of raw SQL in TranslationNotifications.

Maintenance_bot mentioned this in T350959: Use expression builder instead of raw SQL in CampaignEvents.

Maintenance_bot mentioned this in T350960: Use expression builder instead of raw SQL in TemplateData.

Maintenance_bot mentioned this in T350961: Use expression builder instead of raw SQL in BounceHandler.

Maintenance_bot mentioned this in T350962: Use expression builder instead of raw SQL in intersection.

Maintenance_bot mentioned this in T350963: Use expression builder instead of raw SQL in Gadgets.

Maintenance_bot mentioned this in T350964: Use expression builder instead of raw SQL in CheckUser.Nov 10 2023, 4:20 PM

Maintenance_bot mentioned this in T350965: Use expression builder instead of raw SQL in CentralNotice.

Maintenance_bot mentioned this in T350966: Use expression builder instead of raw SQL in PageTriage.

Maintenance_bot mentioned this in T350967: Use expression builder instead of raw SQL in ContentTranslation.

Maintenance_bot mentioned this in T350968: Use expression builder instead of raw SQL in AbuseFilter.Nov 10 2023, 4:24 PM

Maintenance_bot mentioned this in T350969: Use expression builder instead of raw SQL in CentralAuth.

Maintenance_bot mentioned this in T350970: [COG] Use expression builder instead of raw SQL in Cognate.

Maintenance_bot mentioned this in T350971: Use expression builder instead of raw SQL in FlaggedRevs.

Maintenance_bot mentioned this in T350972: Use expression builder instead of raw SQL in GlobalBlocking.Nov 10 2023, 4:27 PM

Maintenance_bot mentioned this in T350973: Use expression builder instead of raw SQL in GlobalPreferences.

Maintenance_bot mentioned this in T350975: Use expression builder instead of raw SQL in GoogleNewsSitemap.Nov 10 2023, 4:31 PM

Maintenance_bot mentioned this in T350976: Use expression builder instead of raw SQL in ImageSuggestions.

Maintenance_bot mentioned this in T350977: Use expression builder instead of raw SQL in Linter.Nov 10 2023, 4:33 PM

Maintenance_bot mentioned this in T350979: Use expression builder instead of raw SQL in LiquidThreads.

Maintenance_bot mentioned this in T350980: Use expression builder instead of raw SQL in LoginNotify.Nov 10 2023, 4:35 PM

Maintenance_bot mentioned this in T350981: Use expression builder instead of raw SQL in MediaModeration.

Maintenance_bot mentioned this in T350982: Use expression builder instead of raw SQL in MobileFrontend.Nov 10 2023, 4:38 PM

Maintenance_bot mentioned this in T350983: Use expression builder instead of raw SQL in Newsletter.

Maintenance_bot mentioned this in T350984: Use expression builder instead of raw SQL in Nuke.Nov 10 2023, 4:40 PM

Maintenance_bot mentioned this in T350985: Use expression builder instead of raw SQL in OAuth.

Maintenance_bot mentioned this in T350986: Use expression builder instead of raw SQL in ORES.Nov 10 2023, 4:42 PM

Maintenance_bot mentioned this in T350987: Use expression builder instead of raw SQL in PageAssessments.

Maintenance_bot mentioned this in T350988: Use expression builder instead of raw SQL in PageImages.Nov 10 2023, 4:44 PM

Maintenance_bot mentioned this in T350989: Use expression builder instead of raw SQL in SecurePoll.

Maintenance_bot mentioned this in T350990: Use expression builder instead of raw SQL in SubPageList3.Nov 10 2023, 4:47 PM

Maintenance_bot mentioned this in T350991: Use expression builder instead of raw SQL in UrlShortener.

Maintenance_bot mentioned this in T350992: Use expression builder instead of raw SQL in VisualEditor.Nov 10 2023, 4:49 PM

Maintenance_bot mentioned this in T350993: Use expression builder instead of raw SQL in WikiLambda.

Maintenance_bot mentioned this in T350994: Use expression builder instead of raw SQL in WikimediaIncubator.Nov 10 2023, 4:52 PM

Ladsgroup mentioned this in T350999: Use expression builder instead of raw SQL in Wikibase.Nov 10 2023, 5:24 PM

In T210206#9313875, @daniel wrote:

In T210206#9312960, @Ladsgroup wrote:

@daniel @matmarex @Krinkle the LIKE expression implementation gave me an idea for cases of ipb_range_end = ipb_range_start in WHERE conditions and possibly join conditions. We could introduce a RawValue class and turn ipb_range_end = ipb_range_start into $dbr->expr('ipb_range_end', '=', RawValue( 'ipb_range_start' ) ) instead. It's not pretty but it's clearly not a common usecase. Thoughts?

Basically flipping the default of "raw SQL unless specified" to "quoted unless explicitly specified"

I like the idea, except for the name. For the use case in this example, I'd suggest something like new TableName( 'ipb_range_start' ) - we would specifically declare the object to represent a table name. It should probably trigger identifier quoting.

We could also support new RawCondition( "ipb_range_end = ipb_range_start" ).
Or, perhaps better: new JoinCondition( "ipb_range_end", "ipb_range_start" ).

The problem is that it's not always a join condition. In the example I gave above it is a WHERE condition (and not even an ANSI-89 join, but an actual condition to pick up rows that have the same value in two different fields).

I like the RawCondition( "ipb_range_end = ipb_range_start" ), maybe RawExpression to make it consistent?

In T210206#9317956, @matmarex wrote:
@Ladsgroup @daniel I'd be fine with any of those syntaxes. I'm not sure if it's worth doing for join conditions, though? They're almost always just 'a = b', maybe we can keep them as strings.

@Krinkle has also proposed something similar earlier:
In T210206#9226646, @Krinkle wrote:
->and( $db->expr( 'rev_timestamp', '<=', $db->expField( 'page_touched' ) ) )
I think the question is: do we want to allow more types for the third parameter to $db->expr(), or do we want to introduce a new IExpression subclass for this kind of condition? Given how you've implemented the LIKE expressions, I guess you prefer the former. That seems alright to me, as long as we don't do anything that is impossible for Phan to validate.

Yeah, I prefer the former, but at the same time I really don't want to introduce too many new classes that developers would need to learn and get used to. That's why I'm a bit against Column/Join. First I want to have RawSQL; then we can look at the usages and move the most common ones into dedicated classes, while avoiding a class for every possible type of value that could exist. Does that make sense to you?

In T210206#9318067, @aaron wrote:
An interesting case is in SQLBagOStuff::buildMultiUpsertSetForOverwrite(), which compares SUBSTR(column,x,y) with a literal. Maybe there could be a ComputedValue class, supported by Expression/$db->expr(). The constructor would take an IComputedValue constant (e.g. IComputedValue::SUBSTR) plus the variadic int/string/Column/ComputedValue arguments. Maybe there could be a $db->computedValue() method, e.g.:
$db->computedValue( IComputedValue::SUBSTR, new Column( 'modtoken' ), 1, 13 );
I suppose ComputedValue could be shortened to Computation (though I like the clarity of LikeValue/ComputedValue in Expression construction).

I would really like to avoid coining new terms. We already have too many non-standard terms that each new developer needs to learn and get used to. It's not very clear either: does PHP do the computation, or does it just produce SQL so that the computation happens on the database server? Is it a computation in the sense of halting, or is it a transform? "Function" or "function call" is actually much clearer. I would still want to know how common this use case is before jumping to implementing new interfaces.

Re-reading the description and comments, it seems to me that this task is more about addQuotes(), phan-taint, and a more fluent-looking interface than about eliminating raw SQL from code calling rdbms query methods. Is the idea to keep most of the build*() methods around? Currently, the $field argument to Database::expr() allows raw SQL. Is that intentional in the long run? If so, then buildMultiUpsertSetForOverwrite() can just use SUBSTR() in the $field argument. SelectQueryBuilder::fields() allows raw SQL fields and SelectQueryBuilder::where() allows raw SQL. On the other hand, SelectQueryBuilder::table() wants a query builder for computed/derived tables rather than a string from selectSQLText() or such.

From the task description:

We could also move them to the SelectQueryBuilder class or some new Expression class.

In terms of moving the build*() methods to SelectQueryBuilder, that seems like it would decrease convenience, since the query builder would have to be bound to a variable instead of just reusing $db. Alternatively, using a new Expression class would be a lot like ComputedValue... maybe it could be called ExpressionValue and reuse IExpression:: constants. It could even be called FunctionCall, though it would be slightly odd for mere arithmetic operations. In any case, it would mean "value computed server side as specified by the query". It's unfortunate that both "calculated" and "computed" are used in SQL server documentation for this case, in addition to the related concept of schema-defined computed columns (both VIRTUAL and PERSISTENT).

I agree that it would be nice to cut down on classes that developers must be aware of. If the build*() methods go with a class based approach, this gets tricky. More so if raw SQL is not allowed in $fields/$conds since basic arithmetic operators also would need to use the class. If we are mostly fine with keeping the build*() methods, then new RawExpression("ipb_range_end = ipb_range_start") seems fine.

Change 972850 merged by jenkins-bot:

[mediawiki/core@master] api: Migrate away from buildLike to expression builder

https://gerrit.wikimedia.org/r/972850

ReleaseTaggerBot edited projects, added MW-1.42-notes (1.42.0-wmf.7; 2023-11-28); removed MW-1.42-notes (1.42.0-wmf.5; 2023-11-14).Nov 17 2023, 8:00 PM

While $field can take raw SQL, this is drastically different from the status quo: we don't take fields from user input, so the chance of SQL injection is much lower. On top of that, the plan is to have a strict regex on field (allowing only a-zA-Z\d\.), so it's different from the current wild west.
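A strict check of that kind might look like the sketch below. The function name is hypothetical, and the underscore in the character class is an assumption added here because column names such as rev_timestamp contain one; the comment above only lists letters, digits and the period.

```php
<?php
// Hypothetical sketch of a strict field-name check; not the merged patch.

function assertSafeField( string $field ): string {
	// Letters, digits and "." as described above; "_" is an assumption.
	// The D modifier stops "$" from matching before a trailing newline.
	if ( !preg_match( '/^[a-zA-Z\d._]+$/D', $field ) ) {
		throw new InvalidArgumentException( "Unsafe field name: $field" );
	}
	return $field;
}

echo assertSafeField( 'r.rev_timestamp' ), "\n"; // prints: r.rev_timestamp

try {
	assertSafeField( 'SUBSTR(modtoken,1,13)' );
} catch ( InvalidArgumentException $e ) {
	echo "rejected\n"; // parentheses and commas are not allowed
}
```

With a check like this, function calls in $field are rejected, which is why cases such as SUBSTR() need a separate escape hatch.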

The plan is not to keep build*; we could have replacements for them, e.g. buildSubString() -> subStringExpr(), that would return IExpression.

Change 975915 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Add a strict regex on $field on expression builder

https://gerrit.wikimedia.org/r/975915

Change 975915 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Add a strict regex on $field on expression builder

https://gerrit.wikimedia.org/r/975915

Change 980493 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] rdbms: Add support for NOT LIKE in expression builder

https://gerrit.wikimedia.org/r/980493

Change 980493 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Add support for NOT LIKE in expression builder

https://gerrit.wikimedia.org/r/980493

ReleaseTaggerBot edited projects, added MW-1.42-notes (1.42.0-wmf.9; 2023-12-12); removed MW-1.42-notes (1.42.0-wmf.7; 2023-11-28).Dec 6 2023, 7:00 PM

Novem_Linguae subscribed.Dec 30 2023, 6:27 PM

Change 991395 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] maintenance: Migrate to expression builders

https://gerrit.wikimedia.org/r/991395

Change 991395 merged by jenkins-bot:

[mediawiki/core@master] maintenance: Migrate to expression builders

https://gerrit.wikimedia.org/r/991395

ReleaseTaggerBot edited projects, added MW-1.42-notes (1.42.0-wmf.15; 2024-01-23); removed MW-1.42-notes (1.42.0-wmf.9; 2023-12-12).Jan 17 2024, 11:00 PM

aaron closed subtask T318845: Disallow passing raw subqueries to IDatabase::tableName as Resolved.Feb 22 2024, 4:13 PM

Ladsgroup closed subtask T29646: Implement Database util method for BETWEEN operator as Declined.Mar 5 2024, 10:10 PM

Are there any updates on when it will be possible to pass a column name as the value in an IExpression? This will block T350972: Use expression builder instead of raw SQL in GlobalBlocking due to the usage of > between two columns.

There are several different suggestions. I support a broad API: a class called "RawSQLValue", meaning toSQL() wouldn't call addQuotes on it, because there are many cases and we can't cover them all. There are column comparisons, there are function calls, etc. Of course it should be used with care (at least it makes searching for potential SQL injection vectors easier).

If someone makes a decision to move forward with one of the suggestions above, it's just a matter of implementation.

Change #1014090 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Introduce RawSQLValue for edge cases

https://gerrit.wikimedia.org/r/1014090

Ladsgroup mentioned this in T361023: Migrate raw SQL building in conditions to expression builder in core.Mar 26 2024, 3:04 PM

Change #1014090 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Introduce RawSQLExpression for edge cases

https://gerrit.wikimedia.org/r/1014090

ReleaseTaggerBot edited projects, added MW-1.42-notes (1.42.0-wmf.26; 2024-04-09); removed MW-1.42-notes (1.42.0-wmf.15; 2024-01-23).Apr 3 2024, 3:02 PM

Change #1020368 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] rdbms: Create RawSQLValue for SET clauses in update/upsert

https://gerrit.wikimedia.org/r/1020368

Change #1020368 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Create RawSQLValue for SET clauses in update/upsert

https://gerrit.wikimedia.org/r/1020368

ReleaseTaggerBot added a project: MW-1.43-notes (1.43.0-wmf.8; 2024-06-04).May 29 2024, 12:00 PM

Maintenance_bot removed a project: Patch-For-Review.May 29 2024, 12:31 PM

So the raw SQL part has not been deprecated yet since there are many usecases left but the replacement in all cases is in place. I suggest renaming this ticket to narrow its scope to "provide replacement for raw SQL in conditions" and file a new ticket for the actual deprecation. Does that sound fine by people?

In T210206#9900598, @Ladsgroup wrote:

So the raw SQL part has not been deprecated yet since there are many usecases left but the replacement in all cases is in place. I suggest renaming this ticket to narrow its scope to "provide replacement for raw SQL in conditions" and file a new ticket for the actual deprecation. Does that sound fine by people?

Can we do it the other way around, preserving this as the deprecation task and having a new sub-task for providing the replacement, which we can mark as Resolved?

In T210206#9900598, @Ladsgroup wrote:

So the raw SQL part has not been deprecated yet since there are many usecases left but the replacement in all cases is in place. I suggest renaming this ticket to narrow its scope to "provide replacement for raw SQL in conditions" and file a new ticket for the actual deprecation. Does that sound fine by people?

It sounds fine to me. I think that's how we actually used this ticket, for the most part (we tagged all patches introducing the replacements, but only a few random patches that made use of them).

matmarex added a subtask: T369135: Deprecate and replace public methods that return raw SQL fragments as strings.Jul 3 2024, 8:05 AM

Seeing these new IExpression classes in 1.42 reminds me of an article I read three years ago about prevention of SQLi. I mention it since it goes a step further than the discussion here, but it remains a POC/theory. The article (in French) is here, written by a researcher then at Orange Cyberdefense (Judicaël Courant), and an equivalent conference paper in English is here; he wrote a POC in Java and I translated it into a PHP library three years ago.

The idea is to encode the SQL into PHP objects, creating an AST for the SQL query, which is then compiled into a prepared statement (*). According to the author, the advantages of this way of writing SQL are:

we keep the full expressivity of the SQL, and
it is no-SQLi-enforced by design (except if you bypass the literals, using id instead of str) so the reviews are facilitated, and
the "syntax" for constructing the AST is near standard SQL, so it should not be too difficult to learn.

An example for SELECT user_id FROM user WHERE user_name = '$user'; with my POC library (**):

use function SQLTrees\{select,from,where,operator,id,str};

$user = "SQLi' OR 1=1 OR '";
$sql = select( id( 'user_id' ),
               from( id( 'user' ) ),
               where( operator( id( 'user_name' ), '=', str( $user ) ) )
);

$statement = $sql->compile();
var_dump( $statement->getPreparedStatement() );

$conn = new mysqli( 'localhost', 'user', 'password', 'mediawiki' );
$res = $statement->run_mysqli( $conn );

with the var_dump displaying:

array(2) {
  'template' =>
  string(46) "SELECT user_id FROM user WHERE user_name = ?;"
  'parameters' =>
  array(1) {
    [0] =>
    array(2) {
      [0] =>
      string(1) "s"
      [1] =>
      string(17) "SQLi' OR 1=1 OR '"
    }
  }
}

(*) Using prepared statements instead of writing the SQL statements directly is an implementation choice, mainly to reduce the complexity of the library by avoiding DBMS-specific escaping rules.
(**) I cheat a bit because, currently, "USER" is recognized as a reserved SQL identifier and an exception is thrown for this specific case (it is indeed a reserved word in standard SQL, but MySQL does not treat it as such). I opened an issue on my library about this.
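For readers who want to see the mechanism without the library, the AST-to-prepared-statement idea can be reduced to the following sketch. The class names (AstNode, IdNode, StrNode, OpNode, SelectNode) are invented for this illustration and are not the SQLTrees API; the sketch only produces a template and a parameter list and does not talk to a database.

```php
<?php
// Hypothetical mini-AST; not the SQLTrees library API.

interface AstNode {
	/** Return SQL text, appending any bound values to $params. */
	public function compile( array &$params ): string;
}

/** Identifier: emitted as-is after validation, never from user input. */
final class IdNode implements AstNode {
	private string $name;

	public function __construct( string $name ) {
		if ( !preg_match( '/^[A-Za-z_][A-Za-z0-9_]*$/D', $name ) ) {
			throw new InvalidArgumentException( "Invalid identifier: $name" );
		}
		$this->name = $name;
	}

	public function compile( array &$params ): string {
		return $this->name;
	}
}

/** String literal: always becomes a placeholder plus a bound parameter. */
final class StrNode implements AstNode {
	private string $value;

	public function __construct( string $value ) {
		$this->value = $value;
	}

	public function compile( array &$params ): string {
		$params[] = $this->value;
		return '?';
	}
}

final class OpNode implements AstNode {
	private AstNode $left;
	private string $op;
	private AstNode $right;

	public function __construct( AstNode $left, string $op, AstNode $right ) {
		if ( !in_array( $op, [ '=', '!=', '<', '<=', '>', '>=' ], true ) ) {
			throw new InvalidArgumentException( "Unsupported operator: $op" );
		}
		$this->left = $left;
		$this->op = $op;
		$this->right = $right;
	}

	public function compile( array &$params ): string {
		return $this->left->compile( $params ) . ' ' . $this->op . ' '
			. $this->right->compile( $params );
	}
}

final class SelectNode implements AstNode {
	private IdNode $column;
	private IdNode $table;
	private AstNode $where;

	public function __construct( IdNode $column, IdNode $table, AstNode $where ) {
		$this->column = $column;
		$this->table = $table;
		$this->where = $where;
	}

	public function compile( array &$params ): string {
		return 'SELECT ' . $this->column->compile( $params )
			. ' FROM ' . $this->table->compile( $params )
			. ' WHERE ' . $this->where->compile( $params );
	}
}

$params = [];
$query = new SelectNode(
	new IdNode( 'user_id' ),
	new IdNode( 'user' ),
	new OpNode( new IdNode( 'user_name' ), '=', new StrNode( "SQLi' OR 1=1 OR '" ) )
);
echo $query->compile( $params ), "\n";
// prints: SELECT user_id FROM user WHERE user_name = ?
```

Because the only way to get a string literal into the tree is StrNode, the malicious value ends up in $params rather than in the SQL text.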

Atieno mentioned this in T374605: Migrate to UnionQueryBuilder in Extensions.Sep 12 2024, 10:34 AM

Atieno mentioned this in T374606: Migrate to SelectQueryBuilder::getSQL in misc 3rd-party extensions.Sep 12 2024, 10:43 AM
