Page MenuHomePhabricator

Allow use of prepared SQL statements in {{#get_db_data:}}
Closed, ResolvedPublic

Description

Allow use of prepared SQL statements in {{#get_db_data:}}.

Define prepared SQL statements in LocalSettings.php per database connection.

Examples:

  • One statement per connection:

In LocalSettings.php:

$edgDBServer	['rfam']	= 'mysql-rfam-public.ebi.ac.uk:4497';
$edgDBServerType['rfam']	= 'mysql';
$edgDBName	['rfam']	= 'Rfam';
$edgDBUser	['rfam']	= 'rfamro';
$edgDBPass	['rfam']	= '';
$edgDBPrepared	['rfam']	= <<<'SQL'
SELECT fr.rfam_acc, fr.rfamseq_acc, fr.seq_start, fr.seq_end
FROM full_region fr, rfamseq rf, taxonomy tx
WHERE rf.ncbi_id = tx.ncbi_id
AND fr.rfamseq_acc = rf.rfamseq_acc
AND tx.ncbi_id = ?
AND is_significant = 1 -- exclude low-scoring matches from the same clan
SQL;

In wikitext:

{{#get_db_data:
db = rfam
| from=whatever <!-- this parameter is ignored -->
| where=1=10116 <!-- this parameter is used to substitute question marks in prepared statement -->
| data=account=rfam_acc,sec=rfamseq_acc,start=seq_start,end=seq_end
}}
  • Several statements per connection:

In LocalSettings.php:

$edgDBServer	['rfam']	= 'mysql-rfam-public.ebi.ac.uk:4497';
$edgDBServerType['rfam']	= 'mysql';
$edgDBName	['rfam']	= 'Rfam';
$edgDBUser	['rfam']	= 'rfamro';
$edgDBPass	['rfam']	= '';
$edgDBPrepared	['rfam']	= [
	'sequences' => <<<'SEQ'
SELECT fr.rfam_acc, fr.rfamseq_acc, fr.seq_start, fr.seq_end
FROM full_region fr, rfamseq rf, taxonomy tx
WHERE rf.ncbi_id = tx.ncbi_id
AND fr.rfamseq_acc = rf.rfamseq_acc
AND tx.ncbi_id = ?
AND is_significant = 1 -- exclude low-scoring matches from the same clan
SEQ,
	'sno' => <<<'SNO'
SELECT fr.rfam_acc, fr.rfamseq_acc, fr.seq_start, fr.seq_end, f.type
FROM full_region fr, rfamseq rf, taxonomy tx, family f
WHERE
rf.ncbi_id = tx.ncbi_id
AND f.rfam_acc = fr.rfam_acc
AND fr.rfamseq_acc = rf.rfamseq_acc
AND tx.tax_string LIKE ?
AND f.type LIKE '%snoRNA%'
AND is_significant = 1 -- exclude low-scoring matches from the same clan
SNO];

In wikitext:

{{#get_db_data:
db = rfam
| from=sequences <!-- this parameter is used to choose one of the prepared statements -->
| where=1=10116 <!-- this parameter is used to substitute question marks in prepared statement -->
| data=account=rfam_acc,sec=rfamseq_acc,start=seq_start,end=seq_end
}}

If prepared SQL statements are defined for a database connection, arbitrary SQL queries are effectively disallowed.

Event Timeline

Sorry for the delay. I don't understand the last thing you wrote: "arbitrary SQL queries are effectively disallowed". Would this change remove support for the standard #get_db_data approach?

Also - can you explain why you want this change?

Sorry for the delay. I don't understand the last thing you wrote: "arbitrary SQL queries are effectively disallowed". Would this change remove support for the standard #get_db_data approach?

If LocalSettings.php defines one or more prepared statements for a database connection, then for that connection only those prepared statements can be invoked by {{#get_db_data:}}, and any other queries will not be allowed for them. For such connections, the task of writing SQL queries is transferred to the MediaWiki server administrator.

Also - can you explain why you want this change?

This approach will be safer. Even if we assume that injections have been successfully prevented (and I wouldn't guarantee that), there is still the question of exposing sensitive data from the database, limited only by rights granted to the database user.

Using prepared statements is best practice; even in simple cases when only selection criteria from the right parts of predicates in WHERE depend on user input; let alone composing the very structure of an SQL query from it.

Even performance of prepared statements is better when there are several similar queries.

Oh, I see - for any DB connection, it would be one or the other.

What if the admin put in something like "DELETE * FROM Table" into the prepared statement - would the code check against that? Or is it assumed that an admin would not sabotage his own system?

Also, would these "prepared statements" actually be stored as prepared statements within the database? If so - at what point would that happen?

What if the admin put in something like "DELETE * FROM Table" into the prepared statement - would the code check against that? Or is it assumed that an admin would not sabotage his own system?

Yes, this is the assumption. Anyone with access to PHP code and database credentials can send a DROP query, even without ExternalData extension.

Also, would these "prepared statements" actually be stored as prepared statements within the database? If so - at what point would that happen?

A prepared statement is a PHP resource. Its lifetime, like any PHP variable, is HTTP request, or less, unless it is cached with APCU. Therefore, a prepared statement is likely to be compiled in EDConnector*::__construct() and possibly cached.

Okay, I can see how this would be useful. Still, I think it makes sense to do at least some basic validation on the text of a prepared statement, like making sure that it starts with "SELECT", and prohibiting semicolons and/or comments.

Change 714192 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Allow use of prepared SQL statements in {{#get_db_data:}}

https://gerrit.wikimedia.org/r/714192

Change 714192 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Allow use of prepared SQL statements in {{#get_db_data:}}

https://gerrit.wikimedia.org/r/714192