Page MenuHomePhabricator

Investigate restricting the ability to create sql queries with bad patterns in mediawiki
Closed, ResolvedPublic

Description

From: T246232, forked out for a seperate ticket for Platform Engineering to see if there is anything in core that can be changed to make this less likely to happen in the future.

At T246159 we can observe a really badly formatted query:

SELECT /* Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::selectTermsViaJoin */ wbtl_id, wbtl_type_id, wbxl_language, wbx_text, wbit_item_id FROM `wbt_term_in_lang`, `wbt_text_in_lang`, `wbt_text`, `wbt_item_terms` WHERE (wbtl_text_in_lang_id=wbxl_id) AND (wbxl_text_id=wbx_id) AND wbit_item_id IN (...) AND wbxl_language IN ('hu', 'en') AND (`wbit_term_in_lang_id` = wbtl_id)

Compare this with the equivalent:

SELECT /* Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::selectTermsViaJoin */
       wbtl_id, wbtl_type_id, wbxl_language, wbx_text, wbit_item_id
  FROM wbt_term_in_lang
  JOIN wbt_text_in_lang
    ON wbtl_text_in_lang_id = wbxl_id
  JOIN wbt_text
    ON wbxl_text_id = wbx_id
  JOIN wbt_item_terms
    ON wbit_term_in_lang_id = wbtl_id
 WHERE wbit_item_id IN (...) AND wbxl_language IN ('hu', 'en')

(ignore the quotation and justification, plus I don't know by heart the right columns of each table). It is funny that a method called selectTermsViaJoin didn't contain any join! :-)

While this could be disregarded as a minor issue, I would like to direct your attention to this issue for 2 reasons:

  • A badly formatted query like this has been the direct cause of outages at wikimedia database infrastructure. By convention (outside of Wikimedia, but specially on Wikimedia) implicit JOINS are highly discouraged. I don't know how even the wikimedia abstraction layer allows them.

JOINS should be explicit and be used with its ON clause. In the past, the removal of an ON clause but not removal from the implicit list of tables to join caused overload and an outage. As a preventive measure, this was considered a bad practice and discouraged. This is also a standard among any serious professional SQL work. This is hard to read, to the point I could not work with the query. JOIN ... ON is way more legible. You know with a single view which tables are involved and differenciate filtering from joining comparisons, preventing mistakes when changing the query.

  • This issue would have been caught immediately by DBAs or other developers on review and asked to be changed immediately. This leads to the more important issue that review process may be broken because not even the most obvious formatting issue has been corrected on review. I am not looking to assign any blame here, I am asking to review the reviewing process for database layer changes so obvious mistakes like these are caught on time.

Event Timeline

eprodromou subscribed.

On review, this seems like a small project to us, but not urgent.

Addshore renamed this task from Investigate restricting the ability to create queries with bad patterns to Investigate restricting the ability to create sql queries with bad patterns in mediawiki.Apr 30 2020, 6:20 PM

SelectQueryBuilder doesn't allow this. So I guess we can consider this done? I know select() is still heavily used but at least new usages shouldn't use it.

Addshore claimed this task.