Page MenuHomePhabricator

Clean up changed/temp table tracking in Database/Query and document table name expectations
Open, In Progress, MediumPublic

Description

The table names in the Query class are mostly used for two things:
a) For tracking what session temp tables exists, which effects whether a query counts a permanent write or not. This is used for logging and safety checks. Accommodations are made for the TEMPORARY tables used to mock permanent tables during testing, so they are treated similarly by these checks.
b) For supplying names to ChangedTablesTracker. This is used for tracking what tables changing during integration tests, so that we can reset them. These tables are usually though not always TEMPORARY.

For the session temp table tracking, we use a field in Database called "sessionTempTables", keyed by table name. Such table names should to be fully qualified before any simple equality comparisons. Currently, registerTempTables() and hasPermanentTable() just use tableName(...,'raw'), which resolves table prefixes/$wgSharedTables. There are a few problems here:

  • This does not qualify the table with the db name. Thus, it does not account for the fact temp tables on different dbs can have the same name in mysql.
  • We use schemas in postgres, but temp tables always go in the pg_temp schema, which makes it hard to use fully qualified table names in "sessionTempTables". Determining when to use db.pg_temp.table vs db.schema.table would be messy. Technically, sqlite uses a similar TEMP schema for temp table too, but doesn't practically matter for us unless ATTACH is used (e.g. $wgSharedTables).
  • These methods compare "raw" tables names with those from Query::getTable(), yet the corresponding "new Query()" callers do not prefix/qualify the table name. This means that, normally, a magic "base" table name (no prefix/$wgSharedTables applied) will be compared to the tableName(...,'raw') keys in "sessionTempTables", which does not make sense.

It isn't clear how much ChangedTablesTracker::getTables( $domainId ) is supposed to be for "changed tables belonging to this domain" or "table changed while the current domain was this domain", though it seems to mostly do the later. $wgSharedTables muddles the waters too much for the first case. In the second case, it's still odd that a table from $domainId could be changed while the current domain was $otherDomain, which would then not show in ChangedTablesTracker::getTables( $domainId ). This would only happen with something like $dbw->insert( "foriegn_db.table", ... ), which shouldn't really happen, but isn't proscribed by a comments nor safety checks.

It isn't documented whether ChangedTablesTracker is supposed to work with "base" table names (no prefix/$wgSharedTables applied yet). That seems to be the expectation though. For example, MediaWikiIntegrationTestCase::resetDB() checking against the string "user". When MediaWikiIntegrationTestCase::truncateTables() runs on those entries, this works as long as all callers of the RDBMS query wrapper methods provided "base" table names. The duplicateTableStructure() methods are special case, using a dot prefix to treat the table name as full (bypass prefix/$wgSharedTables).

If we want ChangedTablesTracker to use "base" table names, then pretty much all of the $table arguments to RDBMs methods should be documented as requiring "base" names and made to refer to some common documentation of the concept. This would still run into the problem of queries doing cross-table joins, which involve using remote DB table names (e.g. "db2.table"). SQLPlatform::tableName() will assumed that such tables are already prefixed and $wgSharedTables was resolved. Getting "base" names from qualified names is not possible (similar to "is this string encoded?" questions). This case would really only matter for cross-DB JOINs, which are rarely a thing in MediaWiki. It would otherwise be silly to get a db1 handle to do "SELECT ... FROM d2.table" and even more so for writes. The write methods could actually be made to disallow table names with "db.table" in them, and these are the ones that matter with regard to temp/write table tracking. QueryBuilderFromRawSql should be removed, with every query() using Query. This would avoid having to guess "base" table names from table names that are possibly prefixed, qualified or transformed by $wgSharedTable.

Event Timeline

Change #976368 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: improve Query/QueryBuilderFromRawSql handling of table tracking

https://gerrit.wikimedia.org/r/976368

aaron triaged this task as Medium priority.
aaron moved this task from Incoming (Needs Triage) to In Progress on the MW-Interfaces-Team board.
FJoseph-WMF changed the task status from Open to In Progress.Apr 5 2024, 2:43 PM

Change #1017422 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: document where unprefixed/unqualified table are expected

https://gerrit.wikimedia.org/r/1017422

Change #1019137 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: track session temp tables by DB name in Database

https://gerrit.wikimedia.org/r/1019137

Change #1017422 merged by jenkins-bot:

[mediawiki/core@master] rdbms: document where unprefixed/unqualified table names are expected

https://gerrit.wikimedia.org/r/1017422

Change #976368 merged by jenkins-bot:

[mediawiki/core@master] rdbms: use qualifiedTableComponents() in more places

https://gerrit.wikimedia.org/r/976368