Page MenuHomePhabricator

Fix and enforce table prefix usage in columns and indexes in core
Closed, ResolvedPublic

Description

One good thing about having abstract schema is that you can spot issues much easier. One of the database conventions is to have table prefix for all columns and indexes (e.g. logging table has prefix of log_, all columns and index names should start with log_).

Given that 80% of tables are migrated to abstract schema, I quickly wrote this python script to check:

import json
with open('/var/lib/mediawiki/maintenance/tables.json', 'r') as f:
    tables = json.loads(f.read())
for table in tables:
    prefixes = []
    for column in table.get('columns', []):
        if 'name' in column:
            prefixes.append(column['name'].split('_')[0])
    for index in table.get('indexes', []):
        if 'name' in index:
            prefixes.append(index['name'].split('_')[0])
    if len(set(prefixes)) != 1:
        print(table['name'], prefixes)

And the result is these tables:

  • site_identifiers ['si', 'si', 'si', 'site', 'site']
  • user_properties ['up', 'up', 'up', 'user']
  • sites ['site', 'site', 'site', 'site', 'site', 'site', 'site', 'site', 'site', 'site', 'site', 'sites', 'sites', 'sites', 'sites', 'sites', 'sites', 'sites', 'sites']
  • user_newtalk ['user', 'user', 'user', 'un', 'un']
  • revision_actor_temp ['revactor', 'revactor', 'revactor', 'revactor', 'revactor', 'actor', 'page']
  • change_tag ['ct', 'ct', 'ct', 'ct', 'ct', 'ct', 'change', 'change', 'change', 'change']
  • page ['page', 'page', 'page', ' page', 'page', 'page', 'page', 'page', 'page', 'page', 'page', 'page', 'page', 'name', 'page', 'page', 'page']
  • objectcache ['keyname', 'value', 'exptime', 'exptime']

We should fix these and then easily add a unit test to avoid new cases being introduced in future

Details

SubjectRepoBranchLines +/-
mediawiki/coremaster+4 -19
mediawiki/coremaster+169 -23
mediawiki/coremaster+285 -6
mediawiki/extensions/Translatemaster+2 -8
mediawiki/coremaster+240 -1
mediawiki/coremaster+117 -9
mediawiki/extensions/FlaggedRevsmaster+8 -2
mediawiki/extensions/AbuseFiltermaster+6 -3
mediawiki/coremaster+17 -4
mediawiki/coremaster+26 -5
mediawiki/extensions/FlaggedRevsmaster+11 -3
mediawiki/coremaster+684 -11
mediawiki/extensions/FlaggedRevsmaster+8 -2
mediawiki/coremaster+100 -9
mediawiki/coremaster+45 -0
mediawiki/extensions/ArticleFeedbackv5master+0 -15
mediawiki/coremaster+292 -16
mediawiki/coremaster+260 -32
mediawiki/coremaster+89 -4
Show related patches Customize query in gerrit

Event Timeline

Reedy renamed this task from Fix and enforce table prefix usage in columns and idnexes in core to Fix and enforce table prefix usage in columns and indexes in core.Dec 13 2020, 3:48 PM

Change 649414 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Rename user_properties index

https://gerrit.wikimedia.org/r/649414

Change 649487 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Rename revision_actor_temp indexes

https://gerrit.wikimedia.org/r/649487

Change 649488 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Rename all sites indexes

https://gerrit.wikimedia.org/r/649488

Change 649414 merged by jenkins-bot:
[mediawiki/core@master] Rename user_properties index

https://gerrit.wikimedia.org/r/649414

Change 649488 merged by jenkins-bot:
[mediawiki/core@master] Rename all sites indexes

https://gerrit.wikimedia.org/r/649488

Change 648817 had a related patch set uploaded (by Ammarpad; owner: Ladsgroup):
[mediawiki/core@master] Rename four logging indexes to have log_ prefix

https://gerrit.wikimedia.org/r/648817

Change 648817 merged by jenkins-bot:
[mediawiki/core@master] Rename four logging indexes to have log_ prefix

https://gerrit.wikimedia.org/r/648817

Change 651160 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Rename change_tag indexes to have ct_ prefix

https://gerrit.wikimedia.org/r/651160

I wonder whether for user_newtalk, we should rename the columns instead to have the un_ prefix. Prefixing the indexes with user_ will give possibility of collision with the user table indexes in rdbms where the indexes are global. user table of course has much greater claim to this prefix.

Yeah, it's complicated, user prefix should be for user table. OTOH, renaming columns is a lot of work (migration of data, places it's used in core, etc.) but not impossible. My suggestion would be to add it to list of exceptions and create a ticket for the long term work (and link it in the exception list). How does that sound?

Looks OK. That would be better.

Change 651176 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Rename site_identifiers indexes to have si_ prefix

https://gerrit.wikimedia.org/r/651176

Change 653086 had a related patch set uploaded (by Umherirrender; owner: Umherirrender):
[mediawiki/extensions/ArticleFeedbackv5@master] Change name of log_page_time index to prepare for 1.36

https://gerrit.wikimedia.org/r/653086

Change 653086 merged by jenkins-bot:
[mediawiki/extensions/ArticleFeedbackv5@master] Remove index hint for renamed page_time

https://gerrit.wikimedia.org/r/653086

Change 653432 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Add test to assert uniform table prefix usage in abstract schema

https://gerrit.wikimedia.org/r/653432

Change 653432 merged by jenkins-bot:
[mediawiki/core@master] Add test to assert uniform table prefix usage in abstract schema

https://gerrit.wikimedia.org/r/653432

Change 651176 merged by jenkins-bot:
[mediawiki/core@master] Rename site_identifiers indexes to have si_ prefix

https://gerrit.wikimedia.org/r/651176

Change 660450 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/extensions/FlaggedRevs@master] Check use of 'change_tag' index that is being renamed

https://gerrit.wikimedia.org/r/660450

Change 660450 merged by jenkins-bot:
[mediawiki/extensions/FlaggedRevs@master] Check use of 'change_tag' index that is being renamed

https://gerrit.wikimedia.org/r/660450

Change 667358 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Rename new_name_timestamp on recentchanges to rc_new_name_timestamp

https://gerrit.wikimedia.org/r/667358

Change 667358 merged by jenkins-bot:
[mediawiki/core@master] Rename new_name_timestamp on recentchanges to rc_new_name_timestamp

https://gerrit.wikimedia.org/r/667358

Change 673832 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/extensions/FlaggedRevs@master] Check page_timestamp index forcing to prepare for renaming

https://gerrit.wikimedia.org/r/673832

Change 673833 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/extensions/AbuseFilter@master] Check forcing of page_timestamp revision index

https://gerrit.wikimedia.org/r/673833

Change 673995 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Prepare for renaming page_timestamp revision index

https://gerrit.wikimedia.org/r/673995

Ammarpad updated the task description. (Show Details)

Change 674138 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Rename page_timestamp revision index

https://gerrit.wikimedia.org/r/674138

Change 674813 had a related patch set uploaded (by Ammarpad; author: Ammarpad):
[mediawiki/extensions/FlaggedRevs@master] Check name_title index forcing to prepare for renaming

https://gerrit.wikimedia.org/r/674813

Change 674817 had a related patch set uploaded (by Ammarpad; author: Ammarpad):
[mediawiki/core@master] Check forcing of 'name_title' index to prepare for renaming

https://gerrit.wikimedia.org/r/674817

Change 674818 had a related patch set uploaded (by Ammarpad; author: Ammarpad):
[mediawiki/core@master] Rename name_title index to have page_ prefix

https://gerrit.wikimedia.org/r/674818

Change 673832 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Check page_timestamp index forcing to prepare for renaming

https://gerrit.wikimedia.org/r/673832

Change 673995 merged by jenkins-bot:

[mediawiki/core@master] Prepare for renaming `page_timestamp` revision index

https://gerrit.wikimedia.org/r/673995

Change 674138 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/core@master] Rename `page_timestamp` revision index

https://gerrit.wikimedia.org/r/674138

Change 674817 merged by jenkins-bot:

[mediawiki/core@master] Check forcing of 'name_title' index to prepare for renaming

https://gerrit.wikimedia.org/r/674817

Change 673833 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Check forcing of page_timestamp revision index

https://gerrit.wikimedia.org/r/673833

Change 684292 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/extensions/Translate@master] Check name_title index forcing to prepare for renaming

https://gerrit.wikimedia.org/r/684292

Change 674813 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Check name_title index forcing to prepare for renaming

https://gerrit.wikimedia.org/r/674813

Change 649487 abandoned by Ammarpad:

[mediawiki/core@master] Rename revision_actor_temp indexes

Reason:

No longer necessary

https://gerrit.wikimedia.org/r/649487

Change 674138 merged by jenkins-bot:

[mediawiki/core@master] Rename `page_timestamp` revision index

https://gerrit.wikimedia.org/r/674138

Change 684292 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] Translate: Remove forcing of name_title index

https://gerrit.wikimedia.org/r/684292

Change 674818 merged by jenkins-bot:

[mediawiki/core@master] Rename name_title index to have page_ prefix

https://gerrit.wikimedia.org/r/674818

Change 651160 merged by jenkins-bot:

[mediawiki/core@master] Rename change_tag indexes to have ct_ prefix

https://gerrit.wikimedia.org/r/651160

Change 703017 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@master] Remove index check on page_name_title

https://gerrit.wikimedia.org/r/703017

Change 703017 merged by jenkins-bot:

[mediawiki/core@master] Remove index check on page_name_title

https://gerrit.wikimedia.org/r/703017

Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.
Krinkle subscribed.

I imagine this enforcement would likely happen at the Rdbms layer. That would be a welcome enhancement that we can own and help review.

I imagine this enforcement would likely happen at the Rdbms layer. That would be a welcome enhancement that we can own and help review.

This is actually done already. I don't know why it's still open. I'm afraid it's not in rdbms library. It's a structure test which prevents adding a new column or index without following the policy by simply breaking CI (instead of doing it on the fly in production). It has a list of accepted exceptions which used to be longer but now only limited to objectcache table (which is a mess I don't want to get into atm).

Next steps which would help a lot are finding a way to actually wire extensions that have abstract schema to this test (and the other test of "make sure sql and json files are in sync") which requires some work in extension registry and the schema of extension.json. The sql paths can be also used during update.php runs instead of depending on SchemaChangUpdate hook. It would also help the drift tracker to automatically detect tables to check (at least the obvious cases, not the ones on x1 or centralauth).

But I assume the paragraph above is much bigger than this ticket. I suggest closing this and maybe creating a ticket for fixing objectcache.

Ack. A structure test is good too. Also helps keep the Rdbms library generic I suppose. Enforcing this by adding a new restriction there seems possibly feature creep. I was worried that we might be having code in Rdbms to support (rather than unsupport) violations of this but afaik thats not the case.