Page MenuHomePhabricator

Use InnoDB engine for searchindex table in MySQL
Open, Stalled, Needs TriagePublic

Description

Newer versions of Mysql (5.6+) support full-text indexes on innodb tables. We should move the searchindex table that is used for core search support to innodb because innodb sucks less and to be more consistent to user's wishes.

The largest issue here, is doing it in a way that makes tables.sql backaward compat with old mysql.

Event Timeline

Paladox raised the priority of this task from to Needs Triage.
Paladox updated the task description. (Show Details)
Paladox added subscribers: Paladox, Krinkle, Legoktm and 3 others.

Please ALWAYS follow https://www.mediawiki.org/wiki/How_to_report_a_bug and describe which specific problem you would like to solve, plus what extension this refers to and why.

Hi this is not a bug this is a suggestion.

It does not matter how you call it as long as you do not explain why you make that suggestion and why that suggestion could be useful. Please read my previous comment again and follow it, in general.

Bawolff renamed this task from Support searchindex in innodb in mysql to Use innodb engine for searchindex table in mysql.Sep 17 2015, 1:36 PM
Bawolff updated the task description. (Show Details)
Bawolff edited projects, added MediaWiki-Search; removed MediaWiki-extensions-General.
Bawolff set Security to None.

Please ALWAYS follow https://www.mediawiki.org/wiki/How_to_report_a_bug and describe which specific problem you would like to solve, plus what extension this refers to and why.

Paladox didn't exactly explain it the best, but... He's referring to mediawiki core support for search when there are no search extensions installed. Using InnoDB here may potentially result in better search results, may be less likely to have data corruption during server crashes, and may have better performance under contention (I have never tested any of these claims, nor know enough about it to say for certain if that's true. But it probably is). Additionally, we normally use innodb everywhere else (based on a config variable), so its nice to be consistent.

Discovery-ARCHIVED is focussed on Wikimedia users, so this task is out of scope for us.

@Bawolff I'm talking about currently searchindex or the search table in MediaWiki only uses Utf-8 and doesn't support innodb. Where as if we add a check to check weather a user uses a compatible MySQL then we could add sql for using innodb and any type of prefix for example binary or Utf-8. This is for search.

Change 290061 had a related patch set uploaded (by Paladox):
InnoDB now supports fulltext which we need for searchindex so lets use it

https://gerrit.wikimedia.org/r/290061

With https://gerrit.wikimedia.org/r/#/c/290061/ instead of updating tables.sql to use innodb for searchindex I added the sql patches instead so based on the MySQL version you use it will detect and use the correct one.

Change 291346 had a related patch set uploaded (by Paladox):
InnoDB now supports fulltext which we need for searchindex so lets use it

https://gerrit.wikimedia.org/r/291346

Change 291346 abandoned by Paladox:
InnoDB now supports fulltext which we need for searchindex so lets use it

https://gerrit.wikimedia.org/r/291346

Change 290061 abandoned by Paladox:
InnoDB now supports fulltext which we need for searchindex so lets use it

https://gerrit.wikimedia.org/r/290061

Change 291594 had a related patch set uploaded (by Paladox):
InnoDB now supports fulltext which we need for searchindex so lets use it

https://gerrit.wikimedia.org/r/291594

I recently needed to deploy Wikimedia on MySQL instances (5.6+) having only InnoDB Engine (and no MyISAM). The mw-config naturally failed due to searchindex table creation that needs MyISAM, and was fixed with a quick patch. Just wanted to share it as it was made trying to be "backward compatible" (for mediawiki new installation with MySQL usecase):

Replace lines 1274 to 1291 in tables.sql with following (searchindex creation script, https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql$1282 ):

--- The table is created with the db default engine / charset
--- For old MySQL version 5 (5.5 or older), the table creation is forced to MyISAM engine and utf8 charset.
SET @base = 'CREATE TABLE /*_*/searchindex ( si_page int unsigned NOT NULL ,  si_title varchar(255) NOT NULL default '''' , si_text mediumtext NOT NULL )';
SET @legacy_compat = ' ENGINE=MyISAM DEFAULT CHARSET=utf8;';
SET @script = IF( substring(@@GLOBAL.version,1,1) = 5 and substring(@@GLOBAL.version,3,1) < 6  , concat(@base,@legacy_compat), @base );
PREPARE statement FROM @script;
EXECUTE statement;
mmodell claimed this task.
mmodell subscribed.

This is ancient history now.

This is ancient history now.

Its not. Issue is still present in MediaWiki.

I have changed the searchindex table on my MediaWiki-based site to InnoDB. I have not done any formal tests, but the problems I noticed with searches against the MyISAM-based table were solved. I have noted two things that may be useful for others to know if they make the same change: rebuildtextindex.php deletes two search indexes for the table and then tries to create them again in one go. With some versions of MySQL (certainly 5.7.22) this crashes the script, for only one InnoDB fulltext index can be made at a time. Another problem is that searches that include a hyphen followed by a space or a hyphen ending the search string cause a "Wikimedia\Rdbms\DBQueryError". (On MediaWiki 1.30).

I'm moving a site to Percona and this keeps biting me. Thanks to @G9981, I'm going to test his suggested code and see if it works. In the meantime, I'm creating a hook for a custom extension to change the engine type.

I'm proposing changing this to innodb and bumping MediaWiki's minimum version supported of MySQL to 5.6.10 as a result.

Given that 5.6.10 has been available for use for 6 years and that 5.5 is not supported on Redhat systems (it was supported on SCL until Oct 2016) and that Debian only supports it on OldStable now and hasn't updated since April 2018, I think we're safe it telling users of MySQL 5.5 that newer versions of MediaWiki won't support them.

For those interested in query differences between MyISAM and InnoDB fulltext queries, see this post on Percona's site.

Change 519324 had a related patch set uploaded (by markahershberger; owner: markahershberger):
[mediawiki/core@master] Use InnoDB engine for the searchindex table in MySQL

https://gerrit.wikimedia.org/r/519324

Krinkle renamed this task from Use innodb engine for searchindex table in mysql to Use InnoDB engine for searchindex table in MySQL.Jul 18 2019, 10:28 PM
Krinkle edited projects, added MediaWiki-Installer; removed MediaWiki-libs-Rdbms.

Change 519324 abandoned by markahershberger:
Use InnoDB engine for the searchindex table in MySQL

https://gerrit.wikimedia.org/r/519324

Change 291594 abandoned by Paladox:

[mediawiki/core@master] InnoDB now supports fulltext, which we need for searchindex so lets use it

Reason:

Started https://gerrit.wikimedia.org/r/c/mediawiki/core/+/935486 instead

https://gerrit.wikimedia.org/r/291594

Change 935486 had a related patch set uploaded (by Paladox; author: Paladox):

[mediawiki/core@master] Use InnoDB for searchindex

https://gerrit.wikimedia.org/r/935486

@Ladsgroup I updated some of the searchindex patches a bit, but i got questions :)

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/935486

  1. This migration approach drops the table and recreates it. There's quite a few changes in there wrt to types, engines and indexes so i guess there is something to be said for that. ata the same time..
  2. I was considering adding COLLATE unicode_ai_ci to the si_title of searchindex, as that might possibly allow us to get rid of the TitleKey extension. But i don't know if this is possible with abstract schema ? Do i need to do a patch to fix that after the initial install in that case ?
  3. Similar question for the vector type of postgres.. i can only do this in a post processing step of the postgres installer/updater using an db specific patch right ?

@Ladsgroup I updated some of the searchindex patches a bit, but i got questions :)

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/935486

  1. This migration approach drops the table and recreates it. There's quite a few changes in there wrt to types, engines and indexes so i guess there is something to be said for that. ata the same time..

That is actually quite messy. Once you use the abstract schema change, it takes care of it automatically, you don't need to do anything. See http://mediawiki.org/wiki/Schema%20changes on how to use that.

If you still insist on redoing the whole thing, that you need to create table with another name so updater would work. (or create a dedicated method in the updater to do this)

  1. I was considering adding COLLATE unicode_ai_ci to the si_title of searchindex, as that might possibly allow us to get rid of the TitleKey extension. But i don't know if this is possible with abstract schema ? Do i need to do a patch to fix that after the initial install in that case ?

Doctrine DBAL is relatively extendible. We introduced mwtimestamp data type, you can probably do that too.

  1. Similar question for the vector type of postgres.. i can only do this in a post processing step of the postgres installer/updater using an db specific patch right ?

You could probably change the postgres integration we have. It's not trivial but not hard either. Has been done before. Check MWPostgreSqlPlatform

That is actually quite messy. Once you use the abstract schema change, it takes care of it automatically, you don't need to do anything. See http://mediawiki.org/wiki/Schema%20changes on how to use that.

If you still insist on redoing the whole thing, that you need to create table with another name so updater would work. (or create a dedicated method in the updater to do this)

Yeah I was thinking that, but I need to rebuild the index. Will it also do that for me automatically ?

  1. I was considering adding COLLATE unicode_ai_ci to the si_title of searchindex, as that might possibly allow us to get rid of the TitleKey extension. But i don't know if this is possible with abstract schema ? Do i need to do a patch to fix that after the initial install in that case ?

Doctrine DBAL is relatively extendible. We introduced mwtimestamp data type, you can probably do that too.

  1. Similar question for the vector type of postgres.. i can only do this in a post processing step of the postgres installer/updater using an db specific patch right ?

You could probably change the postgres integration we have. It's not trivial but not hard either. Has been done before. Check MWPostgreSqlPlatform

The idea would then be to create a new MWVectorizedText column type, and then define how that column is implemented in postgres with tsvector and in mysql as TEXT with a collation ? I would probably also need to implement a new query builder method ->search() or something that would then do a LIKE %% search in mysql and and tsquery() call in postgres right ?

Hmm, ok not sure if I want to get in this deep :)
Lets first finish the mysql part, and we'll keep the postgres version of this as unused zombie before trying to merge that in then, because that seems WAY more complicated.

The sqlite part I still need to figure out a bit more, as I don't think the current abstractSchema actually describes it correctly right now, and might actively break it if I run an abstract schema update for it.

  1. Similar question for the vector type of postgres.. i can only do this in a post processing step of the postgres installer/updater using an db specific patch right ?

You could probably change the postgres integration we have. It's not trivial but not hard either. Has been done before. Check MWPostgreSqlPlatform

The idea would then be to create a new MWVectorizedText column type, and then define how that column is implemented in postgres with tsvector and in mysql as TEXT with a collation ?

That'd be correct.

I would probably also need to implement a new query builder method ->search() or something that would then do a LIKE %% search in mysql and and tsquery() call in postgres right ?

You could probably introduce a subclass of SQB called SearchQueryBuilder and do that magic there. That would avoid introducing a new function on a class that's widely used just for one usecase.

Hmm, ok not sure if I want to get in this deep :)
Lets first finish the mysql part, and we'll keep the postgres version of this as unused zombie before trying to merge that in then, because that seems WAY more complicated.

Yes :D

Notes to self:

  • sqlite searchindex cannot be modified with abstractschema, as it has conditional feature fts3 (compile time option of the sqlite install)
  • those features are also not yet supported in abstractschema (virtual table, fts3)
  • sqlite only works, with the initial install, + the install phase patch it runs. Updates would break it.
  • probably need to add this logic to sqlite updater as well for the short term to get around this
  • sqlite requires a rebuild of the searchindex content after changing, as they are actually different table types.
  • alternatively, could split up searchindex of sqlite into a searchindex and searchindex_fts3 table ? Then have the rebuildtextindex script keep both in order. But then all queries need the tablename to vary on the sqlite condition.

@Ladsgroup I updated some of the searchindex patches a bit, but i got questions :)

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/935486

  1. This migration approach drops the table and recreates it. There's quite a few changes in there wrt to types, engines and indexes so i guess there is something to be said for that. ata the same time..

That is actually quite messy. Once you use the abstract schema change, it takes care of it automatically, you don't need to do anything. See http://mediawiki.org/wiki/Schema%20changes on how to use that.

@Ladsgroup K, I gave it another shot, but I had to introduce a MySqlUpdates::changeTableOption(), as apparently the DBAL comparator doesn't detect differences in table options!

Of note, I think the current maintenance/abstractSchemaChanges/patch-revision-rename-index.json with "table_options": [ "/*$wgDBTableOptions*/", "MAX_ROWS=10000000", "AVG_ROW_LENGTH=1024" ] doesn't actually run like that for users !

I'm going to follow this up with a searchindex maintenance script that allows you to set the collation, and add a config option to configure the default collation to be used for searchindex when running the installer.

TheDJ changed the task status from Open to Stalled.Jul 15 2024, 7:18 PM

This is currently blocked on all our testcases relying on tables to be cloneable, but innoDB cannot create a TEMPORARY table of a table that has a FULLTEXT index.

Discussed this with @Ladsgroup
We concluded that this is simply not really worth the effort in the this patch round. As Wikimedia doesn't rely on it, rearchitecting the testcases of Wikimedia to not so strongly depend on searchindex will needlessly complicate things. Instead, I'll provide maintenance script to convert to innodb for those who desire this.