Mediawiki 1.27.1 could not do Full text searching and failed to give correct searching result
Open, Medium, Public

Description

Environments: Mediawiki 1.27.1 + php-7.0.15-nts-Win32-VC14-x86 + nginx-1.10.3 + MS SQL SERVER 10.50.2550.0
Reproduce:

  1. Installation successful (needs to change the lengths of oi_name and oi_archive_name in oldimage table from 255 to 225, I believe this is another issue probably)
  2. Create several pages with a few test words as content.
  3. Try full-text searching.
  4. The search returns nothing related to the test words.
  5. Table searchindex contains all messy code.

Event Timeline

  1. Installation successful (needs to change the lengths of oi_name and oi_archive_name in oldimage table from 255 to 225, I believe this is another issue probably)

Indeed: T145635: Error installing with SQL Server 2014 Express: Index 'oi_name_archive_name' has more than 900 bytes long
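The 255 to 225 workaround is consistent with the 900-byte index key limit quoted in the T145635 title. A rough check, assuming (this is not stated in the thread) that both columns are NVARCHAR at 2 bytes per character and that the index 'oi_name_archive_name' covers exactly these two columns:

```python
# Sketch of the index key size arithmetic. Assumptions (not from the thread):
# both columns are NVARCHAR (2 bytes per character) and the index covers
# only oi_name and oi_archive_name.
MAX_INDEX_KEY_BYTES = 900      # limit quoted in the T145635 error message
BYTES_PER_NVARCHAR_CHAR = 2    # assumed column type


def index_key_bytes(column_lengths):
    """Return the worst-case key size in bytes for the given column lengths."""
    return sum(n * BYTES_PER_NVARCHAR_CHAR for n in column_lengths)


for length in (255, 225):
    size = index_key_bytes([length, length])
    print(length, size, size <= MAX_INDEX_KEY_BYTES)
```

Under those assumptions, two columns of length 255 need 1020 bytes, which exceeds the limit, while two columns of length 225 need exactly 900 bytes.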

  1. Table searchindex contains all messy code.

What does that mean?

Hi Aklapper,

It means that when you create pages, the words are all saved as mess codes in the searchindex table.

I am not sure whether that is the reason full-text searching fails afterwards.

All the best,
Richard

What are "mess codes"? Please provide some example.

Sorry, I mean 'unrecognizable characters'.
Like 믯₿獡晤獡晤橫獡摨慦獫㭪晤㭬歡橳晤㭬橫獡晤愠摳㭦歬慪㭳汤武慪摳⁦獡晤氻橫獡汤武栻慪摳⁦‧獡晤氻橫獡晤氻橫獡晤✠獡搻歬橦獡㭬摫橦愠摳㭦歬慪汳搻晫慪⁳⁤
Actually, what I inserted is English.
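The sample above looks like UTF-8 bytes (including a byte order mark) that were read back as UTF-16LE. The sketch below tests that guess on the first few characters of the sample; the function name is hypothetical, and this only checks the pattern in the pasted text, not where in the MediaWiki or driver stack the mismatch happens.

```python
def recover_utf8_read_as_utf16le(garbled: str) -> str:
    """Undo the suspected mix-up: UTF-8 bytes that were decoded as UTF-16LE.

    Re-encoding as UTF-16LE gives back the raw bytes, which are then decoded
    as UTF-8. The 'utf-8-sig' codec also drops a leading byte order mark.
    """
    raw = garbled.encode("utf-16-le")
    return raw.decode("utf-8-sig", errors="replace")


# First six characters of the sample pasted above.
sample = "믯₿獡晤獡晤"
print(repr(recover_utf8_read_as_utf16le(sample)))  # ' asdfasdf'
```

If the guess is right, the stored bytes are plain English keyboard input, which matches the statement that the inserted text was English.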

To what is the collation set?
http://stackoverflow.com/questions/13377812/getting-data-with-utf-8-charset-from-mssql-server-using-php-freetds-extension and http://stackoverflow.com/questions/1322421/php-sql-server-how-to-set-charset-for-connection might be relevant.

Currently this sounds like a configuration problem for the MediaWiki support desk and not like a bug in the software code. :)

[Edit:] Heh, looks like you (rightfully) started on the support desk at https://www.mediawiki.org/wiki/Topic:Tkzuxf7kfi2jz7hc - sorry then. :)

Hi Aklapper,

I tried the methods above, but they do not work for me. I am not sure what exactly the problem is.
I did think about the collations, but one thing confuses me: only the 'searchindex' table has this charset problem. The other tables are fine; in both the 'page' and 'text' tables, the characters display correctly.

As this refers to the 'searchindex' table, it is almost certainly doing full text search with ms sql server.

debt triaged this task as Medium priority. Mar 2 2017, 11:17 PM
debt moved this task from needs triage to This Quarter on the Discovery-Search board.
debt added a subscriber: debt.

We support MySQL right now, but probably can't support MSSQL, since we don't use MS servers, debuggers, etc. We can check on this issue with MySQL, but we'd need volunteer help for MSSQL.

Heads-up: As per RFC discussion in August 2019, the previously experimental support for using Oracle or MSSQL as database backends in MediaWiki core has been removed in MediaWiki 1.34, so this task might end up as declined in the future.