Page MenuHomePhabricator

PrefixSearchTest::testSearch fails on PostgreSQL
Open, LowPublic

Description

From my newest PHPUnit run:

fix', 'Ex', array('Example', 'Example/Baz', 'Example Bar')))
Main namespace with title prefix
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
     0 => 'Example'
-    1 => 'Example/Baz'
-    2 => 'Example Bar'
+    1 => 'Example Bar'
+    2 => 'Example/Baz'
 )

phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/Constraint/IsEqual.php:170
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/Assert.php:2134
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/Assert.php:475
/usr/home/saper/public_html/pg/w/tests/phpunit/includes/PrefixSearchTest.php:138
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestCase.php:976
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestCase.php:831
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestResult.php:648
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestCase.php:776
/usr/home/saper/public_html/pg/w/tests/phpunit/MediaWikiTestCase.php:141
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestSuite.php:775
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestSuite.php:745
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestSuite.php:705
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestSuite.php:705
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/Framework/TestSuite.php:705
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/TextUI/TestRunner.php:349
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/TextUI/Command.php:176
phar:///usr/share/php/phpunit/phpunit.phar/PHPUnit/TextUI/Command.php:129
/usr/share/php/phpunit/phpunit.phar:531

Freshly installed empty wiki via the commandline installer


Version: 1.25-git
Severity: normal

Details

Reference
bz73289

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:50 AM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz73289.
bzimport added a subscriber: Unknown Object (MLST).
  • Bug 73288 has been marked as a duplicate of this bug. ***
Aklapper subscribed.

This isn't something that the discovery team supports.

Jdforrester-WMF subscribed.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.

aaron subscribed.

I still have this probably locally and have for years. There is probably some bad assumption that some postgres sort/collation default will have a certain value.

It's interesting that "/" sorts before " " in mysql/sqlite and the test assertions.

In terms of the title prefix searching in PrefixSearch, it just uses the same page table query for all rdbms engines. Sqlite uses BLOB, Mysql uses VARBINARY for the title field, and Postgres uses TEXT. For storage of binary columns containing utf8 strings, I'm not sure why sqlite/mysql sorts "/" before " " . The mysql server and db collations are latin1_swedish_ci, but that should not matter for varbinary columns. Since these test strings come in as utf8 and are stored with binary encoding+collation as varbinary, I'd expect ' ' to use code point x20 and '/' to use x2F. Yet the later sorts first from the ORDER BY and the tests expect that (so it passes for mysql/sqlite). I feel like there is some hidden collating going on...

My postgres locale info, triggered via the package installer invoking initdb, is:

postgres=# \l
                                                        List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale | ICU Rules |    Access privileges    
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-------------------------
 my_wiki   | postgres | UTF8     | libc            | fr_FR.UTF-8 | fr_FR.UTF-8 |            |           | =Tc/postgres           +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres  +
           |          |          |                 |             |             |            |           | "www-data"=CTc/postgres
 postgres  | postgres | UTF8     | libc            | fr_FR.UTF-8 | fr_FR.UTF-8 |            |           | 
 template0 | postgres | UTF8     | libc            | fr_FR.UTF-8 | fr_FR.UTF-8 |            |           | =c/postgres            +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | libc            | fr_FR.UTF-8 | fr_FR.UTF-8 |            |           | =c/postgres            +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres

Forcing the db to use en_US.UTF-8 and use template0 did not change anything for postgres.

Note that in terms of content searching, sqlite uses CLOB (TEXT affinity, not BLOB) in searchindex, mysql specifically uses DEFAULT CHARSET = utf8mb4 in searchindex, while postgres does not searchindex but rather pagecontent directly, which uses TEXT.