
generating a sitemap fails with Error: 1300 Invalid utf8 character string:
Closed, Duplicate, Public


When I run "php maintenance/generateSitemap.php", the script fails with the following message:

0 ()
1 (Diskussion)
A database query error has occurred.
Query: SELECT  user_name,up_value  FROM `user` LEFT JOIN `user_properties` ON ((user_id = up_user) AND up_property = 'gender')  WHERE user_name = '𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁𨮁喃'
Function: GenderCache::doQuery/MediaWikiTitleCodec::getNamespaceName
Error: 1300 Invalid utf8 character string: 'F0A8AE' (localhost)

The script fails in the function "generateLimit" at line 561 ($title->getCanonicalURL(),).

A few lines above (line 555), a title is created with exactly the UTF-8 sequence that fails in the query:

// bug 17961: make a title with the longest possible URL in this namespace
$title = Title::makeTitle( $namespace, str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83" );
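
To make the connection between this line and the error message easier to see, here is a minimal standalone sketch that inspects the same byte string outside MediaWiki. The variable names are illustrative only and nothing here is part of generateSitemap.php. It shows that the title is 255 bytes long, that each repeated character is a 4-byte UTF-8 sequence, and that its first three bytes are the 'F0A8AE' reported in the error.

```php
<?php
// Same byte string as in the Title::makeTitle() call quoted above.
$text = str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83";

// Total length in bytes: 63 * 4 + 3.
$byteLength = strlen( $text );

// Hex dump of the first character and of its first three bytes.
$firstCharHex = strtoupper( bin2hex( substr( $text, 0, 4 ) ) );
$prefixHex = strtoupper( bin2hex( substr( $text, 0, 3 ) ) );

// A UTF-8 lead byte of the form 11110xxx announces a 4-byte sequence.
$isFourByteLead = ( ord( $text[0] ) & 0xF8 ) === 0xF0;

// Decode the first character by hand (no mbstring needed).
$codePoint = ( ( ord( $text[0] ) & 0x07 ) << 18 )
	| ( ( ord( $text[1] ) & 0x3F ) << 12 )
	| ( ( ord( $text[2] ) & 0x3F ) << 6 )
	| ( ord( $text[3] ) & 0x3F );

printf(
	"%d bytes, first char %s (U+%X), prefix %s, 4-byte lead: %s\n",
	$byteLength,
	$firstCharHex,
	$codePoint,
	$prefixHex,
	$isFourByteLead ? 'yes' : 'no'
);
```

The decoded code point lies above U+FFFF, so the character is outside the Basic Multilingual Plane and needs four bytes in UTF-8.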

Event Timeline

Christof.spies raised the priority of this task from to Needs Triage.
Christof.spies updated the task description.
Christof.spies added a subscriber: Christof.spies.


$title = Title::makeTitle( $namespace, str_repeat( "aa", 63 ) . "aa" );
Aklapper triaged this task as Lowest priority. Mar 4 2015, 10:16 AM

Wondering about exact steps to reproduce here. Plus which MediaWiki version is this about?

MediaWiki version: 1.24.1
run "php maintenance/generateSitemap.php" in application root

> run "php maintenance/generateSitemap.php" in application root

Is that a fresh installation or an upgrade? Was this running generateSitemap for the very first time on that instance? If not, has it worked before?

Which PHP version and database backend + version? What's the database backend's character encoding / charset set to?

Upgraded from the latest 1.23.x release to 1.24.1.
generateSitemap ran fine with 1.23.x and has had this issue since the upgrade.
php --version
PHP 5.4.36-0+deb7u3 (cli) (built: Jan 9 2015 08:07:06)
MySQL (MariaDB) collation is utf8_general_ci (character set utf8) for the database and tables
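
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the "utf8" character set in MySQL and MariaDB stores at most three bytes per character, so a value containing 4-byte UTF-8 sequences cannot be represented in it. The sketch below is a rough way to check whether a string contains such sequences. The function name hasFourByteUtf8 is hypothetical and is not a MediaWiki API, and the check assumes the input is already valid UTF-8.

```php
<?php
/**
 * Hypothetical helper: report whether a valid UTF-8 string contains any
 * 4-byte sequence, i.e. a character above U+FFFF.
 *
 * In valid UTF-8 the bytes 0xF0-0xF4 occur only as the lead byte of a
 * 4-byte sequence, so a plain byte-level match is enough here.
 */
function hasFourByteUtf8( $text ) {
	return preg_match( '/[\xF0-\xF4]/', $text ) === 1;
}

// The sitemap test title from the task description.
$sitemapTitle = str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83";

// An ordinary namespace name and a 3-byte character, for comparison.
$plainName = 'Diskussion';
$threeByteChar = "\xe5\x96\x83";

var_dump( hasFourByteUtf8( $sitemapTitle ) );  // 4-byte sequences present
var_dump( hasFourByteUtf8( $plainName ) );     // ASCII only
var_dump( hasFourByteUtf8( $threeByteChar ) ); // 3-byte sequence only
```

If that assumption holds, only the first string would be a problem for a 3-byte "utf8" connection or column; the other two fit within three bytes per character.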