
java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title'
Closed, Declined · Public · Feature


Author: crajapa1007

I am trying to convert the Japanese Wikipedia XML dump "jawiki-latest-pages-meta-history.xml" to an SQL file with mwdumper,
but I get a Java exception and the conversion fails.

The message is:

Exception in thread "main" java.sql.SQLException: Incorrect string value: '\xF0\xA1\x9A\xB4' for column 'rev_user_text' at row 5014
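
The bytes quoted in the exception can be decoded to see which character the server rejected. The following is a minimal sketch, not part of mwdumper; the helper name `describe` is made up for illustration. It assumes the bytes are valid UTF-8, as dump content should be.

```python
def describe(raw: bytes):
    """Decode one UTF-8 sequence and report its code point and byte length.

    `describe` is a hypothetical helper for illustration only.
    """
    ch = raw.decode("utf-8")       # raises UnicodeDecodeError if not valid UTF-8
    code_point = ord(ch)           # ord() requires exactly one character
    outside_bmp = code_point > 0xFFFF
    return f"U+{code_point:04X}", len(raw), outside_bmp


# Bytes copied from the error message above.
label, byte_len, outside_bmp = describe(b"\xF0\xA1\x9A\xB4")
print(label, byte_len, outside_bmp)
```

The sequence is four bytes long and decodes to U+216B4, a code point outside the Basic Multilingual Plane. This suggests the data itself is well-formed and that the problem lies in what the column's character set accepts.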

The status of MySQL on my PC is as follows:

mysql Ver 14.14 Distrib 5.1.50, for Win32 (ia32)

Connection id: 4
Current database: wikidb
Current user: root@localhost
SSL: Not in use
Using delimiter: ;
Server version: 5.1.50-community MySQL Community Server (GPL)
Protocol version: 10
Connection: localhost via TCP/IP
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
TCP port: 3306
Uptime: 52 min 20 sec

Threads: 1 Questions: 1594 Slow queries: 1 Opens: 379 Flush tables: 1 Open tables: 3 Queries per second avg: 0.507

I don't know much about MySQL or Java, so this may be a simple problem,
but I could not solve it by myself.
Please tell me what is happening and how to fix it.

Version: unspecified
Severity: enhancement
OS: Windows XP



Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:04 PM
bzimport set Reference to bz24909.

piotr.jagielski wrote:

I have the same issue when trying to import the English Wikipedia dump enwiki-20140402-pages-articles.xml. The exception is java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 192. I'm using the mwdumper GUI.

piotr.jagielski wrote:

I was able to work around it by creating the database with the following command: "create database wiki default character set binary;"
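
A likely reason this workaround helps: MySQL's `utf8` character set stores at most three bytes per character, so any character that needs four bytes in UTF-8 is rejected, whereas a `binary` database stores the bytes without character-set validation. The sketch below is illustrative only; the function names and the sample list are assumptions, not mwdumper code. It shows how to spot values that would trip a three-byte `utf8` column.

```python
def needs_four_bytes(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane,
    i.e. would be encoded as four bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


def find_problem_values(values):
    """Return the values that a three-byte utf8 column would reject.

    Hypothetical helper for illustration only.
    """
    return [v for v in values if needs_four_bytes(v)]


# Sample titles (made up), one built from the bytes in the error message.
titles = [
    "Tokyo",
    "東京",
    b"\xF0\x9D\x9E\xB1".decode("utf-8") + "_example",
]
for title in find_problem_values(titles):
    print(repr(title), len(title.encode("utf-8")), "bytes")
```

Only the third title is reported. On MySQL 5.5.3 and later the `utf8mb4` character set accepts such characters, but it is not available in the 5.1 series reported above.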

brooke set Security to None.
Aklapper renamed this task from mwdumper .IOException: java.sql.SQLException: Incorrect string value: to java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title'. Apr 23 2016, 9:08 AM
Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM
hashar subscribed.

mwdumper is no longer able to process dumps generated since MediaWiki 1.31 (released in June 2018). The tool was started in 2005 and is no longer maintained, so it is being archived. See T351228 for reference.