|Open||None||T90630 Recurring Payments Reporting|
|Resolved||mepps||T97372 Enable ability to look up contacts by phone number|
|Resolved||None||T77910 [epic] Upgrade Civi to 4.6 & integrate new reporting|
|Resolved||None||T116416 [Epic] Followup tasks after Civi 4.6 upgrade|
|Resolved||awight||T116445 Coinbase import fails with database error|
|Declined||None||T119913 Ask Coinbase to export CSVs in an encoding that is compatible with Excel|
Looking in the logs, the problem was an insert into civicrm_address with a Swedish address that contained ä characters in the street and city. However, I can import a similar / anonymized record locally with no error. 'Describe civicrm_address' gives identical results locally and on staging.
It may be the db charset/collation. On production and on my local machine:
SELECT @@character_set_database, @@collation_database; +--------------------------+----------------------+ | @@character_set_database | @@collation_database | +--------------------------+----------------------+ | latin1 | latin1_swedish_ci | +--------------------------+----------------------+
but on staging:
SELECT @@character_set_database, @@collation_database; +--------------------------+----------------------+ | @@character_set_database | @@collation_database | +--------------------------+----------------------+ | binary | binary | +--------------------------+----------------------+
Why production and my local machine have latin1_swedish_ci for collation is a mystery to me, but it seems to let us insert funky characters.
Another theory: we're seeing a secondary error caused while formatting an initial exception, or during watchdog logging. check_plain is used in both cases.
Aargh, in fact, the first line of the screenshot is a warning:
Warning: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain() (line 1567 of /X/drupal/includes/bootstrap.inc).
Another warning appears in the logs,
Cannot find state: [Stockholm] (country 1204)
Here's the error:
$backTrace = string(3521) "#0 /X/civicrm/CRM/Core/Error.php(897): CRM_Core_Error::backtrace("backTrace", TRUE) #1 [internal function](): CRM_Core_Error::exceptionHandler(Object(DB_Error)) #2 /X/civicrm/packages/PEAR.php(931): call_user_func((Array:2), Object(DB_Error)) #3 /X/civicrm/packages/DB.php(976): PEAR_Error->PEAR_Error("DB Error: unknown error", -1, 16, (Array:2), "INSERT INTO civicrm_address (contact_id , location_type_id , is_primary , is_...") #4 /X/civicrm/packages/PEAR.php(564): DB_Error->DB_Error(-1, 16, (Array:2), "INSERT INTO civicrm_address (contact_id , location_type_id , is_primary , is_...") #5 /X/civicrm/packages/DB/common.php(1927): PEAR->raiseError(NULL, -1, NULL, NULL, "INSERT INTO civicrm_address (contact_id , location_type_id , is_primary , is_...", "DB_Error", TRUE) #6 /X/civicrm/packages/DB/mysqli.php(929): DB_common->raiseError(-1, NULL, NULL, NULL, "1366 ** Incorrect string value: '\xCC_gen ...' for column 'street_address' at...") #7 /X/civicrm/packages/DB/mysqli.php(403): DB_mysqli->mysqliRaiseError() #8 /X/civicrm/packages/DB/common.php(1238): DB_mysqli->simpleQuery("INSERT INTO civicrm_address (contact_id , location_type_id , is_primary , is_...") #9 /X/civicrm/packages/DB/DataObject.php(2442): DB_common->query("INSERT INTO civicrm_address (contact_id , location_type_id , is_primary , is_...") #10 /X/civicrm/packages/DB/DataObject.php(1060): DB_DataObject->_query("INSERT INTO civicrm_address (contact_id , location_type_id , is_primary , is_...") #11 /X/civicrm/CRM/Core/DAO.php(450): DB_DataObject->insert() #12 /X/civicrm/CRM/Core/BAO/Address.php(165): CRM_Core_DAO->save() #13 /X/sites/all/modules/wmf_civicrm/wmf_civicrm.module(1221): CRM_Core_BAO_Address::add((Array:13), FALSE) #14 /X/sites/all/modules/wmf_civicrm/wmf_civicrm.module(1165): wmf_civicrm_message_address_update((Array:51), 11455211) #15 /X/sites/all/modules/wmf_civicrm/wmf_civicrm.module(1151): wmf_civicrm_message_location_update((Array:51), (Array:5)) #16 /X/sites/all/modules/wmf_civicrm/wmf_civicrm.module(221): wmf_civicrm_message_location_insert((Array:51), (Array:5)) #17 /X/sites/all/modules/offline2civicrm/ChecksFile.php(79): wmf_civicrm_contribution_message_import((Array:51)) #18 /X/sites/all/modules/offline2civicrm/offline2civicrm.module(168): ChecksFile->import() #19 /X/drupal/includes/form.inc(1513): offline2civicrm_import_checks_form_submit((Array:28), (Array:20))
I can reproduce the error using raw SQL: insert into civicrm_address set street_address = x'CC';
This does look like a charset encoding mismatch like @Ejegg was speculating. I checked staging select @@character_set_connection; using drush civicrm-sql-cli, and it was binary. My local box says utf8, which sounds like a more correct setting.
This is getting kinda scary. Drupal includes code to always set the connection charset to utf8, so how can we be getting binary? That would lead to exactly the data corruption scenario we're most concerned about during the upgrade.
Err--nvm about that, the Drush command is starting a mysql client and cmdline params are only loosely based on configuration and drupal/database/mysql/database.inc
Leanne - are you able to put a copy of the csv you were trying to import on the file server & give me the location? It seems to have had Swedish characters in the address from current investigation
Coinbase gives us a CSV file, right? Or are you opening an XLS and resaving it before the import?
The root issue seems to be a bad file encoding, that makes it impossible to read non-Latin letters. Do you see "Järfälla" as the street address for that Stockholm donor, or is it garbled for you?
The file from the 19th looks like it's valid UTF-8 too. The problematic character noted in the error message was encoded as C3A4, which is correct for an a with umlaut. There's no byte order mark at the beginning of either file, which might be our issue.
I see 0xcc5f for the umlaut, in the CSV file, which is also what shows up in the stack trace. Where do you see 0xc3a4 in an error message? I don't know much about charset encoding though.. I'm probably missing something huge.
The new file has 0xccc7 where they probably meant "ë", plus some amazing unescaped HTML injection test attacks. A type of improvement!
I'm using xxd and libreoffice.
@awight, I was using bless, but xxd gives the same result when I run it on Report_orders_10-19-2015.csv straight from the file server. (slightly redacted)
00001070: XXXX 20XX XX2c 2c4a c3a4 7266 c3a4 6c6c XX XX,,J..rf..ll
Similarly, I'm seeing the ë in the 10/26 file dumped as c3ab. Are your file dates 10/20 14:45 and 10/28 19:32 ?
Your files have different names... They must have re-exported the 10-19 file, and we're looking in different dirs?
I'm pulling from smb://filesrv1/fundraising/Tech/Major%20Gifts/Coinbase
Aha! I snagged 'em from Tech/CoinBase. Guess someone's resaving them into the MG folder with a UTF-unaware application. @LeanneS, it looks like the original downloads are OK to import, but some copies got corrupted. If you need to edit any of them before upload, make sure any special characters and accents show up correctly before and after saving.
One thing I'm still concerned about is that saving the file from Excel might leave it mangled again. We can help you check for this, just open a file using the UTF-8 hack above, then save a copy and upload that to the file server where we can check the new encoding.
Update: it turns out we don't have to open and resave the file in Excel at all. In the future, let it be known that Excel is just completely incapable of importing a UTF-8 CSV. Use LibreOffice instead if you need to edit one of these files.
Very possible--the risk factors that combine to make this a problem are:
- Batch file contains international characters.
- File comes from the processor in CSV format.
- Data needs to be edited by our staff.
@CaitVirtue and @awight the only other import I could foresee this potentially being an issue with is the foreign check import if there are international characters. Looking at the last few import files from Engage, there weren't any international characters though. From my knowledge, I can't remember seeing any checks being sent with international characters, so it might be a nonissue, but I would have to review more international checks to confirm.
Totally. For the record, the correct solution is to have Coinbase save in a CSV format that Excel can understand. I believe that would be UTF-16, we'll get into the details in Q3 though.