iconv-test.c
Our test IPTCTest::testIPTCParseForcedUTFButInvalid verifies that when image metadata is marked as UTF-8 but contains non-UTF-8 bytes, the bad bytes are dropped and the valid UTF-8 is kept.
This was the behavior of iconv() in PHP < 5.4, as can be tested with:
var_dump( iconv("UTF-8", "UTF-8//IGNORE", "\xC3\xC3\xC3\xB8") );
The behavior of iconv(3) (with IGNORE) is to provide the good bytes *and* report the error. That can be tested with the attached program.
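For readers without the attachment, here is a minimal sketch of that kind of check. It is not the attached program: the helper name convert_utf8_ignore and its return codes are made up for illustration, and it assumes an iconv implementation that accepts the "//IGNORE" suffix (glibc does).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not the attached program.
 * Converts `in` from UTF-8 to UTF-8//IGNORE with iconv(3).
 * *out_len receives the number of bytes written to `out`.
 * *err receives errno if a call failed, else 0.
 * Returns  0 if iconv() reported success,
 *         -1 if iconv() reported an error (output may still be present),
 *         -2 if the conversion descriptor could not be opened. */
static int convert_utf8_ignore(const char *in, size_t in_len,
                               char *out, size_t out_cap,
                               size_t *out_len, int *err)
{
    *out_len = 0;
    *err = 0;

    /* iconv_open() takes (tocode, fromcode), the reverse of PHP's iconv(). */
    iconv_t cd = iconv_open("UTF-8//IGNORE", "UTF-8");
    if (cd == (iconv_t)-1) {
        *err = errno;
        return -2;
    }

    char *src = (char *)in;      /* iconv() wants char **; input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;
    int status = 0;

    errno = 0;
    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
        *err = errno;            /* the error is reported here ... */
        status = -1;
    }
    *out_len = out_cap - dst_left;   /* ... but converted bytes are kept */

    iconv_close(cd);
    return status;
}
```

Calling it with the four bytes "\xC3\xC3\xC3\xB8" and a 16-byte output buffer lets you inspect both the return status and the output. Per the behavior described above, glibc would be expected to return -1 with *err set to EILSEQ while still writing the valid bytes; other iconv implementations may behave differently.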
The fact that, when not using IGNORE, the good bytes were returned was reported as a bug in https://bugs.php.net/52211 and fixed in e3fdf3 by always returning an empty string.
So our parsing of IPTC data is now different (wrong?) on PHP 5.4.
We can:
- Set the empty string as the correct output (remove/change the test)
- Verify UTF-8 correctness ourselves (UtfNormal::cleanUp() seems the appropriate tool; we could then remove the UTF-8 replacement character if a silent skip is really desired).
- Request that PHP's iconv() behavior be changed back, or that a new flag be added.
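To make the second option concrete, here is a rough sketch of the "silent skip" semantics the test expects. It is written in C to match the test program and is not UtfNormal::cleanUp(), which substitutes replacement characters instead of dropping bytes. The function names utf8_seq_len and strip_invalid_utf8 are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length (1..4) of the well-formed UTF-8 sequence at s,
 * or 0 if the bytes at s do not start one. Overlong forms, surrogates
 * (U+D800..U+DFFF) and code points above U+10FFFF are rejected. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    if (avail == 0) return 0;
    unsigned char b0 = s[0];

    if (b0 < 0x80) return 1;                       /* ASCII */
    if (b0 >= 0xC2 && b0 <= 0xDF)                  /* 2-byte lead */
        return (avail >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {                /* 3-byte lead */
        if (avail < 3) return 0;
        unsigned char lo = (b0 == 0xE0) ? 0xA0 : 0x80;  /* no overlongs */
        unsigned char hi = (b0 == 0xED) ? 0x9F : 0xBF;  /* no surrogates */
        if (s[1] < lo || s[1] > hi) return 0;
        return ((s[2] & 0xC0) == 0x80) ? 3 : 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {                /* 4-byte lead */
        if (avail < 4) return 0;
        unsigned char lo = (b0 == 0xF0) ? 0x90 : 0x80;  /* no overlongs */
        unsigned char hi = (b0 == 0xF4) ? 0x8F : 0xBF;  /* max U+10FFFF */
        if (s[1] < lo || s[1] > hi) return 0;
        return ((s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) ? 4 : 0;
    }
    return 0;   /* stray continuation byte, or 0xC0, 0xC1, 0xF5..0xFF */
}

/* Copies only well-formed UTF-8 sequences from `in` to `out`, skipping
 * one byte at a time wherever the input is invalid. `out` must have room
 * for in_len bytes. Returns the number of bytes written. */
static size_t strip_invalid_utf8(const char *in, size_t in_len, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, n = 0;

    while (i < in_len) {
        size_t len = utf8_seq_len(s + i, in_len - i);
        if (len == 0) {          /* invalid byte: drop it silently */
            i++;
            continue;
        }
        memcpy(out + n, in + i, len);
        n += len;
        i += len;
    }
    return n;
}
```

Given the four input bytes C3 C3 C3 B8 from the PHP example, this keeps only the final C3 B8 pair, which is the result the test expects. Skipping one byte at a time is a design choice: it resynchronizes as early as possible, so a valid sequence right after a bad lead byte is not lost.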
Version: 1.20.x
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=73178
https://bugzilla.wikimedia.org/show_bug.cgi?id=67908
https://sourceware.org/bugzilla/show_bug.cgi?id=13541
https://bugs.php.net/bug.php?id=48147
Attached: