Page MenuHomePhabricator

IPTCTest::testIPTCParseForcedUTFButInvalid fails on trusty/PHP5.5
Closed, ResolvedPublic

Description

06:59:05 1) IPTCTest::testIPTCParseForcedUTFButInvalid
06:59:05 Failed asserting that two arrays are equal.
06:59:05 --- Expected
06:59:05 +++ Actual
06:59:05 @@ @@
06:59:05  Array (
06:59:05 -    0 => 'ø'
06:59:05 +    0 => ''
06:59:05  )

From https://integration.wikimedia.org/ci/job/mediawiki-phpunit-php55/2/consoleFull

php55 tests can be run against the mwcore repo by commenting "check experimental".

Event Timeline

Legoktm created this task.Jan 24 2016, 7:04 AM
Legoktm raised the priority of this task from to Normal.
Legoktm updated the task description. (Show Details)
Legoktm added subscribers: gerritbot, Paladox, PleaseStand and 15 others.
Reedy set Security to None.
Reedy added a subscriber: Bawolff.
Restricted Application added projects: Multimedia, Commons. · View Herald TranscriptJan 24 2016, 12:19 PM
Restricted Application added a subscriber: Steinsplitter. · View Herald Transcript

Could the error be related which version of phpunit we use.

Restricted Application added a subscriber: Matanya. · View Herald TranscriptJan 24 2016, 12:21 PM

Note, test passes on PHP 5.6

reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/core/tests/phpunit$ php phpunit.php --with-phpunitdir=/var/www/wiki/mediawiki includes/media/IPTCTest.php 
Will attempt loading PHPUnit from `/var/www/wiki/mediawiki`
#!/usr/bin/env php
Using PHPUnit from /var/www/wiki/mediawiki/phpunit-old.phar
PHPUnit 4.8.21 by Sebastian Bergmann and contributors.

Runtime:	PHP 5.6.11-1ubuntu3.1
Configuration:	/var/www/wiki/mediawiki/core/tests/phpunit/suite.xml
Warning:	Deprecated configuration setting "strict" used

........

Time: 1.29 seconds, Memory: 30.00Mb

OK (8 tests, 8 assertions)
reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/core/tests/phpunit$

Could the error be related which version of phpunit we use.

Probably not, but running PHPUnit 3.7.37, when we could be running PHPUnit 4.8 is something we should look to address

Could the error be related which version of phpunit we use.

Nope, tested with 3.7.37 on PHP 5.6, and tests pass

reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/core/tests/phpunit$ php phpunit.php --with-phpunitdir=/var/www/wiki/mediawiki includes/media/IPTCTest.php 
Will attempt loading PHPUnit from `/var/www/wiki/mediawiki`
#!/usr/bin/env php
PHPUnit 3.7.37 by Sebastian Bergmann.

Configuration read from /var/www/wiki/mediawiki/core/tests/phpunit/suite.xml

........

Time: 893 ms, Memory: 24.50Mb

OK (8 tests, 8 assertions)
reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/core/tests/phpunit$

Oh. why does it fail on php5.5 and not on php5.6.

Could you test it on php 5.5.31 since the test is using php 5.5.9 please.

Reedy added a comment.EditedJan 24 2016, 12:52 PM

https://3v4l.org/5AVoL shows all versions find it (without any trimming)

array(2) {
  ["2#025"]=>
  array(1) {
    [0]=>
    string(4) "��ø"
  }
  ["1#090"]=>
  array(1) {
    [0]=>
    string(3) "%G"
  }
}

https://3v4l.org/Adseg

It seems to be a PHP bug in versions "5.4.12 - 5.5.25, 5.6.0 - 5.6.9, hhvm-3.6.1 - 3.8.1". Which explains why it works for me, but fails on 5.5.9

Output for 5.5.26 - 5.5.31, 5.6.10 - 7.0.2, hhvm-3.9.1 - 3.11.0
array(1) {
  [0]=>
  string(2) "ø"
}
Output for 5.4.12 - 5.5.25, 5.6.0 - 5.6.9, hhvm-3.6.1 - 3.8.1
Notice: iconv(): Detected an illegal character in input string in /in/Adseg on line 430
array(1) {
  [0]=>
  string(0) ""
}
Output for 5.3.22 - 5.3.29
Notice: iconv(): Detected an illegal character in input string in /in/Adseg on line 430
array(1) {
  [0]=>
  string(2) "ø"
}

(Ignore the iconv() warnings as I just commented out the suppresswarnings/unsuppresswarnings calls)

It would look like something is eating it (disappears when it goes through self::convIPTC())

Oh should we open a bug request, requesting that php5.5 be updated to 5.5.31.

Oh should we open a bug request, requesting that php5.5 be updated to 5.5.31.

Why? Trusty provides 5.5.9, which is the minimum supported version we were wanting to target. In theory, we could suggest 5.5.26, but there might be something we can do to MW code in self::convIPTC() to work around this

Would that bug also affect https://www.mediawiki.org/wiki/Special:Version since it uses hhvm 3.6.5

Reedy added a comment.EditedJan 24 2016, 1:00 PM

Based on https://3v4l.org/Otno6 when it dives into self::convIPTC() it's eating too many characters (var_dump before and after the call)

Output for 5.5.26 - 5.5.31, 5.6.10 - 7.0.2, hhvm-3.9.1 - 3.11.0
array(1) {
  [0]=>
  string(4) "��ø"
}
string(5) "UTF-8"
array(1) {
  [0]=>
  string(2) "ø"
}
array(1) {
  [0]=>
  string(2) "ø"
}
Output for 5.4.12 - 5.5.25, 5.6.0 - 5.6.9, hhvm-3.6.1 - 3.8.1
array(1) {
  [0]=>
  string(4) "��ø"
}
string(5) "UTF-8"

Notice: iconv(): Detected an illegal character in input string in /in/Otno6 on line 432
array(1) {
  [0]=>
  string(0) ""
}
array(1) {
  [0]=>
  string(0) ""
}
Output for 5.3.22 - 5.3.29
array(1) {
  [0]=>
  string(4) "��ø"
}
string(5) "UTF-8"

Notice: iconv(): Detected an illegal character in input string in /in/Otno6 on line 432
array(1) {

I guess this is the most telling

Notice: iconv(): Detected an illegal character in input string in /in/Otno6 on line 432

It's normally wrapped in suppress warnings

Reedy added a comment.Jan 24 2016, 1:18 PM
<?php

$iptcData = "Photoshop 3.0\08BIM\4\4\0\0\0\0\0\x11\x1c\x02\x19\x00\x04\xC3\xC3\xC3\xB8"
	. "\x1c\x01\x5A\x00\x03\x1B\x25\x47";
$res = iptcparse( $iptcData );
$keywords = $res['2#025'];
foreach ( $keywords as $val ) {
	var_dump( iconv( "UTF-8", "UTF-8//IGNORE", $val ) );
}

becomes our minimum recplicable case

Output for 5.5.26 - 5.5.31, 5.6.10 - 7.0.2, hhvm-3.9.1 - 3.11.0
string(2) "ø"
Output for 5.4.12 - 5.5.25, 5.6.0 - 5.6.9, hhvm-3.6.1 - 3.8.1
Notice: iconv(): Detected an illegal character in input string in /in/sLcu4 on line 8
bool(false)
Output for 5.3.22 - 5.3.29
Notice: iconv(): Detected an illegal character in input string in /in/sLcu4 on line 8
string(2) "ø"

https://3v4l.org/sLcu4

Reedy added a comment.EditedJan 24 2016, 1:19 PM

Which seems to be https://bugs.php.net/bug.php?id=48147 "iconv with //IGNORE cuts the string"

https://git.php.net/?p=php-src.git;a=commit;h=473ec539a1c3d242c8b171dd6a5a98fa17e05c13

From: Stanislav Malyshev <stas@php.net>
Date: Fri, 8 May 2015 07:03:54 +0000 (-0700)
Subject: Fix #48147 - implement manual handling of  //IGNORE for broken libc
X-Git-Tag: php-5.5.26RC1~27
X-Git-Url: https://72.52.91.13:4430/?p=php-src.git;a=commitdiff_plain;h=473ec539a1c3d242c8b171dd6a5a98fa17e05c13

Fix #48147 - implement manual handling of  //IGNORE for broken libc

Conflicts:
	ext/iconv/iconv.c
---

diff --git a/NEWS b/NEWS
index 2e0b8b1..9f843c7 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,9 @@ PHP                                                                        NEWS
   . Fixed bug #69566 (Conditional jump or move depends on uninitialised value
     in extension trait). (jbboehr at gmail dot com)
 
+- Iconv:
+  . Fixed bug #48147 (iconv with //IGNORE cuts the string). (Stas)
+
 - Opcache
   . Fixed bug #69549 (Memory leak with opcache.optimization_level=0xFFFFFFFF).
     (Laruence, Dmitry)
diff --git a/ext/iconv/config.m4 b/ext/iconv/config.m4
index 10d21cc..fce5993 100644
--- a/ext/iconv/config.m4
+++ b/ext/iconv/config.m4
@@ -35,7 +35,7 @@ if test "$PHP_ICONV" != "no"; then
       PHP_ICONV_H_PATH="$PHP_ICONV_PREFIX/include/giconv.h"
     else
       PHP_ICONV_H_PATH="$PHP_ICONV_PREFIX/include/iconv.h"
-    fi 
+	fi
 
     AC_MSG_CHECKING([if iconv is glibc's])
     AC_TRY_LINK([#include <gnu/libc-version.h>],[gnu_get_libc_version();],
@@ -53,8 +53,8 @@ if test "$PHP_ICONV" != "no"; then
       AC_TRY_RUN([
 #include <$PHP_ICONV_H_PATH>
 int main() {
-	printf("%d", _libiconv_version);
-	return 0;
+  printf("%d", _libiconv_version);
+  return 0;
 }
       ],[
         AC_MSG_RESULT(yes)
@@ -138,7 +138,7 @@ int main() {
   if (cd == (iconv_t)(-1)) {
     if (errno == EINVAL) {
       return 0;
-	} else {
+  } else {
       return 1;
     }
   }
@@ -159,6 +159,37 @@ int main() {
       AC_DEFINE([ICONV_SUPPORTS_ERRNO],0,[Whether iconv supports error no or not])
     ])
 
+    AC_MSG_CHECKING([if iconv supports //IGNORE])
+    AC_TRY_RUN([
+#include <$PHP_ICONV_H_PATH>
+#include <stdlib.h>
+
+int main() {
+  iconv_t cd = iconv_open( "UTF-8//IGNORE", "UTF-8" );
+  char *in_p = "\xC3\xC3\xC3\xB8";
+  size_t in_left = 4, out_left = 4096;
+  char *out = malloc(out_left);
+  char *out_p = out;
+  size_t result = iconv(cd, (char **) &in_p, &in_left, (char **) &out_p, &out_left);
+  if(result == (size_t)-1) {
+    return 1;
+  }
+  return 0;
+}
+   ],[
+      AC_MSG_RESULT(yes)
+      PHP_DEFINE([ICONV_BROKEN_IGNORE],0,[ext/iconv])
+      AC_DEFINE([ICONV_BROKEN_IGNORE],0,[Whether iconv supports IGNORE])
+    ],[
+      AC_MSG_RESULT(no)
+      PHP_DEFINE([ICONV_BROKEN_IGNORE],1,[ext/iconv])
+      AC_DEFINE([ICONV_BROKEN_IGNORE],1,[Whether iconv supports IGNORE])
+    ],[
+      AC_MSG_RESULT(no, cross-compiling)
+      PHP_DEFINE([ICONV_SUPPORTS_ERRNO],0,[ext/iconv])
+      AC_DEFINE([ICONV_SUPPORTS_ERRNO],0,[Whether iconv supports IGNORE])
+    ])
+
     AC_MSG_CHECKING([if your cpp allows macro usage in include lines])
     AC_TRY_COMPILE([
 #define FOO <$PHP_ICONV_H_PATH>
diff --git a/ext/iconv/iconv.c b/ext/iconv/iconv.c
index 12127d8..5921d47 100644
--- a/ext/iconv/iconv.c
+++ b/ext/iconv/iconv.c
@@ -464,6 +464,24 @@ static php_iconv_err_t _php_iconv_appendc(smart_str *d, const char c, iconv_t cd
 }
 /* }}} */
 
+/* {{{ */
+#if ICONV_BROKEN_IGNORE
+static int _php_check_ignore(const char *charset)
+{
+  size_t clen = strlen(charset);
+  if (clen >= 9 && strcmp("//IGNORE", charset+clen-8) == 0) {
+    return 1;
+  }
+  if (clen >= 19 && strcmp("//IGNORE//TRANSLIT", charset+clen-18) == 0) {
+    return 1;
+  }
+  return 0;
+}
+#else
+#define _php_check_ignore(x) (0)
+#endif
+/* }}} */
+
 /* {{{ php_iconv_string()
  */
 PHP_ICONV_API php_iconv_err_t php_iconv_string(const char *in_p, size_t in_len,
@@ -545,6 +563,7 @@ PHP_ICONV_API php_iconv_err_t php_iconv_string(const char *in_p, size_t in_len,
 	char *out_p, *out_buf, *tmp_buf;
 	size_t bsz, result = 0;
 	php_iconv_err_t retval = PHP_ICONV_ERR_SUCCESS;
+	int ignore_ilseq = _php_check_ignore(out_charset);
 
 	*out = NULL;
 	*out_len = 0;
@@ -569,6 +588,17 @@ PHP_ICONV_API php_iconv_err_t php_iconv_string(const char *in_p, size_t in_len,
 		result = iconv(cd, (char **) &in_p, &in_left, (char **) &out_p, &out_left);
 		out_size = bsz - out_left;
 		if (result == (size_t)(-1)) {
+			if (ignore_ilseq && errno == EILSEQ) {
+				if (in_left <= 1) {
+					result = 0;
+				} else {
+					errno = 0;
+					in_p++;
+					in_left--;
+					continue;
+				}
+			}
+
 			if (errno == E2BIG && in_left > 0) {
 				/* converted string is longer than out buffer */
 				bsz += in_len;
diff --git a/ext/iconv/tests/bug48147.phpt b/ext/iconv/tests/bug48147.phpt
new file mode 100644
index 0000000..342f920
--- /dev/null
+++ b/ext/iconv/tests/bug48147.phpt
@@ -0,0 +1,27 @@
+--TEST--
+Bug #48147 (iconv with //IGNORE cuts the string)
+--SKIPIF--
+<?php extension_loaded('iconv') or die('skip iconv extension is not available'); ?>
+--FILE--
+<?php
+$text = "aa\xC3\xC3\xC3\xB8aa";
+var_dump(iconv("UTF-8", "UTF-8", $text));
+var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", $text)));
+// only invalid
+var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", "\xC3")));
+// start invalid
+var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", "\xC3\xC3\xC3\xB8aa")));
+// finish invalid
+var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", "aa\xC3\xC3\xC3")));
+?>
+--EXPECTF--
+Notice: iconv(): Detected an illegal character in input string in %s on line %d
+bool(false)
+string(10) "aa%C3%B8aa"
+
+Notice: iconv(): Detected an incomplete multibyte character in input string in %s on line %d
+string(0) ""
+string(8) "%C3%B8aa"
+
+Notice: iconv(): Detected an incomplete multibyte character in input string in %s on line %d
+string(0) ""
Reedy added a comment.Jan 24 2016, 1:24 PM

Would that bug also affect https://www.mediawiki.org/wiki/Special:Version since it uses hhvm 3.6.5

You would presume so, but the fact that https://gerrit.wikimedia.org/r/#/c/214571/ doesn't fail on hhvm tests would suggest otherwise

Reedy added a subscriber: bd808.

Would that bug also affect https://www.mediawiki.org/wiki/Special:Version since it uses hhvm 3.6.5

HHVM has an ini setting that works around this problem: hhvm.hack.lang.iconv_ignore_correct=true
The name of the setting makes it seem like it is for Hack mode only but in reality it applies in normal PHP mode as well. This setting has been changed for MediaWiki-Vagrant and WMF's production HHVM configs.

The fix @Smalyshev provided to PHP 5.5+ and the discovery of the HHVM hhvm.hack.lang.iconv_ignore_correct=true setting make this bug something that can be worked around in the PHP interpreter itself either by upgrading or via configuration.
I'm inclined to close this as resolved. Any objections?

Reedy added a comment.Jan 24 2016, 5:59 PM

@Bawolff Any suggestions what this might break? Would we lose anything particularly? Could/should we just disable this test on known broken versions of PHP?

@Bawolff Any suggestions what this might break? Would we lose anything particularly? Could/should we just disable this test on known broken versions of PHP?

Sounds fine to me. All that would break is that if someone has IPTC-iim metadata (The old type of IPTC, not the "current" version) in a file, that is marked as UTF-8, the output might be a little garbled. Seems reasonable outcome for files with incorrect data in them

Change 266542 had a related patch set uploaded (by Legoktm):
Skip IPTCTest::testIPTCParseForcedUTFButInvalid() on pre-PHP5.5.26

https://gerrit.wikimedia.org/r/266542

Change 266542 merged by jenkins-bot:
Skip IPTCTest::testIPTCParseForcedUTFButInvalid() on pre-PHP5.5.26

https://gerrit.wikimedia.org/r/266542

Reedy added a comment.Jan 26 2016, 9:00 PM

Change 266542 had a related patch set uploaded (by Legoktm):
Skip IPTCTest::testIPTCParseForcedUTFButInvalid() on pre-PHP5.5.26
https://gerrit.wikimedia.org/r/266542

PHP 5.6 < 5.6.10 also have this issue

HappyDog removed a subscriber: HappyDog.Jan 26 2016, 9:52 PM

PHP 5.6 < 5.6.10 also have this issue

Ugh, so, 5.3 is fine, 5.4 is all broken, 5.5 < 5.5.26 is broken, and 5.6 < 5.6.10 is broken?

Reedy added a comment.Jan 28 2016, 9:15 AM

https://secure.php.net/ChangeLog-5.php

Yeah, fix applied in 5.6.10 and 5.5.26

Not sure about 5.4, but based on T39665, I'm guessing there's some issue. When we merge the 5.5 patches, I guess, at that point, we don't care anymore?

Change 269829 had a related patch set uploaded (by Reedy):
Disable testIPTCParseForcedUTFButInvalid on PHP > 5.6.0 but < 5.6.10

https://gerrit.wikimedia.org/r/269829

Change 269829 merged by jenkins-bot:
Disable testIPTCParseForcedUTFButInvalid on PHP > 5.6.0 but < 5.6.10

https://gerrit.wikimedia.org/r/269829

Steinsplitter moved this task from Incoming to Backlog on the Commons board.Feb 13 2016, 5:35 PM
Glaisher removed a subscriber: Glaisher.Feb 14 2016, 3:52 PM

Change 275018 had a related patch set uploaded (by Paladox):
Skip IPTCTest::testIPTCParseForcedUTFButInvalid() on pre-PHP5.5.26

https://gerrit.wikimedia.org/r/275018

Change 275018 merged by Cdentinger:
Skip IPTCTest::testIPTCParseForcedUTFButInvalid() on pre-PHP5.5.26

https://gerrit.wikimedia.org/r/275018

Change 275035 had a related patch set uploaded (by Paladox):
Disable testIPTCParseForcedUTFButInvalid on PHP > 5.6.0 but < 5.6.10

https://gerrit.wikimedia.org/r/275035

Reedy closed this task as Resolved.Mar 4 2016, 11:23 PM
Reedy claimed this task.

Change 275035 merged by Ejegg:
Disable testIPTCParseForcedUTFButInvalid on PHP > 5.6.0 but < 5.6.10

https://gerrit.wikimedia.org/r/275035

Restricted Application added a subscriber: Poyekhali. · View Herald TranscriptMay 23 2016, 5:58 PM
Danny_B moved this task from Unsorted to PHP 5.5 on the [DO NOT USE] NewPHP board.Jul 11 2016, 1:51 AM