Page MenuHomePhabricator

Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1
Closed, ResolvedPublic

Description

The Beta-Cluster-Infrastructure MediaWiki databases are updated on an hourly basis by running MediaWiki maintenance/update.php. The job is beta-update-databases-eqiad.

The last known build occured on July 25th at 19:20 UTC. The next one is failing, it started at 20:20 UTC and failed at 20:31 UTC.

The segfault triggers when running the Wikifunctions database update:

+ /usr/local/bin/mwscript update.php --wiki=wikifunctionswiki --quick --skip-config-validation
20:31:09 ...wikilambda_zlanguages table already exists.
20:31:09 ...have wlzl_label_primary field in wikilambda_zobject_labels table.
20:31:09 ...have wlzl_return_type field in wikilambda_zobject_labels table.
20:31:09 /usr/local/bin/mwscript: line 27:  1822 Segmentation fault      sudo -u "$MEDIAWIKI_WEB_USER" $PHP "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"

For the previous (good build) update.php gives:

19:30:18 ...have wlzl_label_primary field in wikilambda_zobject_labels table.
19:30:18 ...have wlzl_return_type field in wikilambda_zobject_labels table.
19:30:18 	❌ Unable to make a page for Z7138: The provided content's label clashes with Object 'Z10138' for the label in 'Z1002'.
19:30:18 	❌ Unable to make a page for Z7139: The provided content's label clashes with Object 'Z10139' for the label in 'Z1002'.
19:30:18 	❌ Unable to make a page for Z7140: The provided content's label clashes with Object 'Z10140' for the label in 'Z1002'.
19:30:18 ...site_stats is populated...done.

Thus it looks like the attempt to add wlzl_return_type field in wikilambda_zobject_labels table triggers the issue.

Four changes got submitted to WikiLambda between 19:20 UTC and 20:20 UTC:

The function-schemata update looks suspicious :)

From the instance deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud:

[Mon Jul 24 23:33:55 2023] php[28392]: segfault at 7ffe374f5db8 ip 00007f8dc59fc807 sp 00007ffe374f5da0 error 6 in libpcre2-8.so.0.7.1[7f8dc59b9000+5d000]
[Mon Jul 24 23:33:55 2023] Code: ff ff 31 ed e9 74 fb ff ff 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 55 48 89 d5 53 44 89 c3 48 81 ec 98 52 00 00 <48> 89 7c 24 18 4c 8b a4 24 d0 52 00 00 48 89 74 24 10 48 89 4c 24
[Mon Jul 24 23:33:55 2023] Core dump to |/usr/lib/systemd/systemd-coredump 28392 33 33 11 1690242166 0 php pipe failed
[Mon Jul 24 23:35:10 2023] php[28479]: segfault at 7ffebb86b898 ip 00007f154a088807 sp 00007ffebb86b880 error 6 in libpcre2-8.so.0.7.1[7f154a045000+5d000]
[Mon Jul 24 23:35:10 2023] Code: ff ff 31 ed e9 74 fb ff ff 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 55 48 89 d5 53 44 89 c3 48 81 ec 98 52 00 00 <48> 89 7c 24 18 4c 8b a4 24 d0 52 00 00 48 89 74 24 10 48 89 4c 24
[Mon Jul 24 23:35:10 2023] Core dump to |/usr/lib/systemd/systemd-coredump 28479 33 33 11 1690242241 0 php pipe failed

At 23:21:53 another coredump shows an error in the coredump config:

[Tue Jul 25 23:21:53 2023] systemd-coredump[21045]: /etc/systemd/coredump.conf:6: Failed to parse size value, ignoring: 20%

And of course /var/lib/systemd/coredump is empty :]

Details

TitleReferenceAuthorSource BranchDest Branch
Draft: definitions: Drop Z7138, Z7139, Z7140repos/abstract-wiki/wikifunctions/function-schemata!29jforresterdrop-experimentalsmain
Customize query in GitLab

Event Timeline

I have changed /etc/systemd/coredump.conf:

- KeepFree=20%
+ KeepFree=4G

But that comes from Puppet. I have thus disabled the Puppet agent for now and have set the value to 4G. The core file should then be stored under /var/lib/systemd/coredump.

Mentioned in SAL (#wikimedia-releng) [2023-07-26T14:02:48Z] <hashar> deployment-prep: on deployment-deploy03, installed php / libpcre debugging symbols for T342769: sudo apt-get install php7.4-common-dbgsym php7.4-cli-dbgsym libpcre2-dbg

So Instead of messing up with systemd / coredump / ulimit I went to invoke the php command under GDB based on a previous issue T296539#7531235

sudo  -s -u www-data gdb --args /usr/bin/php /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki=wikifunctionswiki --quick --skip-config-validation
gdb> run
<output>
<output freeze for several minutes while update.php is doing something>

Thread 1 "php" received signal SIGSEGV, Segmentation fault.
0x00007ffff789e807 in pcre2_match_8 (code=0x555555ce1fb0, 
    subject=subject@entry=0x7fffcb410a98 "Z1002", length=length@entry=5, 
    start_offset=start_offset@entry=0, options=0, 
    match_data=match_data@entry=0x555555b023e0, mcontext=0x555555ad5870)
    at src/pcre2_match.c:6001
6001	src/pcre2_match.c: No such file or directory.

I could not find src/pcre2_match.c though I have then installed libpcre2-dev.

gdb> bt
#0  0x00007ffff789e807 in pcre2_match_8 (code=0x555555ce1fb0, 
    subject=subject@entry=0x7fffcb410a98 "Z1002", length=length@entry=5, 
    start_offset=start_offset@entry=0, options=0, 
    match_data=match_data@entry=0x555555b023e0, mcontext=0x555555ad5870)
    at src/pcre2_match.c:6001
#1  0x00005555556a3b24 in php_pcre_match_impl (pce=0x7fffe83685a0, 
    subject_str=0x7fffcb410a80, return_value=0x7fffcb44b220, subpats=0x0, global=0, 
    use_flags=<optimized out>, flags=0, start_offset=0) at ./ext/pcre/php_pcre.c:1300
#2  0x00005555556a493b in php_do_pcre_match (execute_data=0x7fffcb44b710, 
    return_value=0x7fffcb44b220, global=0) at ./ext/pcre/php_pcre.c:1149
#3  0x00007ffff216a3cb in tideways_xhprof_execute_internal ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#4  0x000055555587ddee in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1732
#5  execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#6  0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#7  0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#8  execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#9  0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#10 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#11 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#12 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#13 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#14 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#15 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#16 0x000055555587c63c in ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1602
#17 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53535
#18 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#19 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#20 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#21 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#22 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#23 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#24 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#25 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
 at ./Zend/zend_vm_execute.Quit
CONTINUING

That is very very similar to T296539#7531235 , that seems to indicate an infinite loop somewhere.

[10:41:48]  <+hashar> [0x7fffcb44b710] preg_match("\7^Z[1-9]\d*$\7u", "Z1002") [internal function]
[10:41:48]  <+hashar> [0x7fffcb44aba0] Opis\JsonSchema\Validator->validateString(reference, reference, array(0)[0x7fffcb44ac10], array(7)[0x7fffcb44ac20], object[0x7fffcb44ac30], object[0x7fffcb44ac40], object[0x7fffcb44ac50]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:1219 
[10:41:48]  <+hashar> [0x7fffcb44a760] Opis\JsonSchema\Validator->validateProperties(reference, reference, array(0)[0x7fffcb44a7d0], array(7)[0x7fffcb44a7e0], object[0x7fffcb44a7f0], object[0x7fffcb44a800], object[0x7fffcb44a810], NULL) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:943 
[10:42:00]  <+hashar> I will paste the details and the first few lines on the task
[10:43:05]  <+hashar> I guess the regex does an infinite loop
[10:43:20]  <+hashar> or whatever really I don't know
[10:46:18]  <James_F> hashar: OK, so that's probably coming from extensions/WikiLambda/function-schemata/data/CANONICAL/Z9.yaml

After I have remembered PHP provides some gdb helpers ( .gdbinit in php-7.4.30 branch), I have added it to my home dir and loaded it with gdb> source /home/hashar/gdbinit.

There is then a magic zbacktrace command which gives the php details:

(gdb) zbacktrace
[0x7fffcb44b710] preg_match("\7^Z[1-9]\d*$\7u", "Z1002") [internal function]
[0x7fffcb44aba0] Opis\JsonSchema\Validator->validateString(reference, reference, array(0)[0x7fffcb44ac10], array(7)[0x7fffcb44ac20], object[0x7fffcb44ac30], object[0x7fffcb44ac40], object[0x7fffcb44ac50]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:1219 
[0x7fffcb44a760] Opis\JsonSchema\Validator->validateProperties(reference, reference, array(0)[0x7fffcb44a7d0], array(7)[0x7fffcb44a7e0], object[0x7fffcb44a7f0], object[0x7fffcb44a800], object[0x7fffcb44a810], NULL) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:943 
[0x7fffcb44a4c0] Opis\JsonSchema\Validator->validateKeywords(reference, reference, array(0)[0x7fffcb44a530], array(7)[0x7fffcb44a540], object[0x7fffcb44a550], object[0x7fffcb44a560], object[0x7fffcb44a570]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:519 
[0x7fffcb44a310] Opis\JsonSchema\Validator->validateSchema(reference, reference, array(0)[0x7fffcb44a380], array(7)[0x7fffcb44a390], object[0x7fffcb44a3a0], object[0x7fffcb44a3b0], object[0x7fffcb44a3c0]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:332 
[0x7fffcb449350] Opis\JsonSchema\Validator->validateConditionals(reference, reference, array(0)[0x7fffcb4493c0], array(7)[0x7fffcb4493d0], object[0x7fffcb4493e0], object[0x7fffcb4493f0], object[0x7fffcb449400]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:703 
[0x7fffcb4490b0] Opis\JsonSchema\Validator->validateKeywords(reference, reference, array(0)[0x7fffcb449120], array(7)[0x7fffcb449130], object[0x7fffcb449140], object[0x7fffcb449150], object[0x7fffcb449160]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:523 
[0x7fffcb448f00] Opis\JsonSchema\Validator->validateSchema(reference, reference, array(0)[0x7fffcb448f70], array(7)[0x7fffcb448f80], object[0x7fffcb448f90], object[0x7fffcb448fa0], object[0x7fffcb448fb0]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:332 
<loop>

For [0x7fffcb44aba0] Opis\JsonSchema\Validator->validateString(reference, reference, array(0)[0x7fffcb44ac10], array(7)[0x7fffcb44ac20], object[0x7fffcb44ac30], object[0x7fffcb44ac40], object[0x7fffcb44ac50]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:1219 the arguments are:

(gdb) printzv 0x7fffcb44ac10
[0x7fffcb44ac10] (refcount=2) array:     Hash(0)[0x5555559d7f00]: {
}
(gdb) printzv 0x7fffcb44ac20
[0x7fffcb44ac20] (refcount=21) array:     Packed(7)[0x7fffcb486118]: {
      [0] 0 => [0x7fffcb445748] (refcount=17) string: Z2K2
      [1] 1 => [0x7fffcb445768] (refcount=18) string: Z4K2
      [2] 2 => [0x7fffcb445788] long: 1
      [3] 3 => [0x7fffcb4457a8] (refcount=15) string: Z3K3
      [4] 4 => [0x7fffcb4457c8] (refcount=10) string: Z12K1
      [5] 5 => [0x7fffcb4457e8] long: 1
      [6] 6 => [0x7fffcb445808] (refcount=6) string: Z11K1
}
(gdb) printzv 0x7fffcb44ac30
[0x7fffcb44ac30] (refcount=22) object(Opis\JsonSchema\Schema) #485450 {
id => [0x7fffcb40f508] (refcount=3) string: /Z6#
draft => [0x7fffcb40f518] (refcount=1) string: 07
internal => [0x7fffcb40f528] (refcount=1) reference: [0x7fffcb6704e8] (refcount=1) array:     Hash(1)[0x7fffcb4110e0]: {
      [0] "/Z6#" => [0x7fffcb71d280] (refcount=1) object(stdClass) #480576
}
(gdb) printzv 0x7fffcb44ac40
[0x7fffcb44ac40] (refcount=5) object(stdClass) #483827
Properties     Hash(1)[0x7fffcb6aa2a0]: {
      [0] "pattern" => [0x7fffcb67e3c0] (refcount=1) string: ^Z[1-9]\d*$
}
(gdb) printzv 0x7fffcb44ac50
[0x7fffcb44ac50] (refcount=5) object(Opis\JsonSchema\ValidationResult) #486348 {
maxErrors => [0x7fffcb4393e8] long: 1
errors => [0x7fffcb4393f8] (refcount=2) array:     Hash(0)[0x5555559d7f00]: {
}

The calling of Opis\JsonSchema`Validator comes from:

[0x7fffcb7491a0] Opis\JsonSchema\Validator->schemaValidation(reference, object[0x7fffcb749200], 1) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:97 
[0x7fffcb749100] MediaWiki\Extension\WikiLambda\Validation\SchemaWrapper->validate(object[0x7fffcb749150]) /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/Validation/SchemaWrapper.php:43 
[0x7fffcb748f30] MediaWiki\Extension\WikiLambda\Validation\ZObjectStructureValidator->validate(object[0x7fffcb748f80]) /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/Validation/ZObjectStructureValidator.php:79 
[0x7fffcb748e20] MediaWiki\Extension\WikiLambda\ZObjectFactory->validatePersistentKeys(object[0x7fffcb748e70]) /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/ZObjectFactory.php:156 
[0x7fffcb748870] MediaWiki\Extension\WikiLambda\ZObjectFactory->createPersistentContent(object[0x7fffcb7488c0]) /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/ZObjectFactory.php:96 
[0x7fffcb748760] MediaWiki\Extension\WikiLambda\ZObjectContent->validateContent() /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/ZObjectContent.php:116 
[0x7fffcb7486b0] MediaWiki\Extension\WikiLambda\ZObjectContent->isValid() /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/ZObjectContent.php:128 
[0x7fffcb7485f0] MediaWiki\Extension\WikiLambda\ZObjectContent->getZType() /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/ZObjectContent.php:249 
[0x7fffcb748470] MediaWiki\Extension\WikiLambda\Registry\ZTypeRegistry->isZObjectKeyKnown("Z60") /srv/mediawiki-staging/php-master/extensions/WikiLambda/includes/Registry/ZTypeRegistry.php:394 
...

The object passed to MediaWiki\Extension\WikiLambda\Validation\SchemaWrapper->validate is:

(gdb) printzv 0x7fffcb749150
[0x7fffcb749150] (refcount=6) object(stdClass) #485972
Properties     Hash(6)[0x7fffcb412150]: {
      [0] "Z1K1" => [0x7fffcb7ec4c0] (refcount=1) reference: [0x7fffcb7e59c8] (refcount=2) string: Z2

      [1] "Z2K1" => [0x7fffcb7ec4e0] (refcount=1) reference: [0x7fffcb7e5668] (refcount=1) object(stdClass) #485990

      [2] "Z2K2" => [0x7fffcb7ec500] (refcount=46) reference: [0x7fffcb406868] (refcount=18) object(stdClass) #485987

      [3] "Z2K3" => [0x7fffcb7ec520] (refcount=1) object(stdClass) #485919
      [4] "Z2K4" => [0x7fffcb7ec540] (refcount=1) object(stdClass) #485932
      [5] "Z2K5" => [0x7fffcb7ec560] (refcount=1) object(stdClass) #485951
}

Hope that helps.

Hopefully the infinite loop is reproducible somehow?

Jdforrester-WMF claimed this task.

Possibly caused by errant user input that wasn't stopped.

Fixed by the magic of mwscript extensions/WikiLambda/maintenance/reloadBuiltinData.php --wiki=wikifunctionswiki --force.

Sorry!

And eventually it is fixed:

16:30:26 <wmf-insecte> Project beta-update-databases-eqiad build #69029: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69029/