
deployment-prep automatic update.php fails with Segmentation Fault
Closed, Resolved, Public

Description

The beta-update-databases-eqiad Jenkins job runs update.php against all Beta-Cluster-Infrastructure wikis once per hour. The job started failing on Nov 26 at 6:20 UTC with:

$ mwscript update.php --wiki=aawiki --quick
...
usr/local/bin/mwscript: line 26:  9366 Segmentation fault  

The build at 5:20 UTC worked.
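For reference on the error line itself: when a child process is killed by SIGSEGV (signal 11), bash prints "Segmentation fault" and reports an exit status of 128 + 11 = 139. The sketch below is a hypothetical wrapper (`run_and_report` is not part of mwscript or of the Jenkins job) that names the signal explicitly in a job log. It assumes bash, and note that a program can also exit normally with a status above 128, so the report is a hint rather than proof.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of mwscript or the Jenkins job): run a command
# and, if its exit status is above 128, report the signal bash says killed it.
# bash encodes "killed by signal N" as exit status 128 + N.
run_and_report() {
  local status=0
  "$@" || status=$?            # capture the status without tripping `set -e`
  if [ "$status" -gt 128 ]; then
    local sig=$((status - 128))
    # `kill -l N` turns a signal number into its name, e.g. 11 -> SEGV
    echo "command killed by signal ${sig} (SIG$(kill -l "$sig"))" >&2
  fi
  return "$status"             # propagate the original status to the caller
}
```

Wrapping the failing command, e.g. `run_and_report mwscript update.php --wiki=aawiki --quick`, would be expected to end with "command killed by signal 11 (SIGSEGV)" for the failure described here, while still returning 139 so the build is marked as failed.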

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2021-11-26T14:15:55Z] <hashar> deployment-prep: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ database updating job is broken since 6:20 UTC due to a segmentation fault | T296539

When things break in this way, my usual suspect is an unattended upgrade of Debian packages, which runs at 6:15 UTC via a cron job. On the instance deployment-deploy01 we have:

/var/log/apt/history.log
Start-Date: 2021-11-25  06:15:55
Commandline: /usr/bin/unattended-upgrade
Remove: libvips42:amd64 (8.4.5-1+deb9u2), liborc-0.4-0:amd64 (1:0.4.26-2), libaec0:amd64 (0.3.2-1), libhdf5-100:amd64 (1.10.0-patch1+docs-3+deb9u1), libgsf-1-114:amd64 (1.14.41-1), libsz2:amd64 (0.3.2-1), libtidy5:amd64 (1:5.2.0-2), libexif12:amd64 (0.6.21-2+deb9u5), libopenslide0:amd64 (3.4.1+dfsg-2), libilmbase12:amd64 (2.2.0-12), libgfortran3:amd64 (6.3.0-18+deb9u1), libgsf-1-common:amd64 (1.14.41-1), libmatio4:amd64 (1.5.9-1+b1), libopenexr22:amd64 (2.2.0-11+deb9u4), libpoppler-glib8:amd64 (0.48.0-2+deb9u4), libcfitsio5:amd64 (3.410-1)
End-Date: 2021-11-25  06:16:00

And:

/var/log/apt/term.log
Removing libvips42:amd64 (8.4.5-1+deb9u2) ...
Removing libmatio4:amd64 (1.5.9-1+b1) ...
Removing libhdf5-100:amd64 (1.10.0-patch1+docs-3+deb9u1) ...
Removing libsz2:amd64 (0.3.2-1) ...
Removing libaec0:amd64 (0.3.2-1) ...
Removing libcfitsio5:amd64 (3.410-1) ...
Removing libexif12:amd64 (0.6.21-2+deb9u5) ...
Removing libgfortran3:amd64 (6.3.0-18+deb9u1) ...
Removing libgsf-1-114:amd64 (1.14.41-1) ...
Removing libgsf-1-common (1.14.41-1) ...
Removing libopenexr22:amd64 (2.2.0-11+deb9u4) ...
Removing libilmbase12:amd64 (2.2.0-12) ...
Removing libopenslide0 (3.4.1+dfsg-2) ...
Removing liborc-0.4-0:amd64 (1:0.4.26-2) ...
Removing libpoppler-glib8:amd64 (0.48.0-2+deb9u4) ...
Removing libtidy5 (1:5.2.0-2) ...

This follows an earlier unattended upgrade that removed tidy and libvips-tools on Nov 24.
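To line up a breakage with package changes, the Remove: entries can be pulled out of the apt history log and tagged with the time of the apt run. This is a minimal sketch that assumes the stanza layout shown above (Start-Date:, Commandline:, Remove:, End-Date:); the `apt_removals` helper is hypothetical and only looks at Remove: lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<Start-Date> <package>" for every package listed
# on a "Remove:" line of an apt history log (default: /var/log/apt/history.log).
apt_removals() {
  local log="${1:-/var/log/apt/history.log}"
  awk '
    /^Start-Date:/ { date = $2 " " $3 }        # remember when this apt run started
    /^Remove:/ {
      line = $0
      sub(/^Remove: /, "", line)
      # Entries look like "pkg:arch (version), pkg:arch (version)"
      n = split(line, entries, "[)], ")
      for (i = 1; i <= n; i++) {
        split(entries[i], parts, " ")          # parts[1] is "pkg:arch"
        print date, parts[1]
      }
    }
  ' "$log"
}
```

On the host, `apt_removals | grep '^2021-11-25'` would then list one line per package removed by that day's run.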

The segmentation fault can probably be reproduced by running the update script from deployment-deploy01.deployment-prep.eqiad1.wikimedia.cloud. To debug it, we would need a core file and gdb.

I can reproduce it without the wmf-beta-update-databases wrapper:

taavi@deployment-deploy01:~$ mwscript update.php --wiki aawiki --quick
#!/usr/bin/env php
Warning: session_name(): Cannot change session name when headers already sent in /srv/mediawiki-staging/wmf-config/CommonSettings.php on line 587
MediaWiki 1.38.0-alpha Updater

Your composer.lock file is up to date with current dependencies!
Going to run database updates for aawiki
Depending on the size of your database this may take a while!
...collations up-to-date.
...have el_index_60 field in externallinks table.
...ug_user_group key doesn't exist.
...have ug_expiry field in user_groups table.
...img_media_type in table image already modified by patch patch-add-3d.sql.
[cut]
...have el_owner field in securepoll_elections table.
...abuse_filter table already exists.
...abuse_filter_log table does not contain afl_log_id field.
...have afl_filter_id field in abuse_filter_log table.
...skipping: index ip_timestamp doesn't exist.
...index afl_wiki_timestamp already set on abuse_filter_log table.
...skipping: index filter_timestamp doesn't exist.
...abuse_filter_log table does not contain afl_filter field.
/usr/local/bin/mwscript: line 26: 30052 Segmentation fault      sudo -u "$MEDIAWIKI_WEB_USER" $PHP "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"

I tried (and failed) to run the script under a debugger:

taavi@deployment-deploy01:~$ sudo -u www-data gdb
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file /usr/bin/php7.2
Reading symbols from /usr/bin/php7.2...(no debugging symbols found)...done.
(gdb) run /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki aawiki --quick
Starting program: /usr/bin/php7.2 /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki aawiki --quick
This account is currently not available.
During startup program exited with code 1.
(gdb) quit

It fails on the buster-based deployment-deploy03 host too.

cc @Legoktm for the vips findings above. There are no unattended upgrades in production, but letting you know just in case.

When update.php is run from the command line, the script stalls after the following line and then segfaults:

deployment-deploy01.deployment-prep.eqiad.wmflabs
$ mwscript update.php --wiki=aawiki --quick
...
...abuse_filter_log table does not contain afl_filter field.
/usr/local/bin/mwscript: line 26: 17236 Segmentation fault      sudo -u "$MEDIAWIKI_WEB_USER" $PHP "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"

I have tried gdb with:

$ sudo  -s -u www-data gdb --args /usr/bin/php /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki=aawiki --quick
...

Thread 1 "php" received signal SIGSEGV, Segmentation fault.
0x000055555578b04c in ?? ()
(gdb) bt
#0  0x000055555578b04c in ?? ()
#1  0x000055555578c5bc in ?? ()
#2  0x000055555578c68b in ap_php_slprintf ()
#3  0x00005555556405aa in ?? ()
#4  0x0000555555641d1a in ?? ()
#5  0x00007fffe17107f5 in tideways_xhprof_execute_internal ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#6  0x000055555589f54d in execute_ex ()
#7  0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#8  0x00005555558a0084 in execute_ex ()
#9  0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#10 0x00005555558a0084 in execute_ex ()
#11 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#12 0x00005555558a0084 in execute_ex ()
#13 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#14 0x00005555558a0084 in execute_ex ()
#15 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#16 0x00005555558a0084 in execute_ex ()
#17 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#18 0x00005555558a0084 in execute_ex ()
#19 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#20 0x00005555558a05b4 in execute_ex ()
#21 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#22 0x00005555558a0084 in execute_ex ()
#23 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#24 0x00005555558a05b4 in execute_ex ()
#25 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#26 0x00005555558a05b4 in execute_ex ()
#27 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#28 0x00005555558a0084 in execute_ex ()
#29 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#30 0x00005555558a0084 in execute_ex ()
#31 0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#32 0x00005555558a0084 in execute_ex ()
...

That goes on and on and on :-\

It may be due to some libraries being removed, or to a change in a MediaWiki extension that somehow produces an infinite loop.
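The backtrace keeps alternating between execute_ex and tideways_xhprof_execute_ex, which is consistent with very deep PHP recursion rather than a crash inside one of the removed libraries. Rather than scrolling through thousands of frames, a saved backtrace can be tallied per function. This is a minimal sketch; `bt_frame_counts` is a hypothetical name and it assumes gdb's usual `#N  0xADDR in func ()` frame format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a gdb backtrace on stdin and count how many frames
# belong to each function, most frequent first. Continuation lines such as
# "   from /usr/lib/...so" do not start with "#N" and are ignored.
bt_frame_counts() {
  awk '/^#[0-9]+/ {
         name = $2                    # frame without an address: "#0  func (...)"
         for (i = 2; i < NF; i++)     # frame with an address: "#1  0x... in func ()"
           if ($i == "in") { name = $(i + 1); break }
         print name
       }' | sort | uniq -c | sort -rn
}
```

The backtrace could be captured non-interactively with gdb's `-batch` and `-ex` options, for example `sudo -s -u www-data gdb -batch -ex run -ex bt --args /usr/bin/php /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki=aawiki --quick > bt.txt`, and then summarized with `bt_frame_counts < bt.txt`.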

hashar triaged this task as Unbreak Now! priority. Nov 26 2021, 4:54 PM

That might be related to the infinite loop listed at T296508#7531413

I have marked it as a train blocker, so it is now Unbreak Now! priority.

@Majavah sorry, I missed your gdb comment because I was editing the task at the same time you sent your message. The trick I found to get around the www-data account being disabled is to use sudo -s (which, I guess, provides a shell and works around the account having no usable login shell).

Majavah claimed this task.