Page MenuHomePhabricator

PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp on mwdebug1002)
Open, NormalPublic

Description

This error show up in Logstash whenever I'm using PHP 7 via X-Wikimedia-Debug

PHP Fatal error:  Uncaught LogicException: The UdpSocket to 127.0.0.1:10514 has been closed and can not be written to anymore in /srv/mediawiki/php-1.33.0-wmf.14/vendor/monolog/monolog/src/Monolog/Handler/SyslogUdp/UdpSocket.php:45
Stack trace:
#0 /srv/mediawiki/php-1.33.0-wmf.14/vendor/monolog/monolog/src/Monolog/Handler/SyslogUdp/UdpSocket.php(31): Monolog\Handler\SyslogUdp\UdpSocket->send('<15>Jan 25 21:3...')
#1 /srv/mediawiki/php-1.33.0-wmf.14/vendor/monolog/monolog/src/Monolog/Handler/SyslogUdpHandler.php(47): Monolog\Handler\SyslogUdp\UdpSocket->write('@cee: {"@timest...', '<15>Jan 25 21:3...')
#2 /srv/mediawiki/php-1.33.0-wmf.14/vendor/monolog/monolog/src/Monolog/Handler/AbstractProcessingHandler.php(37): Monolog\Handler\SyslogUdpHandler->write(Array)
#3 /srv/mediawiki/php-1.33.0-wmf.14/vendor/monolog/monolog/src/Monolog/Logger.php(337): Monolog\Handler\AbstractProcessingHandler->handle(Array)
#4 /srv/mediawiki/php-1.33.0-wmf.14/vendor/monolog/monolog/src/Monolog/Logger.php(532): Monolog\Logger->addRecord(100, 'Wikimed in /srv/mediawiki/php-1.33.0-wmf.14/vendor/monolog/monolog/src/Monolog/Handler/SyslogUdp/UdpSocket.php on line 45
typesyslog
hostmwdebug1002
programphp7.2-fpm

Note that these are logged directly from php, and not via MediaWiki's fatal error handler. As such, they don't have any context about which wiki, or which url it happened with.

Also, I haven't seen my web requests aborted with this error. This suggests that maybe this happening post-send.

Details

Related Gerrit Patches:
operations/puppet : productiontrafficserver: re-enable mwdebug1002

Event Timeline

Krinkle created this task.Jan 25 2019, 9:38 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 25 2019, 9:38 PM
Joe triaged this task as High priority.Feb 7 2019, 12:47 PM

Is it sporadic or reproducible at will? My first thought was rsyslog not running (or restarting) and thus 10514 udp not open

fgiunchedi moved this task from Backlog to Radar on the User-fgiunchedi board.Apr 23 2019, 12:29 PM
Krinkle lowered the priority of this task from High to Low.May 7 2019, 4:21 PM
Krinkle added a subscriber: Joe.

Didn't realise it was limited to mwdebug1002. Never checked other servers but @Joe is right that it only happens there. Let's ignore for now.

Krinkle renamed this task from PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) to PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp on mwdebug1002).May 7 2019, 4:21 PM
Krinkle moved this task from Untriaged to Meta on the Wikimedia-production-error board.
Krinkle removed a subscriber: Krinkle.May 7 2019, 9:56 PM
mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM
Krinkle added a comment.EditedSep 11 2019, 6:38 PM

Still seen regularly. It's somewhat concerning me because we rely on mwdebug1002 to find non-fatal warnings and errors when testing patches and expect that if Logstash/mwdebug1002 is clear, that means the patch is good to go.

If we instead see "The UdpSocket to 127.0.0.1:10514 has been closed and can not be written to" that is first of all confusing to anyone who doesn't know it's specific to mwdebug1002. But more importantly - does this not mean that some messages were dropped and not reported to Logstash?

That's more concerning because it means we're missing messages that could very well be errors.

@Joe If this is specific to mwdebug1002, perhaps re-imaging the server could help to rule out anything that might've gotten out of whack with this one?


EDIT: might be related to T151428: Error when provisioning mediawiki-vagrant: "The UdpSocket to 127.0.0.1:10514 has been closed and can not be written to anymore"

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.EditedOct 12 2019, 11:19 PM

Tagging RelEng for awareness as this means when looking at Logstash/mwdebug1002 during deployments, there may be critical warnings not visible due to them not making it from MW to syslog/Kafka.

Krinkle raised the priority of this task from Low to Normal.Oct 12 2019, 11:19 PM
jijiki added a subscriber: jijiki.Oct 15 2019, 9:32 AM

Server will be reimaged, I will ping here when it is done :)

Mentioned in SAL (#wikimedia-operations) [2019-10-23T12:37:26Z] <effie> Depool mwdebug1002 - T214734

Mentioned in SAL (#wikimedia-operations) [2019-10-23T13:34:01Z] <effie> disable puppet on mwdebug1002 - T214734

ema added a subscriber: ema.EditedFri, Nov 8, 5:01 PM

Note that debug servers aren't pooled in etcd like regular production ones, so mwdebug1002 is still serving debug traffic:

$ curl -v -H "X-Wikimedia-Debug: backend=mwdebug1002.eqiad.wmnet" https://en.wikipedia.org/wiki/Main_Page 2>&1 | grep server:
< server: mwdebug1002.eqiad.wmnet

We could temporarily comment it out from debug_proxy's configuration (modules/role/manifests/debug_proxy.pp), which would result in requests with mwdebug1002 in X-W-D getting a 400 error.

Mentioned in SAL (#wikimedia-operations) [2019-11-12T12:11:27Z] <effie> Reimage mwdebug1002 - T214734

jijiki added a subscriber: Krinkle.Tue, Nov 12, 2:49 PM

@Krinkle Please let me know if the issue on mwdebug1002 persists, as the server has been reimaged :)

Change 551515 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] trafficserver: re-enable mwdebug1002

https://gerrit.wikimedia.org/r/551515

Change 551515 merged by Ema:
[operations/puppet@production] trafficserver: re-enable mwdebug1002

https://gerrit.wikimedia.org/r/551515

Looks like it is still happening:

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2019.11.20/syslog?id=AW6KkyhearkxubcmwWlL&_g=h@44136fa

host: mwdebug1002
type: syslog
program:	php7.2-fpm
message:

PHP Fatal error:  Uncaught LogicException: The UdpSocket to 127.0.0.1:10514 has been closed and can not be written to anymore in /srv/mediawiki/php-1.35.0-wmf.5/vendor/monolog/monolog/src/Monolog/Handler/SyslogUdp/UdpSocket.php:45

Steps to reproduce the issue:

  1. Enable WikimediaDebug to mwdebug1002 with Logging enabled.
  2. Disable WikimediaDebug (because T237687#5679746).
  3. Go to https://logstash.wikimedia.org/app/kibana#/dashboard/mwdebug1002