Stack for shutdown/destruct fatals missing from php7-fatal-error.php logs
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	Krinkle
	Dec 19 2019, 12:55 AM

Description

Example event:
https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2019.12.18/mediawiki/?id=AW8bbGJAarkxubcmBO13

type:mediawiki channel:fatal
caught_by: /etc/php/php7-fatal-error.php (via wmerrors)
exception.file: /srv/mediawiki/wmf-config/set-time-limit.php:39

exception.message:
Uncaught WMFTimeoutException: the execution time limit of 60 seconds was exceeded in /srv/mediawiki/wmf-config/set-time-limit.php:39
Stack trace:
#0 /srv/mediawiki/php-1.35.0-wmf.10/includes/parser/Parser.php(414): {closure}(1)
#1 [internal function]: Parser->__destruct()
#2 {main}
  thrown

exception.trace:

message:
Uncaught WMFTimeoutException: the execution time limit of 60 seconds was exceeded in /srv/mediawiki/wmf-config/set-time-limit.php:39
Stack trace:
#0 /srv/mediawiki/php-1.35.0-wmf.10/includes/parser/Parser.php(414): {closure}(1)
#1 [internal function]: Parser->__destruct()
#2 {main}
  thrown

Notice how exception.trace is empty. And, instead it is included (but cut off) from the exception.message.

Looking at the php-fpm stderr I find for the same host/timestamp this event:
https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2019.12.18/syslog/?id=AW8bbGJAarkxubcmBO15

program :php-fpm
message:
PHP Fatal error:  Uncaught WMFTimeoutException: the execution time limit of 60 seconds was exceeded in /srv/mediawiki/wmf-config/set-time-limit.php:39
Stack trace:
#0 /srv/mediawiki/php-1.35.0-wmf.10/includes/parser/Parser.php(414): {closure}(1)
#1 [internal function]: Parser->__destruct()
#2 {main}
  thrown in /srv/mediawiki/wmf-config/set-time-limit.php on line 39

Looking at Logstash more generally it seems that for shutdown/destruct stacks, this is always the case. And for others, it is correctly reported through exception.trace. This is an issue for two reasons:

As part of the message it's repeated three times (approximating overall limit, risking getting trimmed). And even within its own field, risks getting trimmed, as already seen above even for the relatively small stack. This might be caused by other reasons as well, not sure yet.
Due to not being in exception.trace it means product and/or team specific Logstash dashboards never see these failures (given we filter by trace).

Details

	Subject	Repo	Branch	Lines +/-
	mediawiki: Capture shutdown/destruct backtrace in php7-fatal-error.php	operations/puppet	production	+71 -24

Customize query in gerrit

Related Objects
Search...

Status	Assigned	Task
Resolved	Krinkle	T241097 Stack for shutdown/destruct fatals missing from php7-fatal-error.php logs
Resolved	herron	T242770 Logstash for MediaWiki is down in Beta Cluster
Resolved	Dzahn	T242658 Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement)
Resolved	Krinkle	T242659 Syslog on beta cluster app server (deployment-mediawiki-07) is empty and last touched Jan 8

Event Timeline

Krinkle created this task.Dec 19 2019, 12:55 AM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 19 2019, 12:55 AM

Change 559262 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] mediawiki: Capture shutdown/destruct backtrace in php7-fatal-error.php

https://gerrit.wikimedia.org/r/559262

gerritbot added a project: Patch-For-Review.Dec 19 2019, 1:08 AM

• Gilles assigned this task to aaron.Jan 6 2020, 10:14 PM

• Gilles moved this task from Inbox, needs triage to Doing (old) on the Performance-Team board.

@Joe @jijiki Would like to deploy this soon. Ping me any time this week if you can squeeze it in so I can stand by for confirmation.

EDIT: Wanted to test in Beta first but can't due to T242658. Once that resolves, I'll test there and then ping back here.

Krinkle claimed this task.Jan 13 2020, 8:50 PM

Krinkle added a subscriber: aaron.

@Krinkle, I am fine to merge any time, as soon as you have tested it:)

Now blocked by T242770: Logstash for MediaWiki is down in Beta Cluster.

Krinkle added a subtask: T242770: Logstash for MediaWiki is down in Beta Cluster.Jan 14 2020, 5:21 PM

Krinkle added a subtask: T242658: Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement).

Krinkle changed the task status from Open to Stalled.Jan 14 2020, 5:25 PM

Krinkle triaged this task as Medium priority.

Krinkle moved this task from Doing (old) to Blocked (old) on the Performance-Team board.

Krinkle closed subtask T242770: Logstash for MediaWiki is down in Beta Cluster as Resolved.Jan 14 2020, 5:54 PM

Krinkle added a subtask: T242659: Syslog on beta cluster app server (deployment-mediawiki-07) is empty and last touched Jan 8.Jan 14 2020, 7:43 PM

Change 559262 merged by Effie Mouzeli:
[operations/puppet@production] mediawiki: Capture shutdown/destruct backtrace in php7-fatal-error.php

https://gerrit.wikimedia.org/r/559262

Maintenance_bot removed a project: Patch-For-Review.Jan 16 2020, 6:11 PM

Krinkle closed this task as Resolved.Jan 16 2020, 6:20 PM

Stack for shutdown/destruct fatals missing from php7-fatal-error.php logsClosed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Stack for shutdown/destruct fatals missing from php7-fatal-error.php logs
Closed, ResolvedPublic
Actions

Related Objects
Search...