Page MenuHomePhabricator

Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal
Closed, ResolvedPublic

Description

For HHVM, the hhvm-fatal-error.php entries to syslog were rewritten to appear as MediaWiki. But the PHP7 equivalent appears to not be doing that yet or not anymore.

This means they can be included in Logstash dashboards that include a type:mediawiki filter (where developers will discover it naturally as part of routine monitoring, and also during train/SWAT deployments). Using a different type is not feasible even as workaround because Kibana doesn't support that unless you bypass the UI and use JSON queries directly.

If I recall correctly, this rewrite would be in the puppetised logstash/filter file.


[…] After looking further into this, I think we're actually counting these errors multiple times - see for instance https://logstash.wikimedia.org/goto/c92404ce11520d59feb9211471d5bf09

One is fpm stderr, which is not useful (no stack traces, no rich MW context fields). This is similar to the hhvm syslog we've had in the past, and our custom syslog rewriter for that.
The one from fatal-error.php should appear under type:mediawiki channel:fatal instead of type:syslog. Right now these are not being found by developers and most monitoring queries. Classifying as high priority because this means every Scap deploy and train promotion may be acting on false confidence – given 100% PHP 7 traffic.


Subtasks
  1. Hotfix the Scap canary logstash-checker to include syslog/php7-fpm.
  2. Fix sending of, or Logstash ingestion of, PHP7 fatal-error messages to correctly arrive under type:mediawiki channel:fatal.

Details

Related Gerrit Patches:

Event Timeline

Krinkle created this task.Oct 1 2019, 2:04 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 1 2019, 2:04 AM

Change 539881 had a related patch set uploaded (by Krinkle; owner: Thcipriani):
[operations/puppet@production] scap: mediawiki logstash_checker

https://gerrit.wikimedia.org/r/539881

Krinkle added subscribers: herron, dduvall, thcipriani.
Krinkle triaged this task as High priority.Oct 11 2019, 5:51 PM
Krinkle updated the task description. (Show Details)Oct 11 2019, 5:55 PM
Krinkle updated the task description. (Show Details)Oct 12 2019, 4:44 PM
Krinkle updated the task description. (Show Details)Oct 15 2019, 8:52 AM

Change 539881 merged by Effie Mouzeli:
[operations/puppet@production] scap: mediawiki logstash_checker

https://gerrit.wikimedia.org/r/539881

jijiki added a subscriber: jijiki.Oct 15 2019, 9:54 AM
Krinkle updated the task description. (Show Details)Oct 15 2019, 6:33 PM

I will take a look tomorrow, sorry for delaying this

jijiki claimed this task.Thu, Oct 24, 3:22 PM

Change 546219 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] logstash: send PHP7 fatal-error messages type:mediawiki channel:fatal

https://gerrit.wikimedia.org/r/546219

Change 546230 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] mediawiki: Add 'caught_by' to php7-fatal-error log message

https://gerrit.wikimedia.org/r/546230

Joe added a subscriber: fgiunchedi.Wed, Oct 30, 8:19 AM

In the case of HHVM, fatals were correctly handled by the daemon and used whatever callback MediaWiki has to manage errors, so hhvm-fatal-error.php has nothing regarding logging, except for sending a counter to statsd:
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/b999b4cba2ec27415d329a62277cdd64f24b440b/modules/mediawiki/templates/hhvm-fatal-error.php.erb

While for MediaWiki, we send the messages to syslog via the @cee format:

https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/profile/files/mediawiki/php/php7-fatal-error.php#107

so it's normal and expected that such logs will end up having type:syslog.

We need to do one of the following:

  • Send these traces to logstash directly adding the correct channel/type
  • Add some information to the log line we send to syslog so we can write either syslog rules or logstash rules to add the correct type/channel

if possible, I'd go with the latter.

@fgiunchedi is there a way to use syslog to relay messages to logstash and change the type of the message /before/ we send it to logstash? Or we need to do that at processing time on logstash?

We need to do one of the following:

  • Send these traces to logstash directly adding the correct channel/type
  • Add some information to the log line we send to syslog so we can write either syslog rules or logstash rules to add the correct type/channel

if possible, I'd go with the latter.
@fgiunchedi is there a way to use syslog to relay messages to logstash and change the type of the message /before/ we send it to logstash? Or we need to do that at processing time on logstash?

We could poke at rsyslog configuration, although I'd prefer to go the logstash route as the patch above does

Change 546219 merged by Effie Mouzeli:
[operations/puppet@production] logstash: send PHP7 fatal-error messages type:mediawiki channel:fatal

https://gerrit.wikimedia.org/r/546219

Change 546230 merged by Effie Mouzeli:
[operations/puppet@production] mediawiki: Add 'caught_by' to php7-fatal-error log message

https://gerrit.wikimedia.org/r/546230

Please ping if there are more things to be done for this task:)

Krinkle added a comment.EditedThu, Oct 31, 7:10 PM

Change 546219 merged by Effie Mouzeli:
[operations/puppet@production] logstash: send PHP7 fatal-error messages type:mediawiki channel:fatal
https://gerrit.wikimedia.org/r/546219

Has this been deployed? It appears to not be effective (yet), whereas the next commit (adding caught_by:wmerrors) is seen to be active in Logstash.

For example,
https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2019.10.31/syslog?id=AW4jN9Bwx3rdj6D84WNm

caught_by	/etc/php/php7-fatal-error.php (via wmerrors)

exception.file	/srv/mediawiki/php-1.35.0-wmf.4/includes/json/FormatJson.php:127
exception.message	Allowed memory size of 692060160 bytes exhausted (tried to allocate 23072768 bytes)
exception.trace	#1  json_encode() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/json/FormatJson.php:127]
#2  FormatJson::encode() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/api/ApiFormatJson.php:90]
#3  ApiFormatJson->execute() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/api/ApiMain.php:1849]
#4  ApiMain->printResult() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/api/ApiMain.php:1611]
#5  ApiMain->executeAction() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/api/ApiMain.php:539]
#6  ApiMain->executeActionWithErrorHandling() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/api/ApiMain.php:510]
#7  ApiMain->execute() called at [/srv/mediawiki/php-1.35.0-wmf.4/api.php:83]
#8  require(/srv/mediawiki/php-1.35.0-wmf.4/api.php) called at [/srv/mediawiki/w/api.php:3]

host	mw1313
message	Allowed memory size of 692060160 bytes exhausted (tried to allocate 23072768 bytes)
normalized_message	Allowed memory size of 692060160 bytes exhausted (tried to allocate 23072768 bytes)

program	php7.2-fpm
type	syslog

level	ERR
…

@Krinkle we are looking into it, tx

Change 547654 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: select on severity and lowercase err

https://gerrit.wikimedia.org/r/547654

Change 547654 merged by Cwhite:
[operations/puppet@production] profile: select on severity and lowercase err

https://gerrit.wikimedia.org/r/547654

Looks like this is working now! cc @Krinkle

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2019.11.05/mediawiki?id=AW47SJgQx3rdj6D8jJMj&_g=h@44136fa

"_index": "logstash-deploy-2019.11.05",
"_type": "mediawiki",
"_id": "AW47SJgQx3rdj6D8jJMj",
"_version": 1,
"_score": null,
"_source": {
  "exception": {
    "trace": "#1  array_unshift() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:720]\n#2  Parsoid\\Wt2Html\\Grammar->a52() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:4485]\n#3  Parsoid\\Wt2Html\\Grammar->parsetable_attribute_name() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:3325]\n#4  Parsoid\\Wt2Html\\Grammar->parsetable_attribute() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:2376]\n#5  Parsoid\\Wt2Html\\Grammar->parsetable_attributes() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:2316]\n#6  Parsoid\\Wt2Html\\Grammar->parserow_syntax_table_args() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:13191]\n#7  Parsoid\\Wt2Html\\Grammar->parsetable_data_tag() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:12307]\n#8  Parsoid\\Wt2Html\\Grammar->parsetable_data_tags() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:10984]\n#9  Parsoid\\Wt2Html\\Grammar->parsetable_content_line() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:8516]\n#10 Parsoid\\Wt2Html\\Grammar->parsetable_line() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:10721]\n#11 Parsoid\\Wt2Html\\Grammar->parsehacky_dl_uses() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:8346]\n#12 Parsoid\\Wt2Html\\Grammar->parselist_item() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:6056]\n#13 Parsoid\\Wt2Html\\Grammar->parseblock_line() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:6230]\n#14 Parsoid\\Wt2Html\\Grammar->parseblock_lines() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:3772]\n#15 Parsoid\\Wt2Html\\Grammar->parseblock() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:2708]\n#16 Parsoid\\Wt2Html\\Grammar->parsetlb() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:1838]\n#17 Parsoid\\Wt2Html\\Grammar->streamstart_async() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/PegTokenizer.php:137]\n#18 Parsoid\\Wt2Html\\PegTokenizer->processChunkily() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/TokenTransformManager.php:190]\n#19 Parsoid\\Wt2Html\\TokenTransformManager->processChunkily() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/TokenTransformManager.php:190]\n#20 Parsoid\\Wt2Html\\TokenTransformManager->processChunkily() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/TokenTransformManager.php:190]\n#21 Parsoid\\Wt2Html\\TokenTransformManager->processChunkily() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/HTML5TreeBuilder.php:431]\n#22 Parsoid\\Wt2Html\\HTML5TreeBuilder->processChunkily()\n#23 Generator->current() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/DOMPostProcessor.php:895]\n#24 Parsoid\\Wt2Html\\DOMPostProcessor->processChunkily() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/ParserPipeline.php:148]\n#25 Parsoid\\Wt2Html\\ParserPipeline->parseChunkily() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/ParserPipeline.php:198]\n#26 Parsoid\\Wt2Html\\ParserPipeline->parseToplevelDoc() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/ParserPipelineFactory.php:308]\n#27 Parsoid\\Wt2Html\\ParserPipelineFactory->parse() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/WikitextContentModelHandler.php:78]\n#28 Parsoid\\WikitextContentModelHandler->toHTML() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Parsoid.php:86]\n#29 Parsoid\\Parsoid->parseWikitext() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Parsoid.php:113]\n#30 Parsoid\\Parsoid->wikitext2html() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/extension/src/Rest/Handler/ParsoidHandler.php:543]\n#31 MWParsoid\\Rest\\Handler\\ParsoidHandler->wt2html() called at [/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/extension/src/Rest/Handler/PageHandler.php:55]\n#32 MWParsoid\\Rest\\Handler\\PageHandler->execute() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/Rest/Router.php:315]\n#33 MediaWiki\\Rest\\Router->executeHandler() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/Rest/Router.php:285]\n#34 MediaWiki\\Rest\\Router->execute() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/Rest/EntryPoint.php:116]\n#35 MediaWiki\\Rest\\EntryPoint->execute() called at [/srv/mediawiki/php-1.35.0-wmf.4/includes/Rest/EntryPoint.php:83]\n#36 MediaWiki\\Rest\\EntryPoint::main() called at [/srv/mediawiki/php-1.35.0-wmf.4/rest.php:31]\n#37 require(/srv/mediawiki/php-1.35.0-wmf.4/rest.php) called at [/srv/mediawiki/w/rest.php:3]\n",
    "file": "/srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/Grammar.php:720",
    "message": "Allowed memory size of 692060160 bytes exhausted (tried to allocate 20480 bytes)"
  },
  "server": "en.wikipedia.org",
  "level": "ERR",
  "phpversion": "7.2.22-1+0~20190902.26+debian9~1.gbpd64eb7+wmf1",
  "ip": "10.64.16.90",
  "channel": "fatal",
  "program": "php7.2-fpm",
  "logsource": "wtp1035",
  "type": "mediawiki",
  "message": "Allowed memory size of 692060160 bytes exhausted (tried to allocate 20480 bytes)",
  "caught_by": "/etc/php/php7-fatal-error.php (via wmerrors)",
  "url": "/w/rest.php/en.wikipedia.org/v3/page/pagebundle/List_of_Criterion_Collection_releases/924694089",
  "normalized_message": "Allowed memory size of 692060160 bytes exhausted (tried to allocate 20480 bytes)",
  "reqId": "XcFaQQpAEFoAAHbBJYwAAAAG",
  "tags": [
    "input-kafka-rsyslog-shipper",
    "rsyslog-shipper",
    "kafka",
    "syslog",
    "es",
    "es",
    "normalized_message_untrimmed"
  ],
  "http_method": "GET",
  "@timestamp": "2019-11-05T11:17:23.456Z",
  "host": "wtp1035",
  "@version": "1",
  "facility": "daemon",
  "timestamp": "2019-11-05T11:17:23.402304+00:00"
},
Krinkle closed this task as Resolved.Thu, Nov 7, 6:43 PM

Thanks, confirmed here as well.