Page MenuHomePhabricator

MediaWiki log spam during row D blip / rack D2 unavailable
Closed, DeclinedPublic

Description

[Related but different than T233623: MediaWiki log spam during master unavailable / read only, this one is about spam during https://wikitech.wikimedia.org/wiki/Incident_documentation/20190920-d2_switch_failure]

We've observed a large amount of logs sent by MW during the incident above, namely a peak of ~130k msg/s (see below) with the biggest chuck being from memcache channel:

e.g. Memcached error for key "WANCache:t:ptwiki:messages:pt" on server "127.0.0.1:11213": SERVER ERROR (https://logstash.wikimedia.org/goto/8c177e65b10582aaf15a904b91977751)

while the rest was mostly dbconnection / dbreplication channels: https://logstash.wikimedia.org/goto/006c2fa6f31a1d6b7138e673cb077a20

There should be some logging rate limit (per mw channel + level?) so that similar situations are handled more gracefully

2019-09-24-183428_1075x1185_scrot.png (1×1 px, 192 KB)

Event Timeline

Change 538931 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: throttle duplicate normalized_message with level:ERR*

https://gerrit.wikimedia.org/r/538931

Another type of spam that we have observed, php-generated but comes through by means of syslog + apache: https://logstash.wikimedia.org/goto/c5afb82f07a0249524d66b74c82e55c4

2019-09-24T11:35:36	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321
2019-09-24T11:35:35	mw1341	NOTICE	PHP Notice:  Undefined variable: wmgFileExporterBetaFeature in /srv/mediawiki/wmf-config/CommonSettings.php on line 3321

Change 538931 merged by Herron:
[operations/puppet@production] logstash: throttle duplicate normalized_message with level:ERR*

https://gerrit.wikimedia.org/r/538931

Change 540465 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: add id to drop action in filter-throttle-errors

https://gerrit.wikimedia.org/r/540465

Change 540465 merged by Herron:
[operations/puppet@production] logstash: add id to drop action in filter-throttle-errors

https://gerrit.wikimedia.org/r/540465

lmata subscribed.

Adding back Observability-Logging which is a component tag within SRE Observability; when this gets selected, SRE Observability gets untagged. This keeps the intake tag less crowded and separate from the backlog and triaged flows.

colewhite subscribed.

Boldly declining in favor of focusing effort on T331879: Investigate methods to rate-limit/discard excessive log messages closer to the producer. Feel free to reopen if this is in error.