Page MenuHomePhabricator

CirrusSearch logging logs with reserved parameter message
Closed, ResolvedPublic

Description

Seemingly this code ends up passing in a 'message' parameter as $context

		LoggerFactory::getInstance( 'CirrusSearch' )->warning(
			"Search backend error during {$logType} after {tookMs}: {error_message}",
			$context
		);

causes

{
  "_index": "logstash-mediawiki-2020.02.14",
  "_type": "mediawiki",
  "_id": "AXBFZX2Gh3Uj6x1zEFHZ",
  "_version": 1,
  "_score": null,
  "_source": {
    "server": "uk.wikipedia.org",
    "phpversion": "7.2.26-1+0~20191218.33+debian9~1.gbpb5a340+wmf1",
    "wiki": "ukwiki",
    "channel": "CirrusSearch",
    "program": "mediawiki",
    "hitsOffset": 0,
    "type": "mediawiki",
    "http_method": "GET",
    "host": "mw1339",
    "@version": 1,
    "limit": 4,
    "c_message": "<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.13.9</center>\r\n</body>\r\n</html>\r\n",
    "shard": "s7",
    "timestamp": "2020-02-14T20:30:40+00:00",
    "tookMs": 15,
    "severity": "warning",
    "error_message": "unknown: Status code 502; <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.13.9</center>\r\n</body>\r\n</html>\r\n",
    "unique_id": "XkcDcApAIDMAAK0HejYAAABE",
    "logstash_formatter_key_conflict": [
      "message"
    ],
    "level": "WARNING",
    "ip": "10.64.32.51",
    "query": "morelike:Кузьо_Тарас",
    "index": "ukwiki_content",
    "mwversion": "1.35.0-wmf.18",
    "logsource": "mw1339",
    "message": "Search backend error during more_like search for 'morelike:Кузьо_Тарас' after 15: unknown: Status code 502; <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.13.9</center>\r\n</body>\r\n</html>\r\n",
    "normalized_message": "Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message}",
    "url": "/w/api.php?action=query&format=json&formatversion=2&prop=pageimages%7Cdescription&piprop=thumbnail&pithumbsize=160&pilimit=3&generator=search&gsrsearch=morelike%3A%D0%9A%D1%83%D0%B7%D1%8C%D0%BE_%D0%A2%D0%B0%D1%80%D0%B0%D1%81&gsrnamespace=0&gsrlimit=3&gsrqiprofile=classic_noboostlinks&uselang=content&smaxage=86400&maxage=86400",
    "reqId": "XkcDcApAIDMAAK0HejYAAABE",
    "queryType": "more_like",
    "tags": [
      "input-kafka-rsyslog-udp-localhost",
      "rsyslog-udp-localhost",
      "kafka",
      "es"
    ],
    "referrer": "https://uk.m.wikipedia.org/wiki/%D0%9A%D1%83%D0%B7%D1%8C%D0%BE_%D0%A2%D0%B0%D1%80%D0%B0%D1%81",
    "@timestamp": "2020-02-14T20:30:40.547Z",
    "syntax": [
      "more_like",
      "full_text",
      "filter_only"
    ],
    "suggestionRequested": false,
    "facility": "user"
  },
  "fields": {
    "@timestamp": [
      1581712240547
    ]
  },
  "sort": [
    1581712240547
  ]
}

And specifically,

"logstash_formatter_key_conflict": [
  "message"
],
"c_message": "<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.13.9</center>\r\n</body>\r\n</html>\r\n",

Looking at ArrayLogger.php, this code seems suspicious, but I'm not sure if it's this (or only this)

	public function log( $level, $message, array $context = [] ) {
		$this->logs[] = [
			'level' => $level,
			'message' => $message,
			'context' => $context,
		];
	}

Also in MultiSearchRequestLog.php and SearchRequestLog.php

		// in case of failures from Elastica
		if ( isset( $responseData['message'] ) ) {
			$vars['message'] = $responseData['message'];
		}

And seemingly from RequestLogger.php (anything that calls/uses RequestLogger::buildLogMessage), are having a conflict with the message key

			$logMessage = $this->buildLogMessage( $log, $finalContext );
			LoggerFactory::getInstance( 'CirrusSearchRequests' )->debug( $logMessage, $finalContext );
			if ( $slowMillis && $log->getTookMs() >= $slowMillis ) {
				if ( $user !== null ) {
					$finalContext['user'] = $user->getName();
					$logMessage .= ' for {user}';
				}
				LoggerFactory::getInstance( 'CirrusSearchSlowRequests' )->info( $logMessage, $finalContext );
			}

Details

Related Gerrit Patches:
mediawiki/extensions/CirrusSearch : wmf/1.35.0-wmf.22Don't log with reserved parameter
mediawiki/extensions/CirrusSearch : masterDon't log with reserved parameter

Event Timeline

Reedy created this task.Feb 14 2020, 8:37 PM
Restricted Application added a project: Discovery-Search. · View Herald TranscriptFeb 14 2020, 8:37 PM
Restricted Application added a subscriber: Base. · View Herald Transcript
Reedy updated the task description. (Show Details)Feb 14 2020, 8:38 PM
Reedy updated the task description. (Show Details)Feb 14 2020, 9:05 PM
Reedy updated the task description. (Show Details)Feb 28 2020, 4:59 PM

The problem is the message key, or the c_message key (or both)?. It looks like neither is part of our structured logging schema, so they should be relatively easy to change.

Reedy added a comment.Mar 3 2020, 7:16 PM

The problem is the message key, or the c_message key (or both)?. It looks like neither is part of our structured logging schema, so they should be relatively easy to change.

message is the issue. MW renames it to c_message to stop it conflicting and adds the below logstash_formatter_key_conflict to identify that it's renamed the key

"logstash_formatter_key_conflict": [
  "message"
],
EBernhardson triaged this task as Medium priority.Mar 3 2020, 9:10 PM
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board.

Change 576960 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Don't log with reserved parameter

https://gerrit.wikimedia.org/r/576960

Thanks for digging through and providing the appropriate places. I did a little more poking around, but it looks like the (Multi)?RequestLogger is the only thing that needs changing. ArrayLogger is only used in the test suite as a replacement inspectable logging implementation.

Change 576960 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Don't log with reserved parameter

https://gerrit.wikimedia.org/r/576960

Change 576989 had a related patch set uploaded (by Reedy; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@wmf/1.35.0-wmf.22] Don't log with reserved parameter

https://gerrit.wikimedia.org/r/576989

Reedy added a comment.Mar 4 2020, 10:48 PM

Thanks for digging through and providing the appropriate places. I did a little more poking around, but it looks like the (Multi)?RequestLogger is the only thing that needs changing. ArrayLogger is only used in the test suite as a replacement inspectable logging implementation.

Fair enough, thanks for the fix.

I've just cherry picked it to .22, and will deploy it to help clean up the logs in .22, and we can keep an eye on it and confirm it fixes them all up :)

Change 576989 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@wmf/1.35.0-wmf.22] Don't log with reserved parameter

https://gerrit.wikimedia.org/r/576989

Mentioned in SAL (#wikimedia-operations) [2020-03-04T23:26:26Z] <reedy@deploy1001> Synchronized php-1.35.0-wmf.22/extensions/CirrusSearch/includes/: T245303 (duration: 01m 02s)

Reedy closed this task as Resolved.Mar 5 2020, 6:47 PM

Looks to have done the trick. Thanks Erik! :)