Page MenuHomePhabricator

Create sniff to help prevent logstash_formatter_key_conflict in mediawiki logs
Open, Needs TriagePublic

Description

Following up from T245280, it'd be useful if we could try and prevent some of these with PHPCS rules

The simplest case, is the existence of things like {host}, {type}, {ip} in logging strings points to this being wrong

From LogstashFormatter:

	/** @var array Keys which should not be used in log context */
	protected $reservedKeys = [
		// from LogstashFormatter
		'message', 'channel', 'level', 'type',
		// from WebProcessor
		'url', 'ip', 'http_method', 'server', 'referrer',
		// from WikiProcessor
		'host', 'wiki', 'reqId', 'mwversion',
		// from config magic
		'normalized_message',
	];

For example from MediaWiki-extensions-LoginNotify

		$this->log->info( 'Sending a {type} notification to {user}',
			[
				'function' => __METHOD__,
				'type' => $type,
				'user' => $user->getName(),
			]
		);

It's obvious from the string (based on the list above) that {type} is wrong, and should be easier to detect

In a case like the below (made up, but there are cases where log params are in the array, but not in the string), it's harder... But if we were looking for functions named info, debug etc

		$this->log->info( 'Sending a notification to {user}',
			[
				'function' => __METHOD__,
				'type' => $type,
				'user' => $user->getName(),
			]
		);

Based on the reserved word list, as of writing, a regex catching \{(message|channel|level|type|url|ip|http_method|server|referrer|host|wiki|reqId|mwversion|normalized_message)\} in a string would probably help

I note this isn't probably auto fixable, but should be trivially doable by a human...

Event Timeline

Reedy created this task.Feb 14 2020, 6:15 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 14 2020, 6:15 PM
Reedy added a comment.Mar 14 2020, 4:33 AM

T242751: Update monolog/monolog to 2.1.1 or later may make this obsolete if the monolog 2 upgrade prevents this from happening...

Legoktm edited projects, added phan; removed MediaWiki-Codesniffer.Mar 14 2020, 5:38 AM
Legoktm added a subscriber: Legoktm.

I think this is better done in phan which has better inferences on array keys, etc.

I think this is better done in phan which has better inferences on array keys, etc.

We have the usual pros and cons... Phan can certainly understand array shapes much better than PHPCS, and be much more precise. OTOH, PHPCS runs on more repos, is easier to configure, and it used to be easier to run until recent times.

If we want to do this, I'd also vote for phan.

Yea, please go for Phan. A PHPCS sniff for this will most probably be tremendously complicated, and still miss many places. For example, how should PHPCS know when $this->logger actually is a LoggerInterface, and not something else? How should it check the $context array in $this->logger->info( '…', $context );?