Page MenuHomePhabricator

Create sniff to help prevent logstash_formatter_key_conflict in mediawiki logs
Open, Stalled, LowPublic

Description

Following up from T245280, it'd be useful if we could try and prevent some of these with PHPCS rules

The simplest case, is the existence of things like {host}, {type}, {ip} in logging strings points to this being wrong

From LogstashFormatter:

	/** @var array Keys which should not be used in log context */
	protected $reservedKeys = [
		// from LogstashFormatter
		'message', 'channel', 'level', 'type',
		// from WebProcessor
		'url', 'ip', 'http_method', 'server', 'referrer',
		// from WikiProcessor
		'host', 'wiki', 'reqId', 'mwversion',
		// from config magic
		'normalized_message',
	];

For example from MediaWiki-extensions-LoginNotify

		$this->log->info( 'Sending a {type} notification to {user}',
			[
				'function' => __METHOD__,
				'type' => $type,
				'user' => $user->getName(),
			]
		);

It's obvious from the string (based on the list above) that {type} is wrong, and should be easier to detect

In a case like the below (made up, but there are cases where log params are in the array, but not in the string), it's harder... But if we were looking for functions named info, debug etc

		$this->log->info( 'Sending a notification to {user}',
			[
				'function' => __METHOD__,
				'type' => $type,
				'user' => $user->getName(),
			]
		);

Based on the reserved word list, as of writing, a regex catching \{(message|channel|level|type|url|ip|http_method|server|referrer|host|wiki|reqId|mwversion|normalized_message)\} in a string would probably help

I note this isn't probably auto fixable, but should be trivially doable by a human...

Event Timeline

T242751: Update monolog/monolog to 2.1.1 or later may make this obsolete if the monolog 2 upgrade prevents this from happening...

Legoktm subscribed.

I think this is better done in phan which has better inferences on array keys, etc.

I think this is better done in phan which has better inferences on array keys, etc.

We have the usual pros and cons... Phan can certainly understand array shapes much better than PHPCS, and be much more precise. OTOH, PHPCS runs on more repos, is easier to configure, and it used to be easier to run until recent times.

If we want to do this, I'd also vote for phan.

Yea, please go for Phan. A PHPCS sniff for this will most probably be tremendously complicated, and still miss many places. For example, how should PHPCS know when $this->logger actually is a LoggerInterface, and not something else? How should it check the $context array in $this->logger->info( '…', $context );?

Krinkle changed the task status from Open to Stalled.Oct 14 2021, 8:26 PM
Krinkle triaged this task as Low priority.
Krinkle subscribed.

This will likely be obsoleted by T247675. Keeping open as stalled for now to reflect that this isn't being worked on, but open as reminder in case we change course.