Page MenuHomePhabricator

Rethink how metric label values are sanitized
Open, Needs TriagePublic

Description

Currently, all metric names, label names, and label values run through the same sanitizer at declaration time. Prometheus and Graphite support different character sets in their metric definitions:

https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

  • Prometheus Metric Names MUST match [a-zA-Z_:][a-zA-Z0-9_:]*
  • Prometheus Metric Label Names MUST match [a-zA-Z_][a-zA-Z0-9_]*
  • Prometheus Metric Label Values MAY contain any unicode chars. Empty labels are dropped.

Graphite/Whisper is limited by whatever filesystem is being written to with the expectation that the separator "." is the directory separator. Because of this, the allowed character set was limited since long ago:

BufferingStatsdDataFactory.php
private static function normalizeMetricKey( $key ) {
	$key = strtr( $key, [ '::' => '.' ] );
	$key = preg_replace( '/[^a-zA-Z0-9.]+/', '_', $key );
	$key = trim( $key, '_.' );
	return strtr( $key, [ '..' => '.' ] );
}
StatsUtils.php
public static function normalizeString( string $entity ): string {
	$entity = preg_replace( "/[^a-z0-9]/i", "_", $entity );
	$entity = preg_replace( "/_+/", "_", $entity );
	return trim( $entity, "_" );
}

Given that Graphite metrics are less defined, I propose we make the following changes:

  1. Strictly control label values that expect to go to Graphite
  2. Loosen control of label values that expect to go to Prometheus

This change is necessary to complete T348796 because Prometheus histograms expect values which would be sanitized, e.g. le=+Inf and le=0.5.