Page MenuHomePhabricator

Decide on casing convention for JMX metrics in Prometheus
Closed, ResolvedPublic3 Estimated Story Points

Description

Allllright, time for a naming bike shed! My favorite thing!

We are in the process of using Prometheus JMX exporter for the new Kafka jumbo cluster. JMX has an option to snake case attribute names, e.g. turning FifteenMinuteRate into fifteen_minute_rate, but it does not have an option to snake case the metric mbean names. Instead, we can leave them as they are (camelCase), or we can lower case them in place. Example:

Here is an metric from Kafka
kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent

Here is what it would look like when it is lower cased in prometheus.
kafka_server_kafkarequesthandlerpool_requesthandleravgidlepercent

This is what it would look like if we left it as is:
kafka_server_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent

We need to decide if we are going to auto-lower case all metrics from JMX exporter, OR if we are going to leave them as they are.

Cassandra is already lower casing them, but I suspect this was done just because why not (please correct me if there was reasoning behind it).

I'd much much strongly prefer to leave them as they are, in upper CamelCase. Reasons:

  1. Consistency. The metric names in prometheus then match their original names in JMX. Easier for grepping, etc.
  1. Readability. Simply lower casing the names loses the word separators. By lower casing, we choose to drop meaning. I find the smushed together words difficult to read. This could get worse in other cases, as java loves to be verbose and smush words together with capitals.

FYI, in general I much prefer snake_case over camelCase, but we aren't writing a coding style guide here, we are choosing whether or not we should transform upstream's names to all lower case (not snake_case) just because we don't like camelCase.

If I can't convince either Filippo OR Luca that we should not lower case, I'll deal with it, but my aesthetic sensibilities will be greatly offended whenever I look at a Kafka metrics :D :p

Event Timeline

Ottomata created this task.Sep 29 2017, 2:50 PM
Ottomata updated the task description. (Show Details)
elukey added a comment.Oct 2 2017, 9:30 AM

Opened https://github.com/prometheus/jmx_exporter/issues/193 to ask to upstream if snake case could be implemented for other fields of the mbeans other than the attributes.

Thanks @elukey and @Ottomata ! That would indeed be helpful to have and more readable too.

FWIW I don't feel particularly strongly one way or another, however when we settle on one convention I think the convention should be maintained across the fleet for all jmx_exporter we have. It shouldn't be a big deal to change it for cassandra at this early stage if we wish to do so (cc @Eevans @mobrovac).
The reason I'm insisting on consistency is that it is preferrable to agree on a convention sooner rather than later as changing the metric name will effectively "reset" that metric history (IOW the new and old metrics are two distinct metrics as far as prometheus is concerned)

Upstream author replied in https://github.com/prometheus/jmx_exporter/issues/193#issuecomment-333522441, says he thinks even the attr snake case option isn't a good idea. :D

Eevans added a comment.Oct 2 2017, 3:44 PM

Isn't snakecase the Prometheus convention? If we're concerned about consistency, aren't we concerned with these being consistent with metrics from other exporters?

Is readability, really a concern here? We used some sample config (from Prometheus upstream, IIRC) for Cassandra that rewrote the very verbose MBean names to something shorter / more concise; They are already not a straightforward match for the original names. I created some dashboards and didn't have any issues, and assume that no one else will ever be looking at them. Am I missing something?

We used some sample config (from Prometheus upstream, IIRC) for Cassandra that rewrote the very verbose MBean names to something shorter / more concise;

You can totally rewrite the metric name from JMX?

Isn't snakecase the Prometheus convention?

AFAIK, snakecase isn't an option for JMX metrics. It is either as they are (CamelCase) or to lower case. Snake case would insert underscores as word separators.

Eevans added a comment.Oct 2 2017, 4:23 PM

We used some sample config (from Prometheus upstream, IIRC) for Cassandra that rewrote the very verbose MBean names to something shorter / more concise;

You can totally rewrite the metric name from JMX?

https://github.com/wikimedia/puppet/blob/production/modules/cassandra/files/prometheus_jmx_exporter-3.x.yaml

Ah with fancy schmancy regex matching, with each regex actually matching words out of the Mbean.

We could do that, its just a lot more configuration. @elukey, whatchu think?

Ah with fancy schmancy regex matching, with each regex actually matching words out of the Mbean.

We could do that, its just a lot more configuration. @elukey, whatchu think?

Sure, let me know what scheme you'd like to see and then we can implement it :)

I personally don't see any issue with the current metric naming but I am totally ok in changing it to something more readable, but please no camel case :D

Phewwwwwww let's think about this a little more. If we are about to say that to use JMX Exporter, you must create custom metric names for all of your metrics in order to avoid keeping them in the Java camel case convention, we might be asking a lot. This makes the use of JMX exporter more difficult, as every metric would need to be carefully converted to something else. Lots of our kafka jmx metrics use wildcards to pull out the metric names. To convert each one, we'd have to explicitly declare each metric, rather than wildcarding them, no?

but please no camel case :D

@elukey, y u so against camel case? :) I also prefer snake_case to camelCase, but when we code in JVM, we stick to the conventions from the Java world. Why would we want to forcefully change those conventions now?

elukey added a comment.EditedOct 3 2017, 1:35 PM

Phewwwwwww let's think about this a little more. If we are about to say that to use JMX Exporter, you must create custom metric names for all of your metrics in order to avoid keeping them in the Java camel case convention, we might be asking a lot. This makes the use of JMX exporter more difficult, as every metric would need to be carefully converted to something else. Lots of our kafka jmx metrics use wildcards to pull out the metric names. To convert each one, we'd have to explicitly declare each metric, rather than wildcarding them, no?

IIUC yes :)

but please no camel case :D

@elukey, y u so against camel case? :) I also prefer snake_case to camelCase, but when we code in JVM, we stick to the conventions from the Java world. Why would we want to forcefully change those conventions now?

Because this is not Java, it is prometheus, and as far as I can see from this task there is not much agreement in using camel case no? :)

Gehel added a subscriber: Gehel.Oct 3 2017, 1:51 PM

In short: anything is fine by me except lower casing everything.

Since I was asked, I'll add my opinion. I don't really care about snake_case of CamelCase. Java is very much oriented toward CamelCase, but it is either keeping coherence with Prometheus or coherence with Java. I don't think coherence with Java is that much important in this case. As long as we keep a way to identify word boundaries (just converting everything to lower case will reduce readability too much).

Side node, the example given in the description look like we flatten the JMX hierarchy by concatenating ObjectName attributes. I have mostly no idea how Prometheus work, so that might be the way to go. But being able to query that hierarchical structure seems important to me.

Because this is not Java, it is prometheus

These are metrics from Java. Prometheus is monitoring the metrics in the JVM. Converting every single one loses the correlation from the original metric name in the JVM. I'm not totally opposed to doing so, it just sounds cumbersome, without any gain in clarity or meaning.

as far as I can see from this task there is not much agreement in using camel case

Ha, I haven't got that yet. I've only seen "in cassandra we just wanted shorter names" and "I don't like camelCase", and "snake_case is prometheus convention". (snake_case is not on option for JMX exporter)

dcausse added a subscriber: dcausse.Oct 3 2017, 2:05 PM
elukey added a comment.Oct 3 2017, 2:07 PM

Because this is not Java, it is prometheus

These are metrics from Java. Prometheus is monitoring the metrics in the JVM. Converting every single one loses the correlation from the original metric name in the JVM. I'm not totally opposed to doing so, it just sounds cumbersome, without any gain in clarity or meaning.

The source format of the metrics should be orthogonal from its final representation on the metric collector used, because otherwise we'd need to keep multiple standards, each one of them depending from the source of data. As long as the metric name represents the information that we want to visualize it should not carry, in my opinion, too many conventions from its source.

Having said that, I will agree on what the majority wants, not really opposed to camel case but mixing lower/snakecase with camel case makes my eyes cry a bit :)

Eevans added a comment.Oct 3 2017, 2:17 PM

as far as I can see from this task there is not much agreement in using camel case

Ha, I haven't got that yet. I've only seen "in cassandra we just wanted shorter names" and "I don't like camelCase", and "snake_case is prometheus convention". (snake_case is not on option for JMX exporter)

Cassandra's metrics at least are quite verbose in name (and contain a lot of redundancy). Removing that redundancy felt like a win to me, and the conversion to snake case makes them consistent with the result of the Prometheus universe.

Eevans added a comment.Oct 3 2017, 2:18 PM

Because this is not Java, it is prometheus

These are metrics from Java. Prometheus is monitoring the metrics in the JVM. Converting every single one loses the correlation from the original metric name in the JVM. I'm not totally opposed to doing so, it just sounds cumbersome, without any gain in clarity or meaning.

The source format of the metrics should be orthogonal from its final representation on the metric collector used, because otherwise we'd need to keep multiple standards, each one of them depending from the source of data. As long as the metric name represents the information that we want to visualize it should not carry, in my opinion, too many conventions from its source.

Yes, this. Exactly this.

Oook, so it sounds like we don't want just to lower, and both elukey and eevans don't want camelCase. So, the only option we are left with is to manually convert all metric names. :/ OOook!

elukey moved this task from Backlog to In Progress on the User-Elukey board.Oct 5 2017, 8:27 AM
Gehel added a comment.Oct 6 2017, 2:10 PM

I don't care all that much about the coherence of the metric names. I would be nice to have coherent metric names, but it is IMHO much more important to have an easy and robust way to add metrics.

I think whoever is doing the implementation should have the final word. And decide based on the simplicity of implementation. I don't think that browsing the metrics and finding the one we want is going to be a problem, whatever the naming convention we adopt.

elukey added a comment.Oct 6 2017, 2:23 PM

During the last chat with Filippo and Eric we analyzed the most horrible looking metric names for kafka:

kafka_controller_controllerchannelmanager_totalqueuesize_value 0.0
kafka_controller_controllerstats_autoleaderbalancerateandtimems_count 537.0
kafka_controller_controllerstats_controlledshutdownrateandtimems_count 0.0
kafka_controller_controllerstats_controllerchangerateandtimems_count 9.0
kafka_controller_controllerstats_isrchangerateandtimems_count 2.0
kafka_controller_controllerstats_leaderelectionrateandtimems_count 1.0
kafka_controller_controllerstats_manualleaderbalancerateandtimems_count 0.0
kafka_controller_controllerstats_partitionreassignmentrateandtimems_count 0.0
kafka_controller_controllerstats_topicchangerateandtimems_count 0.0
kafka_controller_controllerstats_topicdeletionrateandtimems_count 0.0
kafka_controller_controllerstats_uncleanleaderelections_total 0.0
kafka_controller_kafkacontroller_activecontrollercount_value 1.0
kafka_controller_kafkacontroller_controllerstate_value 0.0
kafka_controller_kafkacontroller_offlinepartitionscount_value 0.0
kafka_controller_kafkacontroller_preferredreplicaimbalancecount_value 0.0

Now in this case a compromise could be to keep the lowercase but:

  1. remove the _controllerstats_ substring since it is probably not meaningful.
  2. keep the rest and deal with words like manualleaderbalancerateandtimems, or just replace them with something more coincise in the prometheus jmx rules.

Doing 2) needs to be done carefully because it is really easy to make the prometheus jmx exporter rules list too big, so we should strive for a compromise between readability and efficiency.

manualleaderbalancerateandtimems

Cough cough, typo that is hard to see because everything is smushed together and you don't know where words begin and end! :o :)

The source format of the metrics should be orthogonal from its final representation on the metric collector used, because otherwise we'd need to keep multiple standards, each one of them depending from the source of data. As long as the metric name represents the information that we want to visualize it should not carry, in my opinion, too many conventions from its source.

(Kinda want to respond to this, even though I can feel I'm losing this battle!) In general, I agree with you completely. But we don't have a lot of options here, so we have to work with what we have. If it were simple and obvious on how to nicely transform JMX metric names to some single Prometheus convention, than I would be on board. But there isn't, and I don't think we should have to work so hard to get the JMX metrics.

In my ideal world, to get JMX metrics for any JVM, I'd just include a precanned Puppet class and have it all work, without having to even configure any metrics manually. Since we clearly can't do that and will need some configuration, we should keep the configuration as small as possible. If I was setting up monitoring for a new JVM service, and I had to manually list every JMX metric I wanted, I would likely only include the most obvious ones I need at that moment. If there was a precanned or conventional way to get all or many metrics by default, then I'd choose to use it.

Forcing people to work harder to get metrics will result in fewer things being monitored.

I would be nice to have coherent metric names, but it is IMHO much more important to have an easy and robust way to add metrics.

I agree with this.

a compromise could be to keep the lowercase but...

So, the 'convention' that is being proposed here is: you must use lowercase, and when it looks ugly, figure out something unconventional and transform each ugly metric manually.

not really opposed to camel case but mixing lower/snakecase with camel case makes my eyes cry a bit :)

'lower/snakecase' is not a thing! It is either lowercase mixed with snake_case (which makes my eyes cry), or camelCase mixed with snake_case (which I don't love, but don't mind either).

A reason why we are having this problem is because of the limitation of separators in Prometheus. JMX is using '.'s, but Prometheus expects no '.'s, right? If I'm wrong about this, and we can use '.', can we just keep the original JMX name completely, and not use underscores as hierarchy separators at all? (Just checking that this is not an option).

What I am hearing from yall just sounds crazy to me: "We don't like the way snake_case and camelCase look together, and are willing to make everything harder just to get rid of it". (Then again, I'm sure we could find some other issue where I'm on the opposite side and fighting to make things harder for aesthetic reasons :p)

Ahhhh anyway. I think I'm writing all this just to vent. I can tell that I'm losing here. If Luca and Filippo and Eric all want the 'convention' to be 'lowercase and figure out ugliness on your own', then let's proceed and do it.

Ottomata added a comment.EditedOct 9 2017, 5:01 PM

BTW, this will also be relevant for both upcoming Prometheusization for Druid and Hadoop, even in non JMX contexts. If we use Druid's HTTP API for some monitoring, we'd have to do transformation to get rid of camelCase. e.g. segmentsToDropSize, etc.

I've re-read the thread and I think I have a proposal to move things forward.

The reason I've insisted on all jmx_exporter to have the same convention for metric name casing (lowercase ATM, same as Prometheus) is not just aesthetics but to make full use of Prometheus and its metric model. To give a concrete example we will be able to have a "JVM dashboard" to examine details of any JVM in the fleet, because they use all the same metric names and thus the dashboard is generic.

With the above in mind I suggest we keep the same casing convention for jmx_exporter belonging to the same software rather than to all software. IOW cassandra can keep lowercase and kafka can use CamelCase, this isn't in conflict with what I wrote above since jmx_exporter applies casing conventions only to custom metrics, not its builtin ones.

@elukey @Eevans @Ottomata @Gehel how does that sound?

+1, looks good to me. As said above I don't particularly like mixing lowercase with camelcase but I find your solution a optimal compromise for all the suggestions/requirements expressed in this task.

All good for me! We can always revisit later if we find real life issues...

Change 383557 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kafka::broker::monitoring: refactor prometheus metric names

https://gerrit.wikimedia.org/r/383557

Hahahah my grumpiness paid off? Sounds good yall :)

You are teaching me a bad lesson, you just wait until the next naming bikeshed, I'll never stop mwoohahahahha

Change 383557 merged by Elukey:
[operations/puppet@production] profile::kafka::broker::monitoring: refactor prometheus metric names

https://gerrit.wikimedia.org/r/383557

Ottomata moved this task from Next Up to Done on the Analytics-Kanban board.Oct 16 2017, 5:10 PM
elukey closed this task as Resolved.Oct 18 2017, 7:58 AM