Upgrade prometheus-jmx-exporter on all services using it
Open, Medium, Public

Description

prometheus-jmx-exporter 0.3.0 is now available, and every host running the exporter should be upgraded to it. Affected services (obtained by querying puppetdb via cumin):

  • aqs
  • maps
  • restbase
  • analytics
  • conf
  • druid
  • kafka
  • puppetdb

The new package needs to be deployed, and the affected services restarted.

Event Timeline

The restbase cluster has been upgraded package-wise, but a rolling restart still needs to be scheduled.

This could be combined with the rolling restart for the Java security update (given 3.11.2 is deemed ready).

RobH triaged this task as Medium priority. May 3 2018, 4:48 PM
RobH subscribed.

I'm not quite sure whether this is a normal or a high priority task. It seems normal, since we aren't requiring immediate updating of all hosts, and it can be accomplished at a more moderate pace.

The puppetdb servers have been upgraded to prometheus-jmx-exporter 0.3.0-1.

Pnorman subscribed.

We don't see anything for the maps team to do on this - at the very least, we don't think it needs any resources from us.

Vvjjkkii renamed this task from Upgrade prometheus-jmx-exporter on all services using it to nbeaaaaaaa. Jul 1 2018, 1:14 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.

AQS has been partially upgraded:

prometheus-jmx-exporter
aqs1005.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1008.eqiad.wmnet:   Installed: 0.10-3
aqs1006.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1009.eqiad.wmnet:   Installed: 0.10-3
aqs1004.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1007.eqiad.wmnet:   Installed: 1:0.3.0-1

@elukey ?
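A listing like the one above can be checked mechanically rather than by eye. The following is only a minimal sketch: it assumes the `host:   Installed: version` line format shown in this comment, and the function name `hosts_needing_upgrade` and the `TARGET_VERSION` constant are hypothetical, not part of any existing tooling.

```python
import re

# Assumption: the target version string is the one shown in the listing above.
TARGET_VERSION = "1:0.3.0-1"

# Matches lines such as "aqs1005.eqiad.wmnet:   Installed: 1:0.3.0-1".
LINE_RE = re.compile(r"^(?P<host>\S+):\s+Installed:\s+(?P<version>\S+)$")


def hosts_needing_upgrade(report, target=TARGET_VERSION):
    """Return (host, installed_version) pairs whose version differs from target."""
    pending = []
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            # Skip the package-name header line and any blank lines.
            continue
        if match.group("version") != target:
            pending.append((match.group("host"), match.group("version")))
    return pending


# The AQS listing from this comment, pasted verbatim.
REPORT = """\
prometheus-jmx-exporter
aqs1005.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1008.eqiad.wmnet:   Installed: 0.10-3
aqs1006.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1009.eqiad.wmnet:   Installed: 0.10-3
aqs1004.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1007.eqiad.wmnet:   Installed: 1:0.3.0-1
"""

if __name__ == "__main__":
    for host, version in hosts_needing_upgrade(REPORT):
        print(f"{host} still has {version}")
```

The sketch compares version strings for exact equality instead of ordering them. The `1:` prefix is a Debian epoch, so a naive numeric comparison of `0.10-3` against `0.3.0-1` would rank the old package as newer.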

RESTBase has been upgraded:

prometheus-jmx-exporter
restbase1007.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1010.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1011.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1016.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1008.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1012.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1017.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1013.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1009.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1014.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1015.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1018.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase2003.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2004.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2008.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2011.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2001.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2002.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2007.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2010.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2005.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2006.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2009.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2012.codfw.wmnet:   Installed: 1:0.3.0-1

@Pnorman TTBMK, the idea was to ensure that the new Cassandra cluster was set up to use the exporter instead of cassandra-metrics-collector (Graphite).

I don't believe we'll be creating a new Cassandra cluster.

I am reimaging the cluster to Stretch so today/tomorrow 0.3.0-1 should be rolled out everywhere :)

For the remaining Analytics nodes, I'd wait for the next round of reboots or JVM upgrades if this task is not super urgent.

Followed the awesome https://debmonitor.wikimedia.org/packages/prometheus-jmx-exporter, and upgraded all the remaining kafka/analytics hosts. I haven't restarted any JVMs; I will do that during the next round of restarts.

As far as I can see, this task can be closed. Any objections?

Did something change, or did I just misunderstand? I thought the plan was to reshape the existing cluster to free up machines, and create a new cluster to migrate to. @Gehel ?