
Upgrade prometheus-jmx-exporter on all services using it
Open, Normal, Public

Description

prometheus-jmx-exporter 0.3.0 is now available, and every service that uses it should be upgraded. Affected services (obtained by querying puppetdb via cumin):

  • aqs
  • maps
  • restbase
  • analytics
  • conf
  • druid
  • kafka
  • puppetdb

The new package needs to be deployed, and the affected services restarted.

Event Timeline

Gehel created this task. Apr 24 2018, 6:10 PM
Restricted Application added a subscriber: Aklapper. Apr 24 2018, 6:10 PM
fdans moved this task from Incoming to Radar on the Analytics board. Apr 26 2018, 4:27 PM
Eevans added a subscriber: Eevans. Apr 26 2018, 5:32 PM

The restbase cluster has been upgraded package-wise, but a rolling restart still needs to be scheduled.

> The restbase cluster has been upgraded package-wise, but a rolling restart still needs to be scheduled.

This could be combined with the rolling restart for the Java security update (given 3.11.2 is deemed ready).

RobH triaged this task as Normal priority. May 3 2018, 4:48 PM
RobH added a subscriber: RobH.

I'm not quite sure if this is a normal or a high priority task. It seems normal, since we aren't requiring immediate updating of all hosts, and it can be accomplished at a more moderate pace.

herron added a subscriber: herron. May 7 2018, 2:14 PM

The puppetdb servers have been upgraded to prometheus-jmx-exporter 0.3.0-1.

herron updated the task description. May 7 2018, 2:14 PM
Pnorman added a subscriber: Pnorman.

We don't see anything for the maps team to do on this - at the very least, we don't think it needs any resources from us.

Vvjjkkii renamed this task from "Upgrade prometheus-jmx-exporter on all services using it" to "nbeaaaaaaa". Jul 1 2018, 1:14 AM
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot triaged this task as Normal priority. Jul 3 2018, 3:28 AM
Eevans added a subscriber: elukey. Jul 3 2018, 10:05 PM

AQS has been partially upgraded:

prometheus-jmx-exporter
aqs1005.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1008.eqiad.wmnet:   Installed: 0.10-3
aqs1006.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1009.eqiad.wmnet:   Installed: 0.10-3
aqs1004.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1007.eqiad.wmnet:   Installed: 1:0.3.0-1

@elukey ?

RESTBase has been upgraded:

prometheus-jmx-exporter
restbase1007.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1010.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1011.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1016.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1008.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1012.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1017.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1013.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1009.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1014.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1015.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase1018.eqiad.wmnet:   Installed: 1:0.3.0-1
restbase2003.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2004.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2008.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2011.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2001.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2002.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2007.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2010.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2005.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2006.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2009.codfw.wmnet:   Installed: 1:0.3.0-1
restbase2012.codfw.wmnet:   Installed: 1:0.3.0-1
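
A listing like the AQS one above can be checked mechanically. The following is a minimal Python sketch, not a tool used in this task: the function name `hosts_not_on_target` and the `TARGET` constant are illustrative assumptions, and the input is assumed to have the `host:   Installed: version` shape shown in the comments.

```python
import re

# Assumed target version, taken from the listings in this task.
TARGET = "1:0.3.0-1"

# Matches lines such as "aqs1008.eqiad.wmnet:   Installed: 0.10-3".
LINE_RE = re.compile(r"^(\S+):\s+Installed:\s+(\S+)$")


def hosts_not_on_target(listing, target=TARGET):
    """Return {host: installed_version} for hosts not on the target version."""
    outdated = {}
    for raw in listing.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip the package-name header and blank lines
        host, version = match.groups()
        if version != target:
            outdated[host] = version
    return outdated


# Sample input copied from the AQS listing in this task.
AQS_LISTING = """\
prometheus-jmx-exporter
aqs1005.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1008.eqiad.wmnet:   Installed: 0.10-3
aqs1006.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1009.eqiad.wmnet:   Installed: 0.10-3
aqs1004.eqiad.wmnet:   Installed: 1:0.3.0-1
aqs1007.eqiad.wmnet:   Installed: 1:0.3.0-1
"""

if __name__ == "__main__":
    for host, version in sorted(hosts_not_on_target(AQS_LISTING).items()):
        print(host, version)
```

The sketch compares version strings for exact equality on purpose. Ordering Debian versions correctly requires epoch-aware comparison (for example `dpkg --compare-versions`), since without the `1:` epoch `0.10-3` would sort above `0.3.0-1`.
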
Eevans updated the task description. Jul 3 2018, 10:06 PM
Eevans removed a project: RESTBase-Cassandra.

> We don't see anything for the maps team to do on this - at the very least, we don't think it needs any resources from us.

@Pnorman TTBMK, the idea was to ensure that the new Cassandra cluster was set up to use the exporter instead of cassandra-metrics-collector (Graphite).

> @Pnorman TTBMK, the idea was to ensure that the new Cassandra cluster was set up to use the exporter instead of cassandra-metrics-collector (Graphite).

I don't believe we'll be creating a new Cassandra cluster.

elukey added a comment. Jul 4 2018, 6:04 AM

> AQS has been partially upgraded:
>
> prometheus-jmx-exporter
> aqs1005.eqiad.wmnet:   Installed: 1:0.3.0-1
> aqs1008.eqiad.wmnet:   Installed: 0.10-3
> aqs1006.eqiad.wmnet:   Installed: 1:0.3.0-1
> aqs1009.eqiad.wmnet:   Installed: 0.10-3
> aqs1004.eqiad.wmnet:   Installed: 1:0.3.0-1
> aqs1007.eqiad.wmnet:   Installed: 1:0.3.0-1
>
> @elukey ?

I am reimaging the cluster to Stretch so today/tomorrow 0.3.0-1 should be rolled out everywhere :)

elukey added a comment. Jul 4 2018, 7:20 AM

For the remaining Analytics nodes, I'd wait for the next round of reboots or JVM upgrades if this task is not super urgent.

elukey updated the task description. Jul 5 2018, 3:20 PM
elukey added a comment. Jul 5 2018, 3:30 PM

Followed the awesome https://debmonitor.wikimedia.org/packages/prometheus-jmx-exporter, and upgraded all the remaining kafka/analytics hosts. I haven't restarted any jvm, will do it during the next round of restarts.

As far as I can see this task can be closed, anything against it?

Eevans added a comment. Jul 5 2018, 7:04 PM

> @Pnorman TTBMK, the idea was to ensure that the new Cassandra cluster was set up to use the exporter instead of cassandra-metrics-collector (Graphite).
>
> I don't believe we'll be creating a new Cassandra cluster.

Did something change, or did I just misunderstand? I thought the plan was to reshape the existing cluster to free up machines, and create a new cluster to migrate to. @Gehel ?

elukey moved this task from Backlog to Done on the User-Elukey board. Jul 9 2018, 9:43 AM