Move deployment-prep redis instances to stretch
Closed, Resolved, Public

Assigned To

Authored By

	fgiunchedi
	Oct 31 2017, 11:09 AM

Description

As part of porting redis metrics to Prometheus, I wanted to test the metrics in deployment-prep. The redis instances currently running are on trusty, so I've provisioned two new redis instances on stretch (deployment-redis0[56]). We'll need to put them into active use and remove the old trusty instances.

Details

Subject	Repo	Branch	Lines +/-
hieradata: add redis stretch deployment-prep instances	operations/puppet	production	+6 -0
hieradata: use deployment-redis05 for labs jobrunner	operations/puppet	production	+2 -2
labs: use new redis servers for locks	operations/mediawiki-config	master	+2 -2

Related Objects

Status	Assigned	Task
Resolved	MoritzMuehlenhoff	T143536 Upgrade all mw* servers to debian jessie
Resolved	None	T144006 Move the MW Beta appservers to Debian
Resolved	EddieGP	T132259 Deployment-prep hosts with puppet errors (tracking)
Resolved	fgiunchedi	T179371 Move deployment-prep redis instances to stretch

Event Timeline

fgiunchedi created this task. Oct 31 2017, 11:09 AM

Restricted Application removed a project: Patch-For-Review. Oct 31 2017, 11:09 AM

AFAIU this is the procedure to commission new redis instances:

add 05 and 06 to redis::shards in https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep (just one redis instance each)
set 05 as slave of 01 (ditto for 06 -> 02)
change wmf-config/LabsServices.php to point to 05 and 06
change wmf-config/jobqueue-labs.php to point to 05
change hieradata/labs/deployment-prep/common.yaml jobrunner config to point to 05
verify all redis traffic is hitting 05/06 and not 01/02
break replication, 05/06 are now masters
decom 01/02
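
The replication steps in the procedure above could be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, port 6379 is assumed (the Redis default, not stated in this task), authentication is left out, and the script only prints `redis-cli` commands rather than running anything.

```python
# Sketch only: prints redis-cli commands for the replication steps; it runs nothing.
REDIS_PORT = 6379  # assumption: Redis default port, not stated in the task

# New stretch instance -> old trusty instance it replicates from (per the list above).
PAIRS = {
    "deployment-redis05": "deployment-redis01",
    "deployment-redis06": "deployment-redis02",
}


def replicate_commands(pairs, port=REDIS_PORT):
    """Commands that make each new instance a slave of its old counterpart."""
    return [
        f"redis-cli -h {new} -p {port} SLAVEOF {old} {port}"
        for new, old in pairs.items()
    ]


def verify_commands(pairs, port=REDIS_PORT):
    """Commands to inspect replication state and connected clients on every host."""
    hosts = list(pairs) + list(pairs.values())
    commands = []
    for host in hosts:
        commands.append(f"redis-cli -h {host} -p {port} INFO replication")
        commands.append(f"redis-cli -h {host} -p {port} CLIENT LIST")
    return commands


def promote_commands(pairs, port=REDIS_PORT):
    """Commands that break replication so the new instances become masters."""
    return [f"redis-cli -h {new} -p {port} SLAVEOF NO ONE" for new in pairs]


if __name__ == "__main__":
    plan = replicate_commands(PAIRS) + verify_commands(PAIRS) + promote_commands(PAIRS)
    for line in plan:
        print(line)
```

Comparing the `CLIENT LIST` output on 01/02 against 05/06 is one possible way to check where traffic is going before running the `SLAVEOF NO ONE` step.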

cc @Joe @elukey @hashar

Change 387570 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/mediawiki-config@master] labs: use new redis servers for locks

https://gerrit.wikimedia.org/r/387570

gerritbot added a project: Patch-For-Review. Oct 31 2017, 1:47 PM

Change 386869 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: add redis stretch deployment-prep instances

https://gerrit.wikimedia.org/r/386869

Change 387579 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: use deployment-redis05 for labs jobrunner

https://gerrit.wikimedia.org/r/387579

fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board. Nov 6 2017, 2:29 PM

fgiunchedi removed a parent task: T148637: Port redis statistics to Prometheus. Dec 21 2017, 10:53 AM

Puppet is failing on deployment-redis01 and deployment-redis02 because the prometheus redis_exporter requires systemd, while these trusty hosts run upstart:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: {"message":"Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, You can only use systemd resources on systems with systemd, got upstart at /etc/puppet/modules/systemd/manifests/init.pp:8:9 at /etc/puppet/modules/prometheus/manifests/redis_exporter.pp:49 on node deployment-redis01.deployment-prep.eqiad.wmflabs","issue_kind":"RUNTIME_ERROR","stacktrace":["Warning: The 'stacktrace' property is deprecated and will be removed in a future version of Puppet. For security reasons, stacktraces are not returned with Puppet HTTP Error responses."]}

IIRC @Pchelolo has finished running their tests in deployment-prep that used redis, so we could move forward with the above patches and move redis to stretch in deployment-prep. Sounds good, Release-Engineering-Team?

hashar merged a task: T184243: Puppet broken on deployment-redis0[12] due to systemd on trusty. Jan 6 2018, 9:19 AM

hashar added a project: Beta-Cluster-Infrastructure.

hashar mentioned this in T184243: Puppet broken on deployment-redis0[12] due to systemd on trusty.

hashar added a parent task: T132259: Deployment-prep hosts with puppet errors (tracking).

hashar added a subscriber: Krenair.

@fgiunchedi sounds good to me! Puppet is now broken on the old redis nodes, as @hashar mentioned above.

fgiunchedi moved this task from Doing to Backlog on the User-fgiunchedi board. Feb 5 2018, 3:17 PM

EddieGP moved this task from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board. Mar 31 2018, 12:47 PM

EddieGP merged a task: T191163: Puppet broken on deployment-redis0[12]. Apr 4 2018, 10:23 PM

EddieGP added a parent task: T144006: Move the MW Beta appservers to Debian.

EddieGP added subscribers: MarcoAurelio, EddieGP, Paladox.

EddieGP mentioned this in T144006: Move the MW Beta appservers to Debian. Apr 4 2018, 10:29 PM

Indeed. I'm removing myself as assignee since I won't have time to work on moving over to the new redis stretch instances in deployment-prep.
For reference these are the related reviews:

fgiunchedi moved this task from Backlog to Radar on the User-fgiunchedi board. Apr 6 2018, 8:06 AM

These are two of the four remaining trusty instances in deployment-prep, and they have been continuously failing puppet for months. I wonder whether they still serve any purpose or should just be deleted. If they are meant to stay, who would be responsible for upgrading them and resolving the puppet errors?

Krenair mentioned this in T195686: Move puppetmaster to Stretch. May 27 2018, 7:53 PM

@fgiunchedi: So we basically need to find someone to review the puppet patches (or cherry-pick them) and merge the mediawiki-config patch, then shut down the old instances and remove references to them? If so I can probably push this across the finish line.

It looks like @Joe has made and merged patches that essentially obsolete those, and I can't find any remaining references to the old instances. I'm going to shut down the old instances and see if anything breaks.
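
One way to look for leftover references to the old instances before shutting them down is a plain text search over local checkouts of the config repositories. The sketch below is an assumption about how that could be done, not what was actually run here; `find_references` and the checkout path in the usage note are hypothetical.

```python
import os

# Hostnames of the old trusty instances named in this task.
OLD_HOSTS = ("deployment-redis01", "deployment-redis02")


def find_references(root, needles=OLD_HOSTS):
    """Return sorted (relative_path, line_number, needle) tuples for each hit under root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata so packed objects do not produce noise.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        for needle in needles:
                            if needle in line:
                                rel = os.path.relpath(path, root)
                                hits.append((rel, lineno, needle))
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return sorted(hits)
```

For example, `find_references("/path/to/mediawiki-config")` (hypothetical path) returning an empty list would suggest that checkout no longer mentions 01/02. It says nothing about settings kept outside the repositories, such as the Hiera page on wikitech.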

Mentioned in SAL (#wikimedia-releng) [2018-06-09T02:13:30Z] <Krenair> shut down old deployment-redis01 and deployment-redis02 instances T179371

hashar unsubscribed. Jun 9 2018, 5:48 AM

Change 387570 abandoned by Filippo Giunchedi:
labs: use new redis servers for locks

Reason:
As per Alex "Obsoleted by Ia65009dc"

https://gerrit.wikimedia.org/r/387570

Change 386869 abandoned by Filippo Giunchedi:
hieradata: add redis stretch deployment-prep instances

Reason:
Indeed, as per Alex "Obsoleted by I411fcef3"

https://gerrit.wikimedia.org/r/386869

Change 387579 abandoned by Filippo Giunchedi:
hieradata: use deployment-redis05 for labs jobrunner

Reason:
As per Alex "Obsoleted by I411fcef3"

https://gerrit.wikimedia.org/r/387579

Alright. Leaving open pending deletion of the old redis hosts in a few weeks then?

Change 386869 restored by Krinkle:
hieradata: add redis stretch deployment-prep instances

Reason:
Still beta-picked

https://gerrit.wikimedia.org/r/386869

Change 386869 abandoned by Krinkle:
hieradata: add redis stretch deployment-prep instances

https://gerrit.wikimedia.org/r/386869

Mentioned in SAL (#wikimedia-releng) [2018-07-08T16:54:10Z] <Krenair> deleted deployment-redis02 T179371

Mentioned in SAL (#wikimedia-releng) [2018-07-08T16:54:59Z] <Krenair> deleted deployment-redis01 T179371

Krenair closed this task as Resolved. Jul 13 2018, 8:30 PM

Krenair assigned this task to fgiunchedi.
