As part of porting redis metrics to Prometheus I wanted to test the metrics in deployment-prep. The running redis instances are on trusty, so I've provisioned two new redis instances with stretch (deployment-redis0[56]). We'll need to put them into active use and remove the old trusty instances.
Details
Status | Assigned | Task
---|---|---
Resolved | MoritzMuehlenhoff | T143536 Upgrade all mw* servers to debian jessie
Resolved | None | T144006 Move the MW Beta appservers to Debian
Resolved | EddieGP | T132259 Deployment-prep hosts with puppet errors (tracking)
Resolved | fgiunchedi | T179371 Move deployment-prep redis instances to stretch
Event Timeline
AFAIU this is the procedure to commission new redis instances:
- add 05 and 06 to redis::shards in https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep (just one redis instance each)
- set 05 as slave of 01 (ditto for 06 -> 02)
- change wmf-config/LabsServices.php to point to 05 and 06
- change wmf-config/jobqueue-labs.php to point to 05
- change hieradata/labs/deployment-prep/common.yaml jobrunner config to point to 05
- verify all redis traffic is hitting 05/06 and not 01/02
- break replication, 05/06 are now masters
- decom 01/02
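The replicate-then-promote part of the procedure above (setting 05/06 as slaves of 01/02, then breaking replication) can be sketched in Python. This is a minimal sketch, assuming the redis-py client and the default redis port 6379; the helper names (`start_replication`, `is_in_sync`, `wait_for_sync`, `promote`) are hypothetical and not part of any existing deployment-prep tooling.

```python
"""Sketch of the replicate-then-promote steps for one redis shard.

Assumes a redis-py style client (``redis.Redis``); the helper names and the
default port 6379 are illustrative assumptions, not existing tooling.
"""
import time
from typing import Protocol


class ReplicationClient(Protocol):
    """The subset of redis-py's ``redis.Redis`` API this sketch relies on."""

    def slaveof(self, host=None, port=None): ...

    def info(self, section=None): ...


def start_replication(new: ReplicationClient, old_host: str, port: int = 6379) -> None:
    # SLAVEOF <old_host> <port>: the new instance starts copying the old master's data.
    new.slaveof(old_host, port)


def is_in_sync(new: ReplicationClient) -> bool:
    # INFO replication reports role:slave and master_link_status:up once the
    # replica is connected to its master.
    info = new.info("replication")
    return info.get("role") == "slave" and info.get("master_link_status") == "up"


def wait_for_sync(new: ReplicationClient, timeout: float = 300.0, interval: float = 1.0) -> bool:
    # Poll until the replica reports a healthy link, or give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_in_sync(new):
            return True
        time.sleep(interval)
    return False


def promote(new: ReplicationClient) -> None:
    # slaveof() with no arguments sends SLAVEOF NO ONE, turning the replica into a master.
    new.slaveof()
```

For example, `start_replication(redis.Redis(host="deployment-redis05"), "deployment-redis01")` would point 05 at 01 (host names are illustrative). Only after `wait_for_sync` returns True and all clients have been switched to 05/06 would `promote` be called to break replication. Newer Redis versions also accept `REPLICAOF` as the name for this command, but `SLAVEOF` remains supported.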
Change 387570 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/mediawiki-config@master] labs: use new redis servers for locks
Change 386869 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: add redis stretch deployment-prep instances
Change 387579 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: use deployment-redis05 for labs jobrunner
deployment-redis01 and deployment-redis02 are failing puppet because the prometheus redis_exporter requires systemd:
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: {"message":"Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, You can only use systemd resources on systems with systemd, got upstart at /etc/puppet/modules/systemd/manifests/init.pp:8:9 at /etc/puppet/modules/prometheus/manifests/redis_exporter.pp:49 on node deployment-redis01.deployment-prep.eqiad.wmflabs","issue_kind":"RUNTIME_ERROR","stacktrace":["Warning: The 'stacktrace' property is deprecated and will be removed in a future version of Puppet. For security reasons, stacktraces are not returned with Puppet HTTP Error responses."]}
IIRC @Pchelolo has finished running their tests in deployment-prep that used redis, so we could actually move forward with the above patches and move redis to stretch in deployment-prep. Sounds good, Release-Engineering-Team?
@fgiunchedi sounds good to me! Puppet is now broken on the old redis nodes, as @hashar mentioned above.
Indeed, I'm removing myself as assignee since I won't have time to work on moving over to the new redis stretch instances in deployment-prep.
For reference these are the related reviews:
These are two of the four remaining trusty instances in deployment-prep, and they have been failing puppet continuously for months. I wonder whether they still serve any purpose or should just be deleted, and, if they are meant to stay, who would be responsible for upgrading them and resolving the puppet errors.
@fgiunchedi: So we basically need to find someone to review the puppet patches (or cherry-pick them) and merge the mediawiki-config patch, then shut down the old instances and remove references to them? If so I can probably push this across the finish line.
It looks like @Joe has made and merged patches that essentially obsolete those, and I can't find any remaining references to the old instances. I'm going to shut down the old instances and see if anything breaks.
Mentioned in SAL (#wikimedia-releng) [2018-06-09T02:13:30Z] <Krenair> shut down old deployment-redis01 and deployment-redis02 instances T179371
Change 387570 abandoned by Filippo Giunchedi:
labs: use new redis servers for locks
Reason:
As per Alex "Obsoleted by Ia65009dc"
Change 386869 abandoned by Filippo Giunchedi:
hieradata: add redis stretch deployment-prep instances
Reason:
Indeed, as per Alex "Obsoleted by I411fcef3"
Change 387579 abandoned by Filippo Giunchedi:
hieradata: use deployment-redis05 for labs jobrunner
Reason:
As per Alex "Obsoleted by I411fcef3"
Change 386869 restored by Krinkle:
hieradata: add redis stretch deployment-prep instances
Reason:
Still beta-picked
Change 386869 abandoned by Krinkle:
hieradata: add redis stretch deployment-prep instances
Mentioned in SAL (#wikimedia-releng) [2018-07-08T16:54:10Z] <Krenair> deleted deployment-redis02 T179371
Mentioned in SAL (#wikimedia-releng) [2018-07-08T16:54:59Z] <Krenair> deleted deployment-redis01 T179371