User Details
- User Since
- May 1 2020, 10:28 PM (177 w, 17 h)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- RKemper (WMF)
Tue, Sep 19
Mon, Sep 18
Running the decom cookbook for wdqs100[3,4]. The dc-ops ticket is up here: https://phabricator.wikimedia.org/T346699
Thu, Sep 14
Removed subtask because I think the scap ticket is not directly related to this one.
Wed, Sep 13
The reindex, even though we had to terminate it before it finished, had already gotten to enwiki_content. So this is done. Here's what the value on both eqiad and codfw looks like:
Mon, Sep 11
Fri, Sep 8
Host was reimaged before the patch was merged. Merging the patch and running the reimage again.
Thu, Sep 7
Wed, Sep 6
Oops, meant this to be in progress. Changing status to Open from previous state of Resolved.
We made a slight mistake: one of these hosts needed to be in wdqs-internal, because wdqs1003 (one of the hosts these new hosts replace) is in wdqs-internal.
Sat, Sep 2
Fri, Sep 1
Thu, Aug 31
The work for this ticket is done, but I made the actual decom ticket a subtask, so I'll leave this in blocked/waiting until that's done.
Generated new cergen certs for wdqs.discovery.wmnet that include wdqs1016 in the alt_names instead of wdqs1005. Followed the steps below:
Wed, Aug 30
Tue, Aug 29
Mon, Aug 28
Aug 17 2023
Some observations from last two patches, tested on wdqs2007 before reverting due to issues:
Aug 16 2023
Built wmf-elasticsearch-search-plugins_7.10.2-9 and wmf-elasticsearch-search-plugins_7.10.2-9~bullseye (https://apt.wikimedia.org/wikimedia/pool/thirdparty/elastic710/w/wmf-elasticsearch-search-plugins/) and installed them on all elastic* hosts (including relforge* and cloudelastic*). Rolling restarts are not yet complete. relforge* can be restarted at any time, but elastic* and cloudelastic* must wait until the ongoing reindex of all wikis has completed.
Will be in blocked/waiting for a few days while a reindex of all wikis completes to apply the newest settings.
Aug 15 2023
Aug 14 2023
Patch was merged here: https://gerrit.wikimedia.org/r/947928
Aug 8 2023
Looks like we lost track of this a bit. @bking and I can work on this this week.
Aug 7 2023
Just some investigation we did to understand where the metrics come from: probe_ssl_earliest_cert_expiry comes from the blackbox exporter (see random docs). That metric is used by the alerting repo here: https://github.com/wikimedia/operations-alerts/blob/4ecc222e95710395a6f9a7039e487186d2264323/team-sre/probes.yaml#L55
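For reference, probe_ssl_earliest_cert_expiry reports the earliest certificate expiry in the probed chain as a Unix timestamp, so an alert on it boils down to comparing that timestamp with the current time. Below is a minimal Python sketch of that arithmetic. The function names and the 30-day threshold are hypothetical, chosen only for illustration; they are not taken from probes.yaml.

```python
SECONDS_PER_DAY = 86_400


def days_until_expiry(expiry_unix_ts: float, now_unix_ts: float) -> float:
    """Days left before the certificate expires (negative if already expired).

    expiry_unix_ts is the value of probe_ssl_earliest_cert_expiry,
    which the blackbox exporter reports as seconds since the Unix epoch.
    """
    return (expiry_unix_ts - now_unix_ts) / SECONDS_PER_DAY


def is_expiring_soon(
    expiry_unix_ts: float,
    now_unix_ts: float,
    threshold_days: float = 30.0,  # hypothetical threshold, not the real alert's
) -> bool:
    """True when fewer than threshold_days remain before expiry."""
    return days_until_expiry(expiry_unix_ts, now_unix_ts) < threshold_days


if __name__ == "__main__":
    import time

    now = time.time()
    # Pretend the metric says the cert expires 10 days from now.
    sample_expiry = now + 10 * SECONDS_PER_DAY
    print(is_expiring_soon(sample_expiry, now))
```

In practice the same comparison would normally live in an alerting rule rather than in a script, with the current time supplied by the monitoring system.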
Aug 3 2023
Checked like so:
Aug 1 2023
Jul 28 2023
Jul 25 2023
Decom cookbook finished, and dc-ops ticket created (see ticket desc AC section for ticket #)
Jul 24 2023
Jul 21 2023
wdqs202[1-2] have been brought into service. With the merging of https://gerrit.wikimedia.org/r/c/operations/puppet/+/940272, all hosts are now in service and have alerting enabled.
With the new hosts in service, we can now begin decom'ing these hosts at our convenience.
Jul 20 2023
All of these hosts except wdqs202[1-2] are in service. Those last two hosts will be brought into service after a final data transfer (ongoing).
Jul 18 2023
First draft of this ticket is up. There are a couple of things that aren't perfect:
Jul 17 2023
Jul 13 2023
Jun 29 2023
Merged patch (had wrong ticket in commit message): https://gerrit.wikimedia.org/r/c/operations/puppet/+/934403
Jun 27 2023
This should be done, but I haven't yet run a validation command to sanity-check that the correct version is in place.
Jun 26 2023
May 30 2023
Should be deployed as of today.
May 22 2023
Thanks for the patience on this! This is getting deployed today.
The documentation aspect of this ticket is already done. Basically two things are left to do to close this ticket out: