I had a conversation with @hashar about this topic. Here are a few ideas:
Data transfer completed with the new cookbook, everything seems fine.
Tue, Apr 16
The Stretch migration is complete. This should be fixed; we'll reopen if it happens again.
Redundant units have been cleaned via cumin:
Mon, Apr 15
Deployment seems to be a noop:
Permissions reset via:
Removing maps from this ticket, since there isn't any work left on our side.
Fri, Apr 12
Thu, Apr 11
I don't think there is anything actionable at this point. Let's close.
Wed, Apr 10
Open firewall on cloudelastic machines to allow connections from mwmaint*, mw job runners and cloudelastic
Tue, Apr 9
The reimage was problematic: first a puppet failure, then the server would not boot over PXE. Manually booting into PXE (F12) finally fixed the issue.
Mon, Apr 8
Fri, Apr 5
Thu, Apr 4
ehel@elastic2048:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1(F) sdb1
      29279232 blocks super 1.2 [2/1] [_U]
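For reference, a minimal sketch of how the degraded state shown above could be spotted automatically. It assumes only the usual /proc/mdstat layout ([expected/active] followed by a [U_] map, with failed members marked (F)); the function name `find_degraded_arrays` is made up for illustration and is not part of any existing tooling.

```python
import re

# Matches one "mdN : ..." stanza up to the next stanza, the trailing
# "unused devices" line, or the end of the text.
ARRAY_RE = re.compile(
    r"\b(md\d+)\s*:\s*(.*?)(?=\bmd\d+\s*:|unused devices|\Z)", re.S
)
# "[2/1] [_U]" means 2 members expected, 1 active, first slot missing.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]")
# Members flagged as failed, with or without the "[n]" slot index.
FAILED_RE = re.compile(r"\b(\w+?)(?:\[\d+\])?\(F\)")


def find_degraded_arrays(mdstat_text):
    """Return {array: {"failed", "expected", "active"}} for degraded arrays."""
    degraded = {}
    for match in ARRAY_RE.finditer(mdstat_text):
        name, body = match.group(1), match.group(2)
        status = STATUS_RE.search(body)
        if status is None:
            continue  # no status line found for this array, skip it
        expected, active = int(status.group(1)), int(status.group(2))
        failed = FAILED_RE.findall(body)
        if active < expected or "_" in status.group(3) or failed:
            degraded[name] = {
                "failed": failed,
                "expected": expected,
                "active": active,
            }
    return degraded


if __name__ == "__main__":
    sample = (
        "Personalities : [linear] [multipath] [raid0] [raid1] [raid6] "
        "[raid5] [raid4] [raid10]\n"
        "md0 : active raid1 sda1(F) sdb1\n"
        "      29279232 blocks super 1.2 [2/1] [_U]\n"
    )
    print(find_degraded_arrays(sample))
```

On a real host the input would come from reading /proc/mdstat; the sample string just mirrors the output pasted above.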
Wed, Apr 3
Node is depooled and excluded from the cluster. @Papaul if you have a spare, feel free to do what needs doing. Ping me when done and I'll reimage.
Tue, Apr 2
Mon, Apr 1
Fri, Mar 29
Thu, Mar 28
Wed, Mar 27
Mon, Mar 25
Fri, Mar 22
The elasticsearch security manager is preventing log4j2 from auto-reloading its configuration (more precisely, it can't restart the GELF appender, as socket access is denied). So we will need a full cluster restart to reload the logging configuration. This will be done next week, bundled with the JVM upgrade.
Disabling this logger for now. Let's not forget to re-enable it once we've fixed the underlying issues!
Note that we should take this as an opportunity to fix T216235 as well.
Thu, Mar 21
Archived settings were reset. For reference, the settings before the reset:
Wed, Mar 20
Mar 19 2019
Actually, we're deploying a new unit as a template, so I'm not sure we can just override the standard one. This will need discussion with someone who understands systemd better than I do.
Mar 18 2019
Mar 14 2019
Mar 13 2019
Mar 12 2019
The above patch will allow prometheus to collect the metrics after the domain was changed. We still need to update the dashboards.