Fri, Sep 13
I have started the documentation, which can be found here: https://wikitech.wikimedia.org/wiki/Etcd
Wed, Sep 11
Documentation at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Worker_nodes completed as per request. Marking as resolved.
Fri, Sep 6
As a workaround, I'm going to use the /usr/bin/timeout utility to wrap the command.
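A minimal sketch of that workaround, assuming GNU coreutils timeout: the helper name run_with_limit, the 1-second limit and the sleep command are hypothetical stand-ins, not the actual maintain-kubeusers invocation. It only illustrates how wrapping a command enforces a runtime cap when the service manager cannot.

```shell
#!/bin/sh
# Path taken from the comment above; override if timeout lives elsewhere.
TIMEOUT_BIN="${TIMEOUT_BIN:-/usr/bin/timeout}"

# Hypothetical helper: run a command, killing it once the limit is reached.
# Usage: run_with_limit DURATION COMMAND [ARGS...]
run_with_limit() {
    limit="$1"
    shift
    "$TIMEOUT_BIN" "$limit" "$@"
}

# Stand-in for the real job: sleeps longer than the allowed runtime.
run_with_limit 1 sleep 2
status=$?

# GNU timeout exits with 124 when the command was stopped for exceeding the limit.
if [ "$status" -eq 124 ]; then
    echo "command exceeded the time limit"
fi
```

In a unit file, the same idea would presumably mean prefixing the existing ExecStart command with /usr/bin/timeout and a duration, which avoids relying on a directive the installed systemd does not recognize.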
$ sudo grep RuntimeMaxSec /var/log/daemon.log
Sep 4 16:29:27 tools-k8s-master-01 systemd: [/lib/systemd/system/maintain-kubeusers-timer.service:7] Unknown lvalue 'RuntimeMaxSec' in section 'Service'
Thu, Aug 29
I have updated the docs located at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Worker_nodes to include the command to update the prometheus-node-exporter package after the build.
Wed, Aug 28
Tue, Aug 27
Mon, Aug 26
Would it make more sense to close this ticket, since the original issue has been resolved, and then create a new ticket to prevent the issue from recurring?
Aug 15 2019
Blocked due to https://phabricator.wikimedia.org/T212855
Aug 14 2019
I managed to bypass that issue by running
Aug 13 2019
I created a new instance "toolsbeta-test-puppet-sandbox" with the jessie image, and it looks like it came with prometheus-node-exporter version 0.14.0, not 0.17.0. As per Arturo's suggestion, I am looking into creating a Puppet patch for this issue.
Aug 12 2019
The metrics are now exposed
During the prometheus-node-exporter.service startup, the following error occurs
In horizon, "Instance Console Log", I can see the following logs for tools-worker-1030
Aug 9 2019
Aug 8 2019
- uid=hpham,ou=people,dc=wikimedia,dc=org has no Cloud VPS memberships
- Created by the OIT group on the first day, I think. According to them, it's for "your Gmail, SF Office WiFi, VPN, and Fileserver". If this is the case, is it a good idea to block it?
If I can only keep one, I prefer the first one.
Aug 6 2019
My SSH public key: