Wed, Jun 21
Tue, Jun 20
Thu, Jun 15
I think possibly @Andrew just fixed this, but maybe it wouldn't have affected historical artifacts.
Note from irc: these are closer in function to the old dataset boxes than to the existing labstores. They need to go in a public VLAN since they are externally accessible. :)
Wed, Jun 14
Tue, Jun 13
Post hoc note: I noticed that /etc/libvirt/qemu/networks/autostart/default.xml is ensured absent in our nova compute role. This is a file that libvirt seems to generate, and its stock contents are:
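(The file contents didn't survive into this copy of the note. For reference, the stock default.xml that libvirt generates looks roughly like the below, with the per-host uuid and mac elided; this is a representative sketch, not the exact file from our compute hosts.)

<!-- representative stock libvirt default network; uuid/mac values elided -->
<network>
  <name>default</name>
  <uuid>...</uuid>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='...'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
</network>

Presumably we ensure it absent so the default NAT bridge (virbr0) never comes up on nova compute hosts, since nova manages instance networking itself.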
I was thinking of a separate nginx ingress entirely, and ignoring tools-proxy-xx here for sure. The consideration is that Tools the environment is bigger than any one implementation inside it, etc. If you don't want to go with Puppet then it's a non-issue; we aren't ready to be that cool atm. I was thinking of it as more detached than it came across, but if you go far enough adrift, what's the point, of course :) It's your call. Maybe at some point it will be a PITA either way and changing it up will make sense. I don't feel too strongly about it as outlined.
Small thought to which I'm not terribly attached but it is worth mentioning:
DBA gents please let us know when it's ready for views and meta_p fixup and we'll get on it! Thanks
Fri, Jun 9
Thu, Jun 8
Quoting @faidon from irc:
turns out 37b83e8b2c04a58f555ee5627a415561ab792d26 unintentionally resulted in this
This is probably from an operation against this fact:
Commit: d3dc61097073773b308f2cc1bb9352c4aea61be8
Author: Alexandros Kosiaris <firstname.lastname@example.org>
Date: 2017-06-08 12:12:13 +0300 (5 hours ago)
Subject: puppetmaster: Set stringify_facts = false
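For context (not from the commit itself, which isn't reproduced here): stringify_facts is a stock Puppet setting, and with it set to true (the old default) structured facts get flattened to strings, which is exactly the sort of thing that makes an operation against a fact behave unexpectedly. The change amounts to something like the following in puppet.conf; the exact file and section used on the puppetmasters are an assumption on my part:

# puppet.conf -- sketch only; file location and section are assumed
[main]
# keep facts as native data types instead of flattening everything to strings
stringify_facts = false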
elukey@deployment-aqs03:~$ dig -x 10.68.17.125 +short
elukey  ci-jessie-wikimedia-505374.contintcloud.eqiad.wmflabs.
elukey  deployment-aqs03.deployment-prep.eqiad.wmflabs.
elukey  is it normal? :D
It seems not, I'm going to close this but anyone who knows differently please reopen
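(For posterity, and not from the original thread: a quick way I'd check which of those two PTR targets is the stale one is to resolve both names forward and see which still points at 10.68.17.125; this assumes you're querying from a host that uses the labs recursor.)

dig +short ci-jessie-wikimedia-505374.contintcloud.eqiad.wmflabs
dig +short deployment-aqs03.deployment-prep.eqiad.wmflabs
# whichever name no longer resolves, or resolves to a different address,
# is the leaked PTR record for 10.68.17.125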
Wed, Jun 7
If we want to hold off on the labsdb root inclusion, I am going to propose in the opsen meeting that this task become:
We did add a second network node to our single openstack deployment with a custom failover procedure, but more verbose versions of this will be tracked elsewhere. Failover is currently documented at https://wikitech.wikimedia.org/wiki/Portal:Wikimedia_VPS/Admin/Troubleshooting#Fail-over
Tue, Jun 6
@Papaul yes, thank you
@Dzahn to help clarify: this task was to make a plan for user mgmt access to bare metal as a service, which we have no plans to do.
Mon, Jun 5
@Dzahn I believe this user is in the k8s environment which is handled separately from SGE
For now this is totally off the books
With T85610 also being declined I'm going to say any work towards this end is a ways off and will be tracked in other tasks
Fri, Jun 2
Thu, Jun 1
Sounds good to me; let us know when you hit the Tools roadblock with aptly and one of us can untangle it (i.e. this).
Thu, May 25
Thanks for logging this task! We have had a bunch of different scattered instructions; where (what page(s)) were you working from?
May 24 2017
May 21 2017
May 17 2017
Small note just for posterity as I think there is no relation (per volans):
I suspect this is some odd DNS issue that happens in a race (that's mostly been your speciality :D). I'm tossing it your way to grab attention, but I will look into this too. I'll try to sync up later today.
I am dropping 2 of the 4 holdovers to give us some headroom:
May 13 2017
May 9 2017
May 8 2017
I'm not sure what the story is, other than that the issue disappeared during debugging, for now. The pattern as I understand it: puppet was slow, seemingly from degraded IO, on Sunday; we failed away from the host as primary and rebooted it; symptoms still persisted; I did the poking above looking for some IO issue indicator, and when I went back to demonstrate it, the issue would no longer reproduce.
Well. Now the issue has gone dormant or fsck addressed it.
I updated /etc/default/rcS:
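(The actual edit didn't make it into this note. As a sketch only, assuming the usual fsck knob in /etc/default/rcS was the thing changed, it would look like the following; this is a guess at the kind of change, not the recorded diff.)

# /etc/default/rcS -- hypothetical sketch, not the recorded change
# automatically repair filesystem inconsistencies during boot fsck
# instead of dropping to a maintenance shell
FSCKFIX=yes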
I tried dropping into the Lifecycle Controller to run diagnostics via F10.